Wednesday, November 30, 2016

The Path to Enterprise Machine Learning

Just like any technical or business IT capability, one prerequisite for adoption is understanding the WHAT and the WHY; a clear definition of these aspects at the beginning of the machine learning journey is critical to charting the course.

Monday, November 28, 2016

Capabilities of AI in context by data types

Knowledge Acquisition
http://www.cio.com/article/3050832/business-intelligence/big-data-and-machine-learning-is-the-glass-half-empty.html

Why are ontologies still so little known?

“Ontologies are structural frameworks for organizing information and are used as knowledge representation. Ontology management supports and expands data modeling methodologies to exploit the business value locked up in information silos. Information leaders must embrace this emerging trend.”
Guido De Simoni - Gartner (23/02/2015)

In short, ontologies are becoming increasingly crucial across several domains, especially those under ever more active investigation:
  • Knowledge Management, where ontologies allow a clear, deep and shared understanding of any knowledge domain ...
  • Semantic Web, where ontologies can give meaning to published data ...
  • Big Data, where ontologies allow semantic integration across several data sources ...
  • Software Engineering, where ontologies can play the role of a formal business specification ...
  • Artificial Intelligence, where ontologies can represent required knowledge for reasoning ...
  • Robotics, where ontologies can help robots be aware of their environment ...

The role of ontologies in Big Data initiatives

Why Ontologies
In short, people interpret; machines don't. An effort must therefore be made to support adequate use of digital resources. Ontologies are useful when meanings need to be formally defined.

Ambiguity for computers
The problem is that words such as “rice” or “cook” have no meaning, or semantic content, to the computer.
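To make this concrete, here is a toy sketch of how an ontology gives words like “rice” and “cook” machine-usable meaning. The triples, class names, and helper functions below are all hypothetical, invented for the example; real ontologies would use a standard such as RDF/OWL.

```python
# A minimal ontology as a set of (subject, predicate, object) triples,
# plus a tiny "is-a" reasoner. All names here are made up for illustration.

TRIPLES = {
    ("Basmati", "is_a", "Rice"),
    ("Rice", "is_a", "Grain"),
    ("Grain", "is_a", "Food"),
    ("cook", "applies_to", "Food"),
}

def ancestors(term, triples):
    """Return every class reachable from `term` via is_a edges."""
    found = set()
    frontier = {term}
    while frontier:
        step = {o for (s, p, o) in triples if p == "is_a" and s in frontier}
        frontier = step - found
        found |= step
    return found

def can_apply(action, thing, triples):
    """True if `action` applies to `thing` or any of its ancestor classes."""
    targets = {o for (s, p, o) in triples if s == action and p == "applies_to"}
    return bool(targets & (ancestors(thing, triples) | {thing}))

# With the ontology, the machine can infer that "cook" applies to "Basmati":
# Basmati is_a Rice is_a Grain is_a Food, and cook applies_to Food.
print(can_apply("cook", "Basmati", TRIPLES))  # True
```

Without the triples, “cook” and “Basmati” are just opaque strings; with them, the inference above becomes a mechanical graph traversal.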

https://www.linkedin.com/pulse/big-data-initiatives-ontologies-role-pete-ianace?articleId=8647865779102847774#comments-8647865779102847774&trk=sushi_topic_posts_guest

Tuesday, November 8, 2016

Top Algorithms and Methods Used by Data Scientists

The following table shows usage of different algorithm types (Supervised, Unsupervised, Meta, and Other) by employment type.
Table 1: Algorithm usage by Employment Type 
| Employment Type       | % Voters | Avg Num Algorithms Used | % Used Supervised | % Used Unsupervised | % Used Meta | % Used Other Methods |
|-----------------------|----------|-------------------------|-------------------|---------------------|-------------|----------------------|
| Industry              | 59%      | 8.4                     | 94%               | 81%                 | 55%         | 83%                  |
| Government/Non-profit | 4.1%     | 9.5                     | 91%               | 89%                 | 49%         | 89%                  |
| Student               | 16%      | 8.1                     | 94%               | 76%                 | 47%         | 77%                  |
| Academia              | 12%      | 7.2                     | 95%               | 81%                 | 44%         | 77%                  |
| All                   |          | 8.3                     | 94%               | 82%                 | 48%         | 81%                  |

Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type
| Algorithm      | Industry | Government/Non-profit | Academia | Student | All |
|----------------|----------|-----------------------|----------|---------|-----|
| Regression     | 71%      | 63%                   | 51%      | 64%     | 67% |
| Clustering     | 58%      | 63%                   | 51%      | 58%     | 57% |
| Decision Trees | 59%      | 63%                   | 38%      | 57%     | 55% |
| Visualization  | 55%      | 71%                   | 28%      | 47%     | 49% |
| K-NN           | 46%      | 54%                   | 48%      | 47%     | 46% |
| PCA            | 43%      | 57%                   | 48%      | 40%     | 43% |
| Statistics     | 47%      | 49%                   | 37%      | 36%     | 43% |
| Random Forests | 40%      | 40%                   | 29%      | 36%     | 38% |
| Time series    | 42%      | 54%                   | 26%      | 24%     | 37% |
| Text Mining    | 36%      | 40%                   | 33%      | 38%     | 36% |
| Deep Learning  | 18%      | 9%                    | 24%      | 19%     | 19% |

Monday, November 7, 2016

Machine Learning: A Complete and Detailed Overview

The 10 Algorithms Machine Learning Engineers Need to Know
http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html

  • Support Vector Machines
  • Naïve Bayes Classification
  • Decision Trees
  • Ordinary Least Squares Regression
  • Logistic Regression
  • Ensemble Methods
  • Clustering Algorithms
  • Principal Component Analysis
  • Independent Component Analysis
  • Singular Value Decomposition
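As a small illustration of one item from the list, here is Ordinary Least Squares for a single feature, using the closed-form solution slope = cov(x, y) / var(x), intercept = mean(y) − slope · mean(x). The sample data is made up; real work would use a library such as scikit-learn.

```python
# Closed-form simple linear regression (one feature), standard library only.

def ols_fit(xs, ys):
    """Fit y ≈ slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x, made-up data
slope, intercept = ols_fit(xs, ys)
```

On this data the fit comes out near slope 2 and intercept 0, as expected for points scattered around y = 2x.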



The cognitive platform

These cognitive services can be fueled by publicly available web and social data, your own private information, or data you acquire from data partners or others. The APIs on this platform can be grouped into four categories:
  • Language: A set of APIs including, but not limited to, natural language text classification, conversation, entity extraction, semantic concept extraction, document conversion, language translation, passage retrieval and ranking, relationship extraction, and tone analysis.
  • Speech: A set of APIs for converting speech to text and text to speech, including the ability to train with your own language models.
  • Vision: APIs to find new insights, derive significant value, and take meaningful action from images.
  • Data Insights: Pre-enriched content (for example, news and blogs) with natural language processing to allow for highly targeted search and trend analysis.
Each of these APIs can perform a different task, and in combination they can be adapted to solve numerous business problems or create deeply engaging experiences. When you combine these cognitive services and overlay them with (traditional) data analytics capabilities, you enable complex discoveries, predictive insights, and engines that carry out the decisions driven by those insights.
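The "combination" point can be sketched as a small pipeline. The function names below are hypothetical placeholders, not the platform's real endpoints; each stub stands in for a remote service call from the Language category.

```python
# Hypothetical sketch: chaining two stubbed "cognitive" services
# (entity extraction and tone analysis) into one enriched record.

def extract_entities(text):
    # Placeholder: a real entity-extraction service would return
    # named entities with their types.
    known = {"Watson": "PRODUCT", "IBM": "ORG"}
    return [(w, known[w]) for w in text.split() if w in known]

def analyze_tone(text):
    # Placeholder: a real tone-analysis service would score many tones.
    return "positive" if "great" in text.lower() else "neutral"

def enrich(text):
    """Combine both calls into one enriched record."""
    return {
        "text": text,
        "entities": extract_entities(text),
        "tone": analyze_tone(text),
    }

record = enrich("IBM Watson is great at passage retrieval")
```

In a real deployment each stub would be an authenticated HTTP call, and the enriched records would feed the analytics layer described above.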
Figure: unstructured data moving to knowledge

The Machine Learning Framework

An average data scientist deals with loads of data daily. It is often said that 60-70% of that time is spent cleaning and munging data into a format to which machine learning models can be applied. This post focuses on the second part: applying machine learning models, including the preprocessing steps. The pipelines discussed here are the result of over a hundred machine learning competitions that I have taken part in. The discussion is general but useful; far more complicated methods also exist and are practised by professionals.
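Two of the preprocessing steps such a pipeline always contains can be sketched in plain Python: encoding categorical labels as integers, and holding out a validation split. The data and the 25% split ratio are made up; in practice this is what scikit-learn's `LabelEncoder` and `train_test_split` do.

```python
# Minimal sketch of two standard preprocessing steps.
import random

def label_encode(values):
    """Map each distinct label to an integer (alphabetical order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def train_valid_split(rows, valid_frac=0.25, seed=0):
    """Shuffle deterministically, then hold out a validation fraction."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - valid_frac))
    return rows[:cut], rows[cut:]

labels = ["cat", "dog", "cat", "bird", "dog", "cat", "bird", "dog"]
encoded, mapping = label_encode(labels)
train, valid = train_valid_split(list(zip(range(len(labels)), encoded)))
```

The fixed seed matters: competition pipelines need the same split every run so that model comparisons are fair.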
Figure from: A. Thakur and A. Krohn-Grimberghe, AutoCompete: A Framework for Machine Learning Competitions, AutoML Workshop, International Conference on Machine Learning 2015.
http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/

Architecture: Real-Time Stream Processing for IoT

This article describes the infrastructure needed to handle streams of data fed from millions of intelligent devices in the Internet of Things (IoT). An architecture for this type of real-time stream processing must handle the import, processing, storage, and analysis of hundreds of millions of events per hour. The architecture below depicts such a system.
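One stage of such a pipeline, tumbling-window aggregation of device events, can be sketched in a few lines. The event shape (device id, timestamp in seconds) and the 60-second window are assumptions for the example; at scale this runs in a stream processor such as Dataflow, not in a single process.

```python
# Toy tumbling-window aggregation: count events per (device, window) bucket.
from collections import defaultdict

def window_counts(events, window_secs=60):
    """events: iterable of (device_id, unix_ts); returns bucket counts."""
    counts = defaultdict(int)
    for device_id, ts in events:
        bucket = ts - (ts % window_secs)  # start of the tumbling window
        counts[(device_id, bucket)] += 1
    return dict(counts)

events = [("sensor-1", 5), ("sensor-1", 59), ("sensor-2", 61), ("sensor-1", 130)]
print(window_counts(events))
# {('sensor-1', 0): 2, ('sensor-2', 60): 1, ('sensor-1', 120): 1}
```

The real system additionally has to cope with late and out-of-order events, which is exactly what the windowing semantics of streaming engines exist for.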
Diagram: IoT Stream Processing Architecture

https://cloud.google.com/solutions/architecture/real-time-stream-processing-iot

Selecting the right chart for data visualization needs


Chart Selection - Data Visualization Needs
http://www.bigdataanalyticsguide.com/2016/07/25/selecting-right-chart-data-visualization-needs/

Data Science Automation For Big Data and IoT Environments

The purpose of data science is not only to do machine learning or statistical analysis, but also to derive insights from the data that a user with no statistics knowledge can understand.
The half of data science that requires manual intervention has yet to be automated. Those are the areas that involve the experience and wisdom of people: data scientists, business experts, software developers, data integrators, everyone who currently contributes to making a data-science project operational. This makes it difficult to automate every aspect of data science. However, we can think of data science automation as a two-level architecture, wherein:
– Different data science disciplines/components are automated
– All the individual automated components are interconnected to form a coherent data-science system
Figure 1. The required elements of an automated data science system.
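The two-level idea above can be sketched in a few lines: each discipline is automated as an independent step (level 1), and the steps are interconnected into one coherent pipeline (level 2). The step names are illustrative, not a real framework's API.

```python
# Level 1: individually automated data-science components (toy versions).

def clean(data):
    """Drop missing values."""
    return [x for x in data if x is not None]

def featurize(data):
    """Derive a simple feature (the square) for each record."""
    return [(x, x * x) for x in data]

# Level 2: interconnect the automated components into one system.
def pipeline(steps, data):
    for step in steps:
        data = step(data)
    return data

out = pipeline([clean, featurize], [1, None, 3])
print(out)  # [(1, 1), (3, 9)]
```

Because the steps share a common calling convention, swapping one automated component for a better one does not disturb the rest of the system, which is the point of the two-level design.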

Types of Bots: An Overview

Learn more about the different varieties of bots and what they can do for you: http://botnerds.com/types-of-bots/ In this articl...