Tuesday, December 20, 2016

Text Analytics

Text analytics is the process of converting unstructured text data into meaningful data for analysis: measuring customer opinions, product reviews, and feedback; providing search; and performing sentiment analysis and entity modeling to support fact-based decision making. Text analytics uses many linguistic, statistical, and machine learning techniques.
http://www.predictiveanalyticstoday.com/text-analytics/

Text Analytics Process Flow
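A minimal Python sketch of the first steps in such a flow: tokenizing, dropping stop words, counting terms, and attaching a crude lexicon-based sentiment label. The stop-word list and the POSITIVE/NEGATIVE lexicons are hypothetical, tiny placeholders; real text analytics relies on much larger resources or trained models.

```python
import re
from collections import Counter

# Hypothetical, tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "it", "this", "to", "of"}
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "slow", "broken"}


def tokenize(text: str) -> list:
    """Lowercase the text and extract word tokens (letters, optional apostrophe suffix)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def analyze(review: str) -> dict:
    """Turn one free-text review into structured fields."""
    terms = [t for t in tokenize(review) if t not in STOP_WORDS]
    # Lexicon-based polarity: +1 per positive term, -1 per negative term.
    score = sum(t in POSITIVE for t in terms) - sum(t in NEGATIVE for t in terms)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"term_counts": Counter(terms), "sentiment": label}


if __name__ == "__main__":
    result = analyze("The battery is great, but the charger was slow and the case is broken.")
    print(result["sentiment"])
    print(result["term_counts"].most_common(3))
```

The output is structured (a term-frequency table plus a label), which is the kind of data that downstream analysis and reporting can consume.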


Wednesday, December 7, 2016

NLP Survey

Gate-vs-UIMA-vs-OpenNLP
https://app.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP

NLP-Related Resources
https://handong1587.github.io/deep_learning/2015/10/09/nlp.html

NLP Notes by 徐阿衡 (Xu Aheng)

NLP Notes - Sentiment Analysis
06-01
Paper Notes - Learning to Extract Conditional Knowledge for Question Answering using Dialogue
05-24
NLP Notes - Text Summarization
05-10
NLP Notes - Machine Translation
05-01
NLP Notes - Intent Classification in NLU
04-27
NLP Notes - Compositional Semantics
04-13
NLP Notes - Relation Extraction


Deep Learning for NLP course
Oxford: https://github.com/oxford-cs-deepnlp-2017/lectures

The course provides a deep excursion into cutting-edge research in deep learning applied to NLP. The final project will involve training a complex recurrent neural network and applying it to a large scale NLP problem.
http://cs224d.stanford.edu/syllabus.html
http://cs224d.stanford.edu/

How to Generate a Good Word Embedding? (source code)
The "embedding" folder contains all the embedding algorithms used in the paper.

The "evaluation" folder contains all the evaluation tasks used in the paper.
https://github.com/licstar/compare
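Word similarity is one common way to evaluate embeddings: compute the cosine similarity of each word pair and compare the resulting ranking with human judgments using Spearman's rank correlation. The sketch below assumes made-up 3-dimensional vectors and made-up human scores; it is not code from the licstar/compare repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; assumes no tied values, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical 3-d embeddings and human similarity judgments (0-10 scale).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.1, 0.0],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}
GOLD = [("cat", "dog", 8.5), ("car", "truck", 8.0), ("cat", "car", 1.5), ("dog", "truck", 2.0)]


def evaluate(embeddings, gold):
    """Correlate model similarities with human scores over the word pairs."""
    model_scores = [cosine(embeddings[w1], embeddings[w2]) for w1, w2, _ in gold]
    human_scores = [score for _, _, score in gold]
    return spearman(model_scores, human_scores)


print(evaluate(EMBEDDINGS, GOLD))
```

A correlation close to 1 means the embedding orders word pairs the way people do; benchmark datasets simply use many more pairs and a tie-aware rank correlation.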

Blog posts
  • Deep Learning in NLP
  • Obtaining a Simplified Chinese Wikipedia Corpus
  • A Guide to Reading "How to Generate a Good Word Embedding?"
http://licstar.net/archives/category/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86

Using NLP, Machine Learning & Deep Learning Algorithms to Extract Meaning from Text
https://www.infoq.com/presentations/nlp-machine-learning-meaning-text

NLP Channel
https://liweinlp.com/?p=29
http://blog.sciencenet.cn/blog-362400-902391.html

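Dr. Wei Li (立委) is a senior natural language processing (NLP) architect, Principal Scientist at jd-valley, and former Chief Scientist at Netbase, where he led a team that developed language understanding and application systems for 18 languages. For Chinese and English in particular, these systems achieve world-class parsing accuracy while remaining robust and running at line speed; they scale up to big data, with the semantic analysis deployed in data-mining and question-answering products. He was previously VP of R&D at Cymfony, won first place in the first question-answering evaluation (TREC-8 QA Track), and won 17 U.S. Department of Defense information-extraction projects (PI for 17 SBIRs). Applications of his NLP work include big-data public-opinion mining, customer intelligence, information extraction, knowledge graphs, question answering systems, intelligent assistants, and semantic search.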

Introduction to NLP Architecture
https://www.linkedin.com/pulse/introduction-nlp-architecture-wei-li
   

Introduction to Natural Language Processing (NLP) 2016
The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. 
http://blog.algorithmia.com/introduction-natural-language-processing-nlp/


Tuesday, December 6, 2016

FinTech Industry

31 Hottest FinTech Startups Defining the New York FinTech Industry
7 Most Important Things You Should Know About the Global FinTech Industry
How Banks Are Joining Hands With FinTech Firms to Serve Customers
Industry report - FinTech patents: where finance meets technology

31 Hottest FinTech Startups Defining the New York FinTech Industry
https://letstalkpayments.com/31-hottest-fintech-startups-defining-the-new-york-fintech-industry/
New York FinTech

Monday, December 5, 2016

A Summary of Commonly Used NLP Tools and How to Choose Among Them

NLP Tools (http://www.coli.uni-saarland.de/~csporled/page.php?id=tools)
General
  • NLTK: the Natural Language Processing Toolkit
  • WEKA: an easy-to-use toolkit for experimenting with different machine learning algorithms
  • CoNLL shared task data: annotated data sets for a number of NLP tasks in a number of languages
Web Crawler
Information Retrieval
Language Identification
Pre-processing (Sentence Splitters, Tokenisers, POS Taggers, Lemmatisers, Morphological Analysers)
Syntactic Analysis, Parsers
Text Mining / Information Extraction
Semantic Analysis
Webpages with further information on NLP resources
    NLTK is the best-known Python natural language processing toolkit; here I will give a detailed tutorial on NLTK.
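To show what the pre-processing stage listed above (sentence splitting and tokenisation) actually does, here is a deliberately naive, regex-based splitter and tokeniser in plain Python. It is not how NLTK works internally, and the ABBREVIATIONS set is a hypothetical stand-in for the abbreviation knowledge a real splitter learns or loads.

```python
import re

# Hypothetical abbreviation list; real splitters learn or load these.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def split_sentences(text):
    """Split after '.', '!' or '?' plus whitespace, unless the word ending there is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        last_word = text[start:match.start() + 1].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    """Return words (keeping internal hyphens/apostrophes) and punctuation marks as separate tokens."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


text = "Dr. Smith met Mr. Lee. They talked, e.g. about NLP! Was it useful?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Rules like these break quickly on real text (unknown abbreviations, decimals, quotes), which is why toolkits ship trained models for these steps.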


    Biomedical natural language processing
    Includes sources and tools for a variety of language models

    http://bio.nlplab.org/


    Awesome Community-Curated NLP List
    https://github.com/alvations/awesome-community-curated-nlp

    Java or Python for natural language processing
    https://gxnotes.com/article/51044.html

    A curated list of resources for Chinese NLP (Natural Language Processing)
    https://github.com/crownpku/awesome-chinese-nlp

    1. Chinese NLP Toolkits
    • Toolkits (comprehensive NLP toolkits)
    • Popular English or multilingual NLP toolkits
    • Chinese Word Segmentation
    • Information Extraction
    • QA & Chatbot
    2. Corpus (Chinese corpora)
    3. Organizations (Chinese NLP organizations and conferences)
    4. Learning Materials
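Chinese text has no spaces between words, so word segmentation usually comes first. A classic dictionary-based baseline is maximum matching; the sketch below implements forward and backward variants in plain Python over a hypothetical five-word dictionary, using the textbook example 研究生命起源 ("study the origin of life").

```python
def forward_max_match(text, vocab, max_len):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer is in the dictionary.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len):
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


# Hypothetical toy dictionary.
VOCAB = {"研究", "研究生", "生命", "命", "起源"}
MAX_LEN = max(len(w) for w in VOCAB)

sentence = "研究生命起源"
print(forward_max_match(sentence, VOCAB, MAX_LEN))
print(backward_max_match(sentence, VOCAB, MAX_LEN))
```

Forward matching greedily takes 研究生 ("graduate student") and yields 研究生 / 命 / 起源, while backward matching recovers 研究 / 生命 / 起源. Practical segmenters typically add statistical models on top of, or instead of, pure dictionary matching, partly because of ambiguities like this.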


    Natural Language Processing Tools (http://www.phontron.com/nlptools.php)

    Open-source NLP software
    http://entopix.com/so-you-need-to-understand-language-data-open-source-nlp-software-can-help.html

    Python NLTK Tools List for Natural Language Processing (NLP)
    http://www.datasciencecentral.com/profiles/blogs/python-nlp-tools

    Some of the most popular NLP/NLU platforms
    • IBM's Watson Conversation Service
    • Microsoft LUIS
    • Google Natural Language API
    • Wit.ai
    • Api.ai
    • Alexa Skills Kit
    • Recast.AI
    • Pat
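Platforms like these typically map a user utterance to an intent (what the user wants) plus entities. To illustrate the idea behind intent classification, here is a small multinomial Naive Bayes classifier with add-one smoothing in plain Python; the intents and training phrases are hypothetical, and this is not how any of the listed services is implemented.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercased word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, intent in examples:
            self.doc_counts[intent] += 1
            self.word_counts[intent].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text):
        """Return the intent with the highest log posterior; unseen words are ignored."""
        best_intent, best_score = None, -math.inf
        for intent, n_docs in self.doc_counts.items():
            counts = self.word_counts[intent]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for word in tokens(text):
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# Hypothetical training phrases for two intents.
TRAINING = [
    ("what's the weather today", "get_weather"),
    ("will it rain tomorrow", "get_weather"),
    ("is it sunny outside", "get_weather"),
    ("book a table for two", "book_restaurant"),
    ("reserve a table at an italian restaurant", "book_restaurant"),
    ("find me a restaurant for dinner", "book_restaurant"),
]

classifier = NaiveBayesIntentClassifier().fit(TRAINING)
print(classifier.predict("will it be sunny tomorrow"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.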

    Sunday, December 4, 2016

    Natural Language Processing


    https://www.celi.it/en/technology/natural-language-processing/

    Steps in NLP
    https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm
    NLP Steps
    Other: https://www.tekkkies.com/soft-computing-techniques-and-applications/

    Machine Learning vs. Natural Language Processing
    https://www.lexalytics.com/lexablog/2012/machine-learning-vs-natural-language-processing-part-1

    Building Natural Language Generation Systems, Robert Dale and Ehud Reiter (slides)

    Text Analytics and Natural Language Processing in the Era of Big Data

    What is Text Analytics Good For?
    Text analytics spans virtually all verticals. We frequently come across text analytics use cases in the finance, insurance, media, and retail industries, but even oil and gas companies can derive value from text analytics. The table on the right outlines verticals and their most frequent text analytics use cases. Next, we will describe some of these verticals and use cases in more detail.

    https://blog.pivotal.io/data-science-pivotal/features/text-analytics-and-natural-language-processing-in-the-era-of-big-data

    3 Key Capabilities Necessary

    What is Natural Language Generation?


    http://resources.narrativescience.com/h/i/124944227-what-is-natural-language-generation

    Wednesday, November 30, 2016

    The Path to Enterprise Machine Learning

    As with any technical or business IT capability, a prerequisite for adoption is understanding the WHAT and the WHY, and clearly defining these aspects at the beginning of the machine learning journey is critical to charting the course.

    Monday, November 28, 2016

    Capabilities of AI in context by data types

    Knowledge Acquisition
    http://www.cio.com/article/3050832/business-intelligence/big-data-and-machine-learning-is-the-glass-half-empty.html

    Who knows why ontologies are so hidden?

    “Ontologies are structural frameworks for organizing information and are used as knowledge representation. Ontology management supports and expands data modeling methodologies to exploit the business value locked up in information silos. Information leaders must embrace this emerging trend.”
    Guido De Simoni - Gartner (23/02/2015)

    In short, ontologies are becoming increasingly crucial in several domains, especially those that are being investigated more and more:
    • Knowledge Management, where ontologies allow a clear, deep and shared understanding of any knowledge domain ...
    • Semantic Web, where ontologies can give meaning to published data ...
    • Big Data, where ontologies allow semantic integration across several data sources ...
    • Software Engineering, where ontologies can play the role of a formal business specification ...
    • Artificial Intelligence, where ontologies can represent required knowledge for reasoning ...
    • Robotics, where ontologies can help robots be aware of their environment ...

    Ontologies role in Big Data initiatives

    Why Ontologies
    In short, people interpret, machines don’t. As such, an effort must be undertaken in order to support adequate usage of digital resources. Ontologies are useful when meanings need to be formally defined.

    Ambiguity for computer
    The problem is that words such as “rice” or “cook” have no meaning, or semantic content, to the computer.

    https://www.linkedin.com/pulse/big-data-initiatives-ontologies-role-pete-ianace?articleId=8647865779102847774#comments-8647865779102847774&trk=sushi_topic_posts_guest
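To make "formally defined meaning" concrete, here is a toy ontology stored as (subject, relation, object) triples with a transitive is_a lookup, in plain Python. The terms and the relation names is_a and applies_to are hypothetical; real ontologies are usually expressed in standards such as RDF and OWL and queried with dedicated reasoners.

```python
# Hypothetical mini-ontology as (subject, relation, object) triples.
TRIPLES = {
    ("rice", "is_a", "grain"),
    ("grain", "is_a", "food"),
    ("cook", "is_a", "action"),
    ("cook", "applies_to", "food"),
}


def superclasses(term, triples):
    """All classes reachable from `term` via is_a edges (transitive closure)."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def can_apply(action, target, triples):
    """True if `action` applies_to `target` or to any of its superclasses."""
    classes = {target} | superclasses(target, triples)
    return any((action, "applies_to", c) in triples for c in classes)


print(sorted(superclasses("rice", TRIPLES)))  # 'food' is reached through 'grain'
print(can_apply("cook", "rice", TRIPLES))
```

Nothing states that rice can be cooked, yet the program can conclude it from the explicit is_a chain; that explicit, machine-readable structure is what an ontology adds.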

    Tuesday, November 8, 2016

    Top Algorithms and Methods Used by Data Scientists

    The following table shows the usage of different algorithm types (supervised, unsupervised, meta, and other) by employment type.
    Table 1: Algorithm usage by Employment Type

    Employment Type         % Voters   Avg Num Algorithms Used   % Supervised   % Unsupervised   % Meta   % Other Methods
    Industry                59%        8.4                       94%            81%              55%      83%
    Government/Non-profit   4.1%       9.5                       91%            89%              49%      89%
    Student                 16%        8.1                       94%            76%              47%      77%
    Academia                12%        7.2                       95%            81%              44%      77%
    All                                8.3                       94%            82%              48%      81%

    Table 2: Top 10 Algorithms + Deep Learning usage by Employment Type

    Algorithm        Industry   Government/Non-profit   Academia   Student   All
    Regression       71%        63%                     51%        64%       67%
    Clustering       58%        63%                     51%        58%       57%
    Decision Trees   59%        63%                     38%        57%       55%
    Visualization    55%        71%                     28%        47%       49%
    K-NN             46%        54%                     48%        47%       46%
    PCA              43%        57%                     48%        40%       43%
    Statistics       47%        49%                     37%        36%       43%
    Random Forests   40%        40%                     29%        36%       38%
    Time series      42%        54%                     26%        24%       37%
    Text Mining      36%        40%                     33%        38%       36%
    Deep Learning    18%        9%                      24%        19%       19%

    Monday, November 7, 2016

    Machine Learning: A Complete and Detailed Overview

    The 10 Algorithms Machine Learning Engineers Need to Know
    http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html

    Support Vector Machines
    Naïve Bayes Classification
    Decision Trees
    Ordinary Least Squares Regression
    Logistic Regression
    Ensemble Methods
    Clustering Algorithms
    Principal Component Analysis
    Independent Component Analysis
    Singular Value Decomposition
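As one concrete instance from this list, ordinary least squares with a single feature has a closed-form solution: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. A minimal Python sketch, with made-up data:

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```

With several features the same idea becomes the normal equations, which libraries usually solve with a matrix factorization rather than this explicit formula.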



    The cognitive platform

    These cognitive services can be fueled by publicly available web and social data, your own private information, or data you acquire from data partners or others. The APIs on this platform can be grouped into four categories:
    • Language: A set of APIs including, but not limited to, classifying natural language text, conversations, entity extraction, semantic concept extraction, document conversion, language translation, passage retrieval and ranking, relationship extraction, tone analysis, and so on.
    • Speech: A set of APIs for converting speech to text and text to speech, including the ability to train with your own language models.
    • Vision: APIs to find new insights, derive significant value, and take meaningful action from images.
    • Data Insights: Pre-enriched content (for example, news and blogs) with natural language processing to allow for highly targeted search and trend analysis.
    Each of these APIs performs a different task, and in combination they can be adapted to solve numerous business problems or create deeply engaging experiences. Combining these cognitive services with (traditional) data analytics capabilities enables complex discoveries, predictive insights, and engines that carry out the decisions driven by those insights.
    Graph showing unstructured data moving to knowledge

    The Machine Learning Framework

    An average data scientist deals with loads of data daily. Some say that 60-70% or more of that time is spent on data cleaning, munging, and bringing data into a format to which machine learning models can be applied. This post focuses on the second part, i.e., applying machine learning models, including the preprocessing steps. The pipelines discussed in this post are the result of over a hundred machine learning competitions that I've taken part in. Note that the discussion here is very general but very useful, and much more complicated methods also exist and are practised by professionals.
    Figure from: A. Thakur and A. Krohn-Grimberghe, AutoCompete: A Framework for Machine Learning Competitions, AutoML Workshop, International Conference on Machine Learning 2015.
    http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
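A common rule in such pipelines (stated here as general practice, not quoted from the post) is to fit every preprocessing step on the training split only and then apply it unchanged to the validation split, so that no information leaks from validation data. The sketch below shows that pattern with a hand-written standardizer; the function and class names are hypothetical.

```python
import random
import statistics


def train_valid_split(rows, valid_fraction=0.25, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut off a validation slice."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_valid = int(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]


class Standardizer:
    """Z-score scaling whose mean and standard deviation come from the data passed to fit()."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.std = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


train, valid = train_valid_split(range(20))
scaler = Standardizer().fit(train)       # statistics from the training split only
train_scaled = scaler.transform(train)
valid_scaled = scaler.transform(valid)   # validation reuses the training statistics
```

The fixed seed makes the split reproducible, which matters when comparing models across many runs.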

    Architecture: Real-Time Stream Processing for IoT

    This article describes the infrastructure to handle streams of data fed from millions of intelligent devices in the Internet of Things (IoT). The architecture for this type of real-time stream processing must deal with data import, processing, storage, and analysis of hundreds of millions of events per hour. The architecture below depicts just such a system.
    IoT Stream Processing Architecture

    https://cloud.google.com/solutions/architecture/real-time-stream-processing-iot
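At the core of such a pipeline is windowed aggregation of events. The sketch below groups (device_id, timestamp, value) events into fixed, non-overlapping (tumbling) 60-second windows in plain Python. It is a batch illustration under simplifying assumptions (integer timestamps, all events already available, no late data), not the Google Cloud services described in the article.

```python
from collections import defaultdict


def tumbling_window_stats(events, window_seconds=60):
    """Aggregate (device_id, timestamp, value) events into fixed, non-overlapping windows.

    Returns a dict keyed by (window_start, device_id) holding count, mean and max.
    """
    buckets = defaultdict(list)
    for device_id, timestamp, value in events:
        window_start = timestamp - timestamp % window_seconds
        buckets[(window_start, device_id)].append(value)
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals), "max": max(vals)}
        for key, vals in sorted(buckets.items())
    }


# Hypothetical sensor readings: (device_id, unix_seconds, temperature).
events = [
    ("sensor-1", 5, 20.0),
    ("sensor-1", 30, 22.0),
    ("sensor-2", 45, 18.0),
    ("sensor-1", 65, 25.0),
]
for key, stats in tumbling_window_stats(events).items():
    print(key, stats)
```

A streaming engine applies the same grouping incrementally as events arrive and must also decide how long to wait for late events before closing a window.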

    Selecting the right chart for data visualization needs


    Chart Selection - Data Visualization Needs
    http://www.bigdataanalyticsguide.com/2016/07/25/selecting-right-chart-data-visualization-needs/

    Data Science Automation For Big Data and IoT Environments

    The purpose of data science is not only to do machine learning or statistical analysis, but also to derive insights out of the data that a user with no statistics knowledge can understand.
    The part of data science that requires manual intervention has yet to be automated. Those are areas that involve the experience and wisdom of people: a data scientist, a business expert, a software developer, a data integrator, everyone who currently contributes to making a data-science project operational. This makes it difficult to automate every aspect of data science. However, we can think of data science automation as a two-level architecture, wherein:
    – Different data science disciplines/components are automated
    – All the individual automated components are interconnected to form a coherent data-science system
    Figure 1. The required elements of an automated data science system.

    Sunday, October 30, 2016

    Microsoft R Server available free to students with DreamSpark

    Revolution R renamed Microsoft R, available free to developers and students

    Since Microsoft acquired Revolution Analytics, there has been a steady stream of updates to Revolution R Open and Revolution R Enterprise (not to mention integration of R with SQL Server, Power BI, Azure and Cortana Analytics).
    Microsoft R Open
    Revolution R Enterprise, the big-data-capable R distribution for servers, Hadoop clusters, and data warehouses, has been updated for its new release, Microsoft R Server 2016.

    Microsoft R Server provides a number of inherently parallel, distributed algorithms for statistical analysis and machine learning. These include high-performance implementations of Generalized Linear Models, K-means clustering, the Naïve Bayes classifier, decision trees, random forests, and much more.

    http://blog.revolutionanalytics.com/2016/01/microsoft-r-open.html
    http://blog.revolutionanalytics.com/2016/01/r-dreamspark.html
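For reference, the basic single-machine form of one of the algorithms listed above, K-means (Lloyd's algorithm), is sketched below in plain Python. It is not Microsoft R Server's parallel implementation, and initializing the centroids from the first k points is a simplifying assumption (real implementations use random or k-means++ seeding).

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; points are equal-length tuples, centroids start as the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Both steps are independent per point or per cluster, which is what makes K-means a natural fit for the parallel, distributed execution described above.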

    Wednesday, October 26, 2016

    Big Data & Machine Learning Solutions Decision Tree


    Big Data Solutions Decision Tree

    Selecting a solution for a Big Data project is very complex and involves many factors. Here is a decision tree that maps the three types of problems to specific solutions.

    Machine Learning Solutions Decision Tree

    Machine learning is a data science technique that helps computers learn from existing data in order to forecast future behaviors, outcomes, and trends. Currently there are many products that can be used for this, on-premises or in the cloud, on a single node or on multiple nodes, in a relational database or in Hadoop-based storage.

    Monday, October 24, 2016

    Any data science project should be driven by business problems: data science serves an organization by providing answers to its business problems and by supporting strategy in the decision-making process.
    The chart below maps business problems to types of learning methods; it is not a mapping from a specific business application to a specific scientific method. The right method should be chosen according to the specific business problem and the end performance metric.

    https://www.linkedin.com/pulse/data-science-landscape-ling-zhang

    KDnuggets Data Science Software Poll Big Data vs Deep Learning affinity for top tools

    http://bbs.pinggu.org/thread-4678414-1-1.html

    Advanced Analytics & Business Intelligence Comparison Table

    Analytics refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of data to gain insight and drive business planning. It consists of two major areas: Business Intelligence and Advanced Analytics.
    Advanced Analytics & Business Intelligence Comparison
    http://newscentral.exsees.com/item/53349ecf406c333c9e3aa977a47166d8-28d29ae28711ca128d5e6fc7395808a6

    Types of Bots: An Overview

    Learn more about all the different varieties of bots, and what they can do for you: http://botnerds.com/types-of-bots/