Monday, September 9, 2013
Thursday, August 8, 2013
Inverted list
Sample document collection
Id Contents
1. The only way not to think about money is to have a great deal of it.
2. When I was young I thought that money was the most important thing in life; now that I am old I know that it is.
3. A man is usually more careful of his money than he is of his principles.
The indexes are:
List 1 Just the document lists. The format is (d1, d2, . . .), where dn is the document id number.
List 2 Document lists with word frequencies. The format is (d1:f1, d2:f2, . . .), where dn is the document id number and fn is the word frequency.
List 3 Document lists and word positions with word granularity. The format is (d1:(w1, w2, . . .), d2:(w1, w2, . . .), . . .), where dn is the document id number and wn are the word positions.
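The three list formats can be sketched in Python for the sample collection. This is a minimal illustration, not a reference implementation: the tokenizer (lowercasing, alphabetic runs only, so punctuation is dropped) and the 1-based word positions are assumptions, and the names `build_index`, `list1`, `list2`, and `list3` are hypothetical.

```python
import re
from collections import defaultdict

# The three sample documents, keyed by document id.
DOCS = {
    1: "The only way not to think about money is to have a great deal of it.",
    2: "When I was young I thought that money was the most important thing "
       "in life; now that I am old I know that it is.",
    3: "A man is usually more careful of his money than he is of his principles.",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]}; positions are counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return index


def list1(index, word):
    """List 1: document ids only."""
    return sorted(index.get(word, {}))


def list2(index, word):
    """List 2: (document id, word frequency) pairs."""
    return [(d, len(p)) for d, p in sorted(index.get(word, {}).items())]


def list3(index, word):
    """List 3: (document id, word positions) pairs."""
    return [(d, p) for d, p in sorted(index.get(word, {}).items())]


index = build_index(DOCS)
print(list1(index, "money"))  # [1, 2, 3]
print(list2(index, "is"))     # [(1, 1), (2, 1), (3, 2)]
print(list3(index, "money"))  # [(1, [8]), (2, [8]), (3, [9])]
```

All three views are derived from the same positional structure: List 1 drops everything but the ids, List 2 keeps only the length of each position list, and List 3 keeps the positions themselves.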
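When the substrate temperature is in the high-temperature regime, the surface chemical reaction rate is high, and the mass-transport rate of the reacting species limits the epitaxial growth rate. In this regime, the growth rate and thickness uniformity of the epitaxial film are governed by the flow-field properties and transport phenomena of the reactants inside the reactor, including:
- gas flow, velocity, temperature distribution, heat transfer, and mass transfer
- the flow rate of the gaseous reaction products, the gas inlet configuration, the substrate, and parameters such as the temperature, pressure, and geometry of the reaction chamber
- Temperature differences between the gases in the reactor produce temperature gradients; the thermal effects in the flow field can be characterized by the characteristic temperature difference ΔT.
- Thermal buoyancy (thermal buoyancy effect) is the most important thermal effect in the reactor; its strength can be expressed by the dimensionless Grashof number.
- In an MOCVD reactor, temperature gradients coexist with concentration gradients of the reactant gas species, giving rise to so-called coupled heat and mass transfer.
Excerpted from: 郭峰鳴 (93), MOCVD 反應器之氮化鎵薄膜成長參數探討.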
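The Grashof number mentioned in the excerpt can be computed from the standard textbook definition Gr = g·β·ΔT·L³ / ν², where g is the gravitational acceleration, β the thermal expansion coefficient of the gas, ΔT the characteristic temperature difference, L a characteristic length, and ν the kinematic viscosity. The sketch below is not taken from the cited thesis; the function name and the sample values are hypothetical and serve only to illustrate the formula.

```python
def grashof_number(beta, delta_t, length, kinematic_viscosity, g=9.81):
    """Return Gr = g * beta * delta_T * L**3 / nu**2 (dimensionless).

    beta                -- thermal expansion coefficient, 1/K
    delta_t             -- characteristic temperature difference, K
    length              -- characteristic length, m
    kinematic_viscosity -- kinematic viscosity, m^2/s
    g                   -- gravitational acceleration, m/s^2
    """
    return g * beta * delta_t * length ** 3 / kinematic_viscosity ** 2


# Hypothetical values chosen only to exercise the formula.
gr = grashof_number(beta=0.002, delta_t=500.0, length=0.1,
                    kinematic_viscosity=1e-4)
print(f"Gr = {gr:.3g}")
```

A larger Gr means buoyancy forces are stronger relative to viscous forces, which is why it serves as a measure of the thermal buoyancy effect.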
Thursday, June 20, 2013
Aspect Extraction
Mukherjee and Liu (2012). Aspect Extraction through Semi-Supervised Modeling. ACL.
- A key task of the framework is to extract aspects of entities that have been commented on in opinion documents.
- Two main types:
  - The first type only extracts aspect terms without grouping them;
  - The second type uses statistical topic models to extract aspects and group them.
- This paper assumes the user provides some seed words for the categories they are interested in.
- The models are related to the DF-LDA model (Andrzejewski et al., 2009), although DF-LDA is only for topics/aspects.
- There are many existing works on aspect extraction:
  - finding frequent noun terms, possibly with the help of dependency relations
  - using supervised sequence labeling
- Aspect and sentiment extraction using topic modeling comes in two flavors:
  - discovering aspect words sentiment-wise (aspect and sentiment represented together)
  - separately discovering both aspects and sentiments (using Maximum Entropy, Mei et al., 2007; Zhao et al., 2010)
  - To consider: the strengths and weaknesses of the two approaches above, and where they could be improved.
- One problem with these existing models is that many discovered aspects are not understandable or meaningful to users.
- Standard LDA and existing aspect and sentiment models work at the document level, so many "non-specific" terms are pulled in and clustered.
- Aspect terms tend to be nouns or noun phrases, and sentiment terms tend to be adjectives or adverbs.
- Separating aspects and opinion words can be very useful.
  - The result can be used to construct a domain-dependent sentiment lexicon and applied to tasks such as sentiment classification.
- Global topic models may not be suitable for detecting rateable aspects.
- Aspects are important because without knowing them, the opinions expressed in a sentence or a review are of limited use.
Wednesday, June 19, 2013
multi-aspect sentence
Many sentences in real reviews involve two or more aspects.
The first sentence contains three single-aspect segments: an environment-segment (环境不错 / the environment is nice), a food-segment (菜品一般 / the quality of the food is so-so), and a charge-segment (很贵 / the food is very expensive).
Monday, June 17, 2013
- topic: a multinomial distribution over words that represents a coherent concept in text.
- aspect: a multinomial distribution over words that represents a more specific topic in reviews, for example, "lens" in camera reviews.
- senti-aspect: a multinomial distribution over words that represents a pair of aspect and sentiment, for example, "screen, positive" in a laptop review.
- affective word: a word that expresses a feeling, for example "satisfied", "disappointed".
- evaluative word: a word that expresses sentiment by evaluating an aspect, for example, "excellent", "nice".
- general evaluative word: an evaluative word that expresses a consistent sentiment every time it is used, for example, "good", "bad".
- aspect-specific evaluative word: an evaluative word that may express different sentiments depending on the aspect, for example, a "small" font size on a monitor that is hard to read vs. a "small" vacuum that is portable.
- sentiment word: a word that conveys sentiment. It is either an affective word, general evaluative word, or aspect-specific evaluative word.
source: Jo and Oh, WSDM'11.
Gold-standard lexicon
The gold-standard lexicon mentioned in the former case is obtained in one of the following ways:
a) by manually tagging words from a domain corpus;
b) by one or more domain experts choosing aspects and keywords without the use of a corpus; or
c) by using review sets that have already been annotated with aspects and keywords by the original reviewers.