Thursday, August 4, 2011

Relationship between topic modeling and document clustering

I am learning to use topic modeling for document clustering. I would like to check whether my understanding of the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering is correct.

LDA outputs the topic proportions for each document. This is not in itself a clustering of the documents. However, we can treat these topic proportions as a feature representation of each document, and then apply an established clustering method, such as K-means, to cluster the documents on those LDA-derived features.
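
As a minimal sketch of this two-step pipeline, assuming scikit-learn is available: the toy documents, the choice of two topics and two clusters, and the variable names below are all made up for illustration, not taken from any particular dataset or paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; real use would need far more text.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are common household pets",
    "my dog chased the neighbour's cat",
    "stock markets fell as interest rates rose",
    "the central bank raised interest rates again",
    "investors sold stocks after the rate decision",
]

# Step 1: bag-of-words counts, which is the input LDA expects.
counts = CountVectorizer(stop_words="english").fit_transform(docs)

# Step 2: fit LDA and get per-document topic proportions.
# theta has shape (n_docs, n_topics) and each row sums to 1.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)

# Step 3: cluster documents using the topic proportions as features.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(theta)

# A simpler alternative (an assumption, not part of the text above):
# assign each document to its single most probable topic.
dominant_topic = theta.argmax(axis=1)

print(np.round(theta, 3))
print("K-means labels:", labels)
print("Dominant topic:", dominant_topic)
```

Note that the number of clusters passed to K-means does not have to equal the number of topics; the topic proportions are just a low-dimensional feature vector here. On a corpus this small the fitted topics are not meaningful, so the printed values only show the shape of the pipeline.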

The best metric we found for computing the semantic similarity of topics was a pairwise topic coherence, using the coherence metric from "Automatic Evaluation of Topic Coherence," by Newman et al., NAACL 2010.

