LDA topic modeling using gensim

This example shows how to train and inspect an LDA topic model. As we have discussed in the lecture, topic models do two things at the same time:

  • Finding the topics. A topic is a distribution over words: for instance, there might be a topic about books which is likely to generate words such as author, book, write, etc.
  • Analyzing each document as a mixture of topics. For instance, a document that describes a French author might be a mix of a France-related topic and a book-related topic.
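
As a toy illustration of these two ideas, the sketch below combines two hand-made topics into the word distribution of a single document. The topics, words, and probabilities are invented for the example and are not learned from any data; the function name word_probabilities is likewise just a placeholder.

```python
# Two hypothetical topics, each a probability distribution over a tiny vocabulary.
topics = {
    "books":  {"author": 0.4, "book": 0.4, "write": 0.2, "paris": 0.0, "french": 0.0},
    "france": {"author": 0.0, "book": 0.0, "write": 0.0, "paris": 0.5, "french": 0.5},
}

# A hypothetical document about a French author: 60% books, 40% France.
doc_mixture = {"books": 0.6, "france": 0.4}


def word_probabilities(doc_mixture, topics):
    """Return p(word | document) = sum over topics of p(topic | doc) * p(word | topic)."""
    probs = {}
    for topic, weight in doc_mixture.items():
        for word, p in topics[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs


probs = word_probabilities(doc_mixture, topics)
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word:8s} {p:.2f}")
```

Training a topic model goes in the opposite direction: only the documents are observed, and both the topics and the per-document mixtures have to be inferred.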

We will be using the gensim library, which is the best-known Python package for topic modeling. It's worth noting that scikit-learn also includes an LDA implementation (the LatentDirichletAllocation class in sklearn.decomposition). See its documentation if you're interested.

To run the example, you need to install gensim. If you're using Anaconda, running conda install gensim is usually enough; with a plain Python installation, pip install gensim works too.

In [1]:
import gensim

We increase the verbosity level so that we can see the progress of the algorithm. If you find the output too verbose, skip this cell.

In [2]:
import logging

# for gensim to output some progress information while it's training
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
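
If you want to keep the logging setup but silence gensim later on, one option is to raise the level of gensim's own logger instead of removing the cell. This is a minimal sketch that assumes gensim's modules log under names starting with "gensim" (for instance gensim.models.ldamodel), which is the usual convention for Python packages.

```python
import logging

# Raise the threshold for all loggers below "gensim" so that
# INFO-level progress messages are no longer shown.
logging.getLogger("gensim").setLevel(logging.WARNING)

# Child loggers without their own level inherit the new threshold.
lda_logger = logging.getLogger("gensim.models.ldamodel")
print(lda_logger.isEnabledFor(logging.INFO))
```

Setting the level back to logging.INFO restores the progress messages.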

Reading the data and training the LDA model

We will use our usual dataset that consists of product reviews. As usual, each document is stored as one line in the file. (This is identical to the file we used in Assignment 2, except that we've removed the first three columns, which do not contain text.) This is a fairly small example; most real-world applications of topic models use much larger collections of documents. (See the other notebook.)

We first create a TextCorpus, which is the component that reads documents from the file. See gensim's official documentation for more information.

In [3]:
corpus = gensim.corpora.textcorpus.TextCorpus('amazon_reviews.txt')
2018-12-18 14:56:02,860 : INFO : Initializing dictionary
2018-12-18 14:56:02,862 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2018-12-18 14:56:07,361 : INFO : adding document #10000 to Dictionary(40988 unique tokens: ['album', 'albums', 'artist', 'bad', 'bought']...)
2018-12-18 14:56:08,197 : INFO : built Dictionary(44504 unique tokens: ['album', 'albums', 'artist', 'bad', 'bought']...) from 11914 documents (total 668467 corpus positions)

Now we can train the LDA model. For details, see gensim's documentation of the class LdaModel.

The important parameters here are:

  • num_topics: the number of topics we'd like to use. We set this to 10 here, but if you want you can experiment with a larger number of topics.
  • passes: the number of full passes over the corpus during training. (This is distinct from the iterations parameter, which limits the number of inference iterations per document.) A higher number leads to a longer training time, but sometimes to higher-quality topics.
  • alpha: the parameter of the Dirichlet prior over each document's topic mixture. If set to a value close to zero, the model will tend to use fewer topics per document; conversely, with a higher value, each document will be spread over more topics. If set to 'auto', alpha is learned from the data during training.
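
To get a feeling for the role of alpha, the following sketch samples topic mixtures from a symmetric Dirichlet prior using only the standard library, and measures how much weight the dominant topic receives on average. The helper names and the values 0.1 and 10 are chosen for illustration; this is not how gensim handles alpha internally.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) distribution."""
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def average_largest_weight(alpha, num_topics=10, num_samples=1000, seed=0):
    """Average weight of the dominant topic over many sampled mixtures."""
    rng = random.Random(seed)
    largest = [max(sample_dirichlet(alpha, num_topics, rng))
               for _ in range(num_samples)]
    return sum(largest) / num_samples


sparse = average_largest_weight(alpha=0.1)   # small alpha
dense = average_largest_weight(alpha=10.0)   # large alpha
print(f"alpha=0.1: average weight of dominant topic {sparse:.2f}")
print(f"alpha=10:  average weight of dominant topic {dense:.2f}")
```

With a small alpha, one topic typically dominates each sampled mixture; with a large alpha, the weights are close to uniform across the 10 topics.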

This training step may take a few minutes, depending on the speed of your machine and the value you set for passes.
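
While training, the log reports a "per-word bound" and a "perplexity estimate" on a held-out chunk of documents once per pass. The two numbers appear to be related by perplexity = 2 ** (-bound), which is consistent with the values in the log below. The sketch checks this for the first three passes; the function name is a placeholder, and the bounds are copied from the log.

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word likelihood bound (in bits) into a perplexity."""
    return 2.0 ** (-per_word_bound)


# Per-word bounds reported after passes 0, 1 and 2 in the training log.
for bound in (-9.509, -9.045, -8.933):
    print(f"bound {bound:.3f} -> perplexity {perplexity_from_bound(bound):.1f}")
```

A decreasing perplexity from pass to pass indicates that the model fits the held-out documents better as training proceeds.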

In [4]:
model = gensim.models.LdaModel(corpus, id2word=corpus.dictionary,
                               alpha='auto',
                               num_topics=10,
                               passes=5)
2018-12-18 14:56:08,206 : INFO : using autotuned alpha, starting with [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
2018-12-18 14:56:08,207 : INFO : using symmetric eta at 0.1
2018-12-18 14:56:08,217 : INFO : using serial LDA version on this node
2018-12-18 14:56:08,273 : INFO : running online (multi-pass) LDA training, 10 topics, 5 passes over the supplied corpus of 11914 documents, updating model once every 2000 documents, evaluating perplexity every 11914 documents, iterating 50x with a convergence threshold of 0.001000
2018-12-18 14:56:09,128 : INFO : PROGRESS: pass 0, at document #2000/11914
2018-12-18 14:56:10,265 : INFO : optimized alpha [0.08634485, 0.09387619, 0.08644903, 0.0948077, 0.10046951, 0.08811322, 0.095233485, 0.10149326, 0.10340158, 0.09332717]
2018-12-18 14:56:10,270 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:10,325 : INFO : topic #0 (0.086): 0.006*"camera" + 0.006*"book" + 0.006*"good" + 0.004*"like" + 0.004*"love" + 0.004*"better" + 0.004*"great" + 0.004*"music" + 0.004*"time" + 0.003*"quot"
2018-12-18 14:56:10,328 : INFO : topic #2 (0.086): 0.006*"like" + 0.005*"book" + 0.005*"great" + 0.004*"album" + 0.004*"time" + 0.003*"version" + 0.003*"film" + 0.003*"song" + 0.003*"best" + 0.003*"use"
2018-12-18 14:56:10,332 : INFO : topic #4 (0.100): 0.007*"great" + 0.007*"book" + 0.006*"use" + 0.005*"lens" + 0.005*"software" + 0.005*"good" + 0.005*"bought" + 0.005*"like" + 0.004*"product" + 0.004*"time"
2018-12-18 14:56:10,334 : INFO : topic #7 (0.101): 0.013*"book" + 0.007*"like" + 0.007*"good" + 0.005*"camera" + 0.004*"film" + 0.004*"time" + 0.004*"dvd" + 0.004*"movie" + 0.004*"great" + 0.004*"use"
2018-12-18 14:56:10,335 : INFO : topic #8 (0.103): 0.012*"like" + 0.009*"camera" + 0.007*"good" + 0.006*"great" + 0.006*"book" + 0.006*"time" + 0.005*"product" + 0.004*"best" + 0.003*"quality" + 0.003*"years"
2018-12-18 14:56:10,336 : INFO : topic diff=7.784213, rho=1.000000
2018-12-18 14:56:11,166 : INFO : PROGRESS: pass 0, at document #4000/11914
2018-12-18 14:56:12,212 : INFO : optimized alpha [0.082364716, 0.091729924, 0.080401495, 0.09297181, 0.10669526, 0.083107665, 0.093821146, 0.10455921, 0.10452702, 0.09157719]
2018-12-18 14:56:12,216 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:12,262 : INFO : topic #2 (0.080): 0.006*"like" + 0.005*"album" + 0.005*"book" + 0.004*"song" + 0.004*"great" + 0.004*"time" + 0.003*"version" + 0.003*"best" + 0.003*"new" + 0.003*"music"
2018-12-18 14:56:12,264 : INFO : topic #0 (0.082): 0.006*"book" + 0.005*"good" + 0.004*"music" + 0.004*"better" + 0.004*"camera" + 0.004*"like" + 0.004*"new" + 0.004*"great" + 0.003*"film" + 0.003*"love"
2018-12-18 14:56:12,265 : INFO : topic #7 (0.105): 0.023*"book" + 0.007*"good" + 0.006*"like" + 0.006*"read" + 0.004*"dvd" + 0.004*"great" + 0.004*"camera" + 0.004*"time" + 0.003*"new" + 0.003*"little"
2018-12-18 14:56:12,267 : INFO : topic #8 (0.105): 0.015*"camera" + 0.011*"like" + 0.007*"good" + 0.007*"time" + 0.007*"great" + 0.006*"product" + 0.004*"book" + 0.004*"quality" + 0.004*"bought" + 0.003*"best"
2018-12-18 14:56:12,270 : INFO : topic #4 (0.107): 0.009*"use" + 0.007*"great" + 0.006*"lens" + 0.006*"product" + 0.006*"software" + 0.006*"new" + 0.005*"bought" + 0.005*"book" + 0.005*"good" + 0.004*"time"
2018-12-18 14:56:12,272 : INFO : topic diff=1.320120, rho=0.707107
2018-12-18 14:56:13,106 : INFO : PROGRESS: pass 0, at document #6000/11914
2018-12-18 14:56:14,051 : INFO : optimized alpha [0.07893851, 0.09394589, 0.07705263, 0.092635415, 0.116420195, 0.08057451, 0.09525558, 0.10774793, 0.10923241, 0.08955219]
2018-12-18 14:56:14,054 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:14,100 : INFO : topic #2 (0.077): 0.005*"like" + 0.004*"great" + 0.004*"album" + 0.004*"book" + 0.004*"best" + 0.004*"song" + 0.003*"time" + 0.003*"quot" + 0.003*"new" + 0.003*"music"
2018-12-18 14:56:14,101 : INFO : topic #0 (0.079): 0.005*"book" + 0.004*"good" + 0.004*"better" + 0.004*"new" + 0.003*"quot" + 0.003*"like" + 0.003*"music" + 0.003*"love" + 0.003*"great" + 0.003*"film"
2018-12-18 14:56:14,103 : INFO : topic #7 (0.108): 0.030*"book" + 0.008*"read" + 0.006*"like" + 0.006*"good" + 0.004*"dvd" + 0.004*"great" + 0.003*"author" + 0.003*"books" + 0.003*"reading" + 0.003*"new"
2018-12-18 14:56:14,104 : INFO : topic #8 (0.109): 0.023*"camera" + 0.011*"like" + 0.008*"good" + 0.007*"great" + 0.007*"time" + 0.006*"quality" + 0.005*"product" + 0.004*"digital" + 0.004*"bought" + 0.004*"buy"
2018-12-18 14:56:14,105 : INFO : topic #4 (0.116): 0.011*"use" + 0.008*"software" + 0.007*"lens" + 0.007*"product" + 0.006*"great" + 0.006*"new" + 0.005*"bought" + 0.005*"good" + 0.005*"time" + 0.004*"work"
2018-12-18 14:56:14,107 : INFO : topic diff=1.132977, rho=0.577350
2018-12-18 14:56:14,972 : INFO : PROGRESS: pass 0, at document #8000/11914
2018-12-18 14:56:15,904 : INFO : optimized alpha [0.07736978, 0.09765883, 0.075947836, 0.09120177, 0.12580423, 0.07942948, 0.09697951, 0.11309826, 0.11496529, 0.091191486]
2018-12-18 14:56:15,908 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:15,955 : INFO : topic #2 (0.076): 0.004*"like" + 0.004*"quot" + 0.004*"best" + 0.004*"great" + 0.003*"new" + 0.003*"book" + 0.003*"time" + 0.003*"song" + 0.003*"album" + 0.003*"music"
2018-12-18 14:56:15,957 : INFO : topic #0 (0.077): 0.005*"quot" + 0.004*"new" + 0.004*"book" + 0.003*"man" + 0.003*"good" + 0.003*"better" + 0.003*"love" + 0.003*"like" + 0.003*"life" + 0.003*"great"
2018-12-18 14:56:15,958 : INFO : topic #7 (0.113): 0.034*"book" + 0.009*"read" + 0.006*"like" + 0.005*"good" + 0.004*"books" + 0.004*"reading" + 0.004*"author" + 0.003*"story" + 0.003*"great" + 0.003*"new"
2018-12-18 14:56:15,960 : INFO : topic #8 (0.115): 0.027*"camera" + 0.010*"like" + 0.009*"good" + 0.009*"great" + 0.007*"quality" + 0.007*"time" + 0.006*"product" + 0.006*"pictures" + 0.005*"bought" + 0.005*"use"
2018-12-18 14:56:15,962 : INFO : topic #4 (0.126): 0.012*"use" + 0.009*"lens" + 0.009*"product" + 0.008*"software" + 0.006*"great" + 0.005*"time" + 0.005*"new" + 0.005*"work" + 0.005*"good" + 0.004*"bought"
2018-12-18 14:56:15,967 : INFO : topic diff=1.097450, rho=0.500000
2018-12-18 14:56:16,847 : INFO : PROGRESS: pass 0, at document #10000/11914
2018-12-18 14:56:17,661 : INFO : optimized alpha [0.076843336, 0.1011191, 0.075514294, 0.09158413, 0.13493104, 0.08016366, 0.10035147, 0.116368204, 0.122194715, 0.09320569]
2018-12-18 14:56:17,665 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:17,711 : INFO : topic #2 (0.076): 0.005*"quot" + 0.004*"like" + 0.003*"best" + 0.003*"great" + 0.003*"new" + 0.003*"time" + 0.002*"book" + 0.002*"music" + 0.002*"years" + 0.002*"bar"
2018-12-18 14:56:17,712 : INFO : topic #0 (0.077): 0.005*"quot" + 0.004*"new" + 0.004*"man" + 0.003*"love" + 0.003*"good" + 0.003*"book" + 0.003*"life" + 0.003*"better" + 0.003*"like" + 0.003*"great"
2018-12-18 14:56:17,713 : INFO : topic #7 (0.116): 0.036*"book" + 0.011*"read" + 0.006*"like" + 0.005*"good" + 0.005*"books" + 0.004*"reading" + 0.004*"author" + 0.004*"story" + 0.004*"time" + 0.003*"people"
2018-12-18 14:56:17,716 : INFO : topic #8 (0.122): 0.029*"camera" + 0.010*"like" + 0.010*"good" + 0.009*"quality" + 0.008*"great" + 0.007*"time" + 0.006*"pictures" + 0.006*"product" + 0.006*"bought" + 0.005*"use"
2018-12-18 14:56:17,717 : INFO : topic #4 (0.135): 0.012*"use" + 0.009*"product" + 0.008*"lens" + 0.007*"software" + 0.006*"time" + 0.006*"work" + 0.005*"program" + 0.005*"new" + 0.005*"great" + 0.005*"version"
2018-12-18 14:56:17,719 : INFO : topic diff=0.956733, rho=0.447214
2018-12-18 14:56:19,797 : INFO : -9.509 per-word bound, 728.6 perplexity estimate based on a held-out corpus of 1914 documents with 106900 words
2018-12-18 14:56:19,798 : INFO : PROGRESS: pass 0, at document #11914/11914
2018-12-18 14:56:20,566 : INFO : optimized alpha [0.07633501, 0.10563685, 0.074937664, 0.09151283, 0.14486805, 0.08066378, 0.10215627, 0.122843616, 0.12933499, 0.09556736]
2018-12-18 14:56:20,570 : INFO : merging changes from 1914 documents into a model of 11914 documents
2018-12-18 14:56:20,614 : INFO : topic #2 (0.075): 0.005*"quot" + 0.004*"like" + 0.003*"oil" + 0.003*"bar" + 0.003*"best" + 0.003*"great" + 0.003*"bars" + 0.002*"new" + 0.002*"time" + 0.002*"flaxseed"
2018-12-18 14:56:20,616 : INFO : topic #0 (0.076): 0.005*"quot" + 0.003*"new" + 0.003*"man" + 0.003*"life" + 0.003*"love" + 0.003*"girls" + 0.003*"story" + 0.003*"good" + 0.002*"like" + 0.002*"book"
2018-12-18 14:56:20,617 : INFO : topic #7 (0.123): 0.038*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"good" + 0.004*"reading" + 0.004*"author" + 0.004*"people" + 0.004*"time" + 0.004*"story"
2018-12-18 14:56:20,620 : INFO : topic #8 (0.129): 0.031*"camera" + 0.010*"good" + 0.009*"like" + 0.008*"great" + 0.008*"quality" + 0.007*"pictures" + 0.007*"time" + 0.006*"bought" + 0.006*"product" + 0.006*"use"
2018-12-18 14:56:20,621 : INFO : topic #4 (0.145): 0.012*"use" + 0.010*"product" + 0.008*"lens" + 0.008*"software" + 0.006*"time" + 0.006*"program" + 0.005*"work" + 0.005*"version" + 0.005*"new" + 0.005*"great"
2018-12-18 14:56:20,623 : INFO : topic diff=0.882805, rho=0.408248
2018-12-18 14:56:21,520 : INFO : PROGRESS: pass 1, at document #2000/11914
2018-12-18 14:56:22,300 : INFO : optimized alpha [0.071796745, 0.102217466, 0.07051551, 0.087702796, 0.14583421, 0.07703891, 0.09780689, 0.11963288, 0.13057074, 0.094000675]
2018-12-18 14:56:22,305 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:22,350 : INFO : topic #2 (0.071): 0.006*"quot" + 0.004*"like" + 0.003*"oil" + 0.003*"organic" + 0.003*"great" + 0.003*"best" + 0.002*"new" + 0.002*"chocolate" + 0.002*"bar" + 0.002*"bars"
2018-12-18 14:56:22,351 : INFO : topic #0 (0.072): 0.006*"quot" + 0.003*"girls" + 0.003*"new" + 0.003*"man" + 0.003*"love" + 0.003*"lisbon" + 0.003*"life" + 0.003*"story" + 0.002*"good" + 0.002*"like"
2018-12-18 14:56:22,353 : INFO : topic #7 (0.120): 0.038*"book" + 0.011*"read" + 0.005*"books" + 0.005*"like" + 0.005*"good" + 0.005*"author" + 0.004*"reading" + 0.004*"people" + 0.004*"story" + 0.004*"time"
2018-12-18 14:56:22,354 : INFO : topic #8 (0.131): 0.031*"camera" + 0.010*"good" + 0.009*"like" + 0.009*"great" + 0.008*"quality" + 0.007*"pictures" + 0.007*"time" + 0.006*"bought" + 0.006*"product" + 0.006*"battery"
2018-12-18 14:56:22,356 : INFO : topic #4 (0.146): 0.013*"use" + 0.010*"product" + 0.009*"software" + 0.008*"lens" + 0.006*"time" + 0.006*"program" + 0.005*"version" + 0.005*"work" + 0.005*"great" + 0.005*"new"
2018-12-18 14:56:22,357 : INFO : topic diff=0.564847, rho=0.354507
2018-12-18 14:56:23,214 : INFO : PROGRESS: pass 1, at document #4000/11914
2018-12-18 14:56:23,884 : INFO : optimized alpha [0.07004829, 0.10112711, 0.068367645, 0.08615708, 0.1499619, 0.07524326, 0.09563226, 0.12002435, 0.13225144, 0.094251975]
2018-12-18 14:56:23,889 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:23,936 : INFO : topic #2 (0.068): 0.006*"quot" + 0.003*"like" + 0.003*"bar" + 0.002*"best" + 0.002*"great" + 0.002*"oil" + 0.002*"bars" + 0.002*"new" + 0.002*"chocolate" + 0.002*"organic"
2018-12-18 14:56:23,937 : INFO : topic #0 (0.070): 0.006*"quot" + 0.003*"new" + 0.003*"man" + 0.003*"girls" + 0.003*"life" + 0.002*"love" + 0.002*"story" + 0.002*"little" + 0.002*"day" + 0.002*"good"
2018-12-18 14:56:23,939 : INFO : topic #7 (0.120): 0.041*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"author" + 0.005*"good" + 0.005*"reading" + 0.004*"people" + 0.004*"story" + 0.004*"time"
2018-12-18 14:56:23,940 : INFO : topic #8 (0.132): 0.032*"camera" + 0.010*"good" + 0.009*"great" + 0.009*"like" + 0.008*"quality" + 0.008*"pictures" + 0.007*"bought" + 0.007*"time" + 0.006*"product" + 0.006*"use"
2018-12-18 14:56:23,942 : INFO : topic #4 (0.150): 0.012*"use" + 0.010*"product" + 0.009*"software" + 0.008*"lens" + 0.006*"time" + 0.006*"version" + 0.006*"work" + 0.006*"new" + 0.006*"program" + 0.005*"great"
2018-12-18 14:56:23,943 : INFO : topic diff=0.504631, rho=0.354507
2018-12-18 14:56:24,808 : INFO : PROGRESS: pass 1, at document #6000/11914
2018-12-18 14:56:25,488 : INFO : optimized alpha [0.06862009, 0.10221882, 0.06698574, 0.08493563, 0.15431319, 0.074318536, 0.09560415, 0.12073314, 0.13690458, 0.0935877]
2018-12-18 14:56:25,492 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:25,540 : INFO : topic #2 (0.067): 0.006*"quot" + 0.003*"like" + 0.003*"bar" + 0.003*"best" + 0.002*"bars" + 0.002*"great" + 0.002*"new" + 0.002*"oil" + 0.002*"chocolate" + 0.002*"century"
2018-12-18 14:56:25,541 : INFO : topic #0 (0.069): 0.006*"quot" + 0.005*"talk" + 0.003*"new" + 0.003*"life" + 0.003*"man" + 0.003*"girls" + 0.002*"love" + 0.002*"little" + 0.002*"day" + 0.002*"story"
2018-12-18 14:56:25,542 : INFO : topic #7 (0.121): 0.042*"book" + 0.012*"read" + 0.006*"books" + 0.006*"like" + 0.005*"author" + 0.005*"reading" + 0.005*"good" + 0.004*"people" + 0.003*"story" + 0.003*"time"
2018-12-18 14:56:25,544 : INFO : topic #8 (0.137): 0.033*"camera" + 0.010*"good" + 0.009*"great" + 0.009*"like" + 0.009*"pictures" + 0.008*"quality" + 0.007*"bought" + 0.007*"use" + 0.007*"time" + 0.006*"battery"
2018-12-18 14:56:25,545 : INFO : topic #4 (0.154): 0.012*"use" + 0.010*"product" + 0.009*"software" + 0.007*"lens" + 0.006*"time" + 0.006*"program" + 0.006*"version" + 0.006*"work" + 0.006*"new" + 0.005*"like"
2018-12-18 14:56:25,546 : INFO : topic diff=0.463345, rho=0.354507
2018-12-18 14:56:26,444 : INFO : PROGRESS: pass 1, at document #8000/11914
2018-12-18 14:56:27,148 : INFO : optimized alpha [0.06797242, 0.10434836, 0.06664647, 0.08362258, 0.15831791, 0.07368199, 0.095921576, 0.12339767, 0.14173222, 0.09521131]
2018-12-18 14:56:27,152 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:27,197 : INFO : topic #2 (0.067): 0.005*"quot" + 0.003*"oil" + 0.003*"like" + 0.003*"bars" + 0.002*"best" + 0.002*"bar" + 0.002*"great" + 0.002*"new" + 0.002*"simon" + 0.002*"century"
2018-12-18 14:56:27,198 : INFO : topic #0 (0.068): 0.005*"quot" + 0.004*"talk" + 0.003*"man" + 0.003*"new" + 0.003*"life" + 0.002*"girls" + 0.002*"love" + 0.002*"boys" + 0.002*"story" + 0.002*"young"
2018-12-18 14:56:27,200 : INFO : topic #7 (0.123): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"reading" + 0.005*"author" + 0.004*"good" + 0.004*"people" + 0.003*"time" + 0.003*"story"
2018-12-18 14:56:27,201 : INFO : topic #8 (0.142): 0.032*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"pictures" + 0.009*"like" + 0.008*"quality" + 0.008*"use" + 0.007*"bought" + 0.006*"time" + 0.006*"battery"
2018-12-18 14:56:27,202 : INFO : topic #4 (0.158): 0.012*"use" + 0.010*"product" + 0.009*"software" + 0.008*"lens" + 0.006*"program" + 0.006*"time" + 0.006*"work" + 0.006*"version" + 0.005*"new" + 0.005*"like"
2018-12-18 14:56:27,204 : INFO : topic diff=0.442825, rho=0.354507
2018-12-18 14:56:28,139 : INFO : PROGRESS: pass 1, at document #10000/11914
2018-12-18 14:56:28,783 : INFO : optimized alpha [0.067877494, 0.10665549, 0.06667248, 0.08375033, 0.16272981, 0.074068666, 0.09809148, 0.12508546, 0.1484623, 0.097160235]
2018-12-18 14:56:28,788 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:28,850 : INFO : topic #2 (0.067): 0.005*"quot" + 0.004*"oil" + 0.003*"bar" + 0.003*"like" + 0.003*"bars" + 0.003*"flavor" + 0.002*"best" + 0.002*"taste" + 0.002*"energy" + 0.002*"great"
2018-12-18 14:56:28,852 : INFO : topic #0 (0.068): 0.006*"quot" + 0.003*"new" + 0.003*"man" + 0.003*"match" + 0.003*"talk" + 0.003*"life" + 0.003*"jack" + 0.002*"love" + 0.002*"young" + 0.002*"day"
2018-12-18 14:56:28,854 : INFO : topic #7 (0.125): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"author" + 0.005*"reading" + 0.004*"good" + 0.004*"people" + 0.004*"time" + 0.004*"story"
2018-12-18 14:56:28,855 : INFO : topic #8 (0.148): 0.032*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"quality" + 0.009*"like" + 0.008*"pictures" + 0.008*"use" + 0.007*"bought" + 0.007*"time" + 0.006*"battery"
2018-12-18 14:56:28,857 : INFO : topic #4 (0.163): 0.012*"use" + 0.010*"product" + 0.009*"software" + 0.007*"lens" + 0.007*"program" + 0.007*"time" + 0.006*"work" + 0.006*"version" + 0.006*"new" + 0.005*"like"
2018-12-18 14:56:28,859 : INFO : topic diff=0.387757, rho=0.354507
2018-12-18 14:56:30,769 : INFO : -9.045 per-word bound, 528.3 perplexity estimate based on a held-out corpus of 1914 documents with 106900 words
2018-12-18 14:56:30,770 : INFO : PROGRESS: pass 1, at document #11914/11914
2018-12-18 14:56:31,398 : INFO : optimized alpha [0.06785011, 0.110511445, 0.06661658, 0.08341892, 0.16930822, 0.074460015, 0.09913144, 0.13015336, 0.15529038, 0.0995038]
2018-12-18 14:56:31,402 : INFO : merging changes from 1914 documents into a model of 11914 documents
2018-12-18 14:56:31,446 : INFO : topic #2 (0.067): 0.005*"oil" + 0.005*"quot" + 0.004*"bar" + 0.003*"bars" + 0.003*"like" + 0.003*"flaxseed" + 0.002*"flavor" + 0.002*"chocolate" + 0.002*"energy" + 0.002*"taste"
2018-12-18 14:56:31,447 : INFO : topic #0 (0.068): 0.005*"quot" + 0.003*"girls" + 0.003*"man" + 0.003*"life" + 0.003*"new" + 0.003*"talk" + 0.003*"match" + 0.003*"boys" + 0.002*"young" + 0.002*"lisbon"
2018-12-18 14:56:31,448 : INFO : topic #7 (0.130): 0.040*"book" + 0.013*"read" + 0.006*"books" + 0.005*"like" + 0.005*"reading" + 0.005*"author" + 0.005*"people" + 0.004*"good" + 0.004*"time" + 0.003*"story"
2018-12-18 14:56:31,450 : INFO : topic #8 (0.155): 0.032*"camera" + 0.010*"good" + 0.009*"great" + 0.008*"use" + 0.008*"like" + 0.008*"pictures" + 0.008*"quality" + 0.007*"bought" + 0.006*"time" + 0.006*"battery"
2018-12-18 14:56:31,451 : INFO : topic #4 (0.169): 0.012*"use" + 0.011*"product" + 0.009*"software" + 0.007*"program" + 0.007*"lens" + 0.006*"time" + 0.006*"version" + 0.006*"work" + 0.005*"new" + 0.005*"like"
2018-12-18 14:56:31,453 : INFO : topic diff=0.353471, rho=0.354507
2018-12-18 14:56:32,341 : INFO : PROGRESS: pass 2, at document #2000/11914
2018-12-18 14:56:32,992 : INFO : optimized alpha [0.064975254, 0.108354785, 0.06383052, 0.08109162, 0.16854978, 0.072104886, 0.09625615, 0.12803721, 0.15818642, 0.098763436]
2018-12-18 14:56:32,997 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:33,043 : INFO : topic #2 (0.064): 0.006*"quot" + 0.005*"oil" + 0.003*"organic" + 0.003*"bar" + 0.003*"chocolate" + 0.003*"bars" + 0.003*"like" + 0.002*"taste" + 0.002*"flaxseed" + 0.002*"best"
2018-12-18 14:56:33,044 : INFO : topic #0 (0.065): 0.006*"quot" + 0.004*"girls" + 0.003*"lisbon" + 0.003*"life" + 0.003*"man" + 0.003*"boys" + 0.003*"new" + 0.003*"trip" + 0.002*"talk" + 0.002*"love"
2018-12-18 14:56:33,046 : INFO : topic #7 (0.128): 0.039*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"author" + 0.005*"people" + 0.005*"reading" + 0.004*"good" + 0.004*"time" + 0.004*"story"
2018-12-18 14:56:33,049 : INFO : topic #8 (0.158): 0.031*"camera" + 0.010*"good" + 0.009*"great" + 0.009*"pictures" + 0.008*"like" + 0.008*"use" + 0.008*"quality" + 0.007*"bought" + 0.006*"time" + 0.006*"battery"
2018-12-18 14:56:33,053 : INFO : topic #4 (0.169): 0.012*"use" + 0.011*"product" + 0.010*"software" + 0.007*"program" + 0.007*"lens" + 0.006*"time" + 0.006*"version" + 0.006*"work" + 0.005*"new" + 0.005*"like"
2018-12-18 14:56:33,055 : INFO : topic diff=0.291769, rho=0.334132
2018-12-18 14:56:33,993 : INFO : PROGRESS: pass 2, at document #4000/11914
2018-12-18 14:56:34,595 : INFO : optimized alpha [0.06398557, 0.10819837, 0.062549315, 0.080205716, 0.17147906, 0.071209945, 0.0952072, 0.12890473, 0.16075408, 0.099569075]
2018-12-18 14:56:34,599 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:34,645 : INFO : topic #2 (0.063): 0.005*"quot" + 0.003*"oil" + 0.003*"bar" + 0.003*"bars" + 0.003*"chocolate" + 0.003*"organic" + 0.002*"like" + 0.002*"taste" + 0.002*"energy" + 0.002*"century"
2018-12-18 14:56:34,646 : INFO : topic #0 (0.064): 0.006*"quot" + 0.004*"pluto" + 0.003*"girls" + 0.003*"new" + 0.003*"life" + 0.003*"man" + 0.003*"jack" + 0.002*"harry" + 0.002*"young" + 0.002*"day"
2018-12-18 14:56:34,647 : INFO : topic #7 (0.129): 0.042*"book" + 0.013*"read" + 0.006*"books" + 0.006*"author" + 0.005*"like" + 0.005*"people" + 0.005*"reading" + 0.004*"good" + 0.004*"time" + 0.003*"written"
2018-12-18 14:56:34,648 : INFO : topic #8 (0.161): 0.031*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"pictures" + 0.008*"use" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time" + 0.006*"product"
2018-12-18 14:56:34,649 : INFO : topic #4 (0.171): 0.012*"use" + 0.011*"product" + 0.009*"software" + 0.007*"version" + 0.007*"time" + 0.007*"program" + 0.006*"work" + 0.006*"new" + 0.006*"lens" + 0.005*"like"
2018-12-18 14:56:34,653 : INFO : topic diff=0.259530, rho=0.334132
2018-12-18 14:56:35,522 : INFO : PROGRESS: pass 2, at document #6000/11914
2018-12-18 14:56:36,115 : INFO : optimized alpha [0.06320265, 0.1098197, 0.06175724, 0.07964794, 0.17457394, 0.07084673, 0.09553237, 0.1299756, 0.16594937, 0.09956497]
2018-12-18 14:56:36,119 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:36,162 : INFO : topic #2 (0.062): 0.005*"quot" + 0.003*"bar" + 0.003*"oil" + 0.003*"bars" + 0.002*"chocolate" + 0.002*"like" + 0.002*"taste" + 0.002*"century" + 0.002*"organic" + 0.002*"protein"
2018-12-18 14:56:36,163 : INFO : topic #0 (0.063): 0.006*"talk" + 0.005*"quot" + 0.003*"girls" + 0.003*"life" + 0.003*"new" + 0.003*"man" + 0.003*"pluto" + 0.002*"jack" + 0.002*"young" + 0.002*"boys"
2018-12-18 14:56:36,166 : INFO : topic #7 (0.130): 0.042*"book" + 0.013*"read" + 0.006*"books" + 0.005*"author" + 0.005*"like" + 0.005*"reading" + 0.005*"people" + 0.004*"good" + 0.004*"written" + 0.004*"time"
2018-12-18 14:56:36,168 : INFO : topic #8 (0.166): 0.031*"camera" + 0.011*"good" + 0.010*"great" + 0.009*"pictures" + 0.009*"use" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time" + 0.006*"battery"
2018-12-18 14:56:36,170 : INFO : topic #4 (0.175): 0.012*"use" + 0.010*"product" + 0.010*"software" + 0.007*"program" + 0.007*"version" + 0.006*"time" + 0.006*"work" + 0.006*"new" + 0.005*"lens" + 0.005*"like"
2018-12-18 14:56:36,172 : INFO : topic diff=0.239888, rho=0.334132
2018-12-18 14:56:37,162 : INFO : PROGRESS: pass 2, at document #8000/11914
2018-12-18 14:56:37,788 : INFO : optimized alpha [0.06306409, 0.111893706, 0.06187676, 0.07900274, 0.17772733, 0.07063882, 0.09612917, 0.1331771, 0.17044489, 0.10144843]
2018-12-18 14:56:37,792 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:37,846 : INFO : topic #2 (0.062): 0.004*"quot" + 0.004*"oil" + 0.003*"bars" + 0.003*"bar" + 0.002*"simon" + 0.002*"chocolate" + 0.002*"taste" + 0.002*"flavor" + 0.002*"like" + 0.002*"flaxseed"
2018-12-18 14:56:37,847 : INFO : topic #0 (0.063): 0.005*"quot" + 0.005*"talk" + 0.003*"man" + 0.003*"life" + 0.003*"girls" + 0.003*"new" + 0.003*"match" + 0.002*"boys" + 0.002*"young" + 0.002*"self"
2018-12-18 14:56:37,849 : INFO : topic #7 (0.133): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"author" + 0.005*"reading" + 0.005*"people" + 0.004*"good" + 0.004*"life" + 0.004*"time"
2018-12-18 14:56:37,850 : INFO : topic #8 (0.170): 0.031*"camera" + 0.010*"great" + 0.010*"good" + 0.009*"use" + 0.009*"pictures" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time" + 0.006*"lens"
2018-12-18 14:56:37,852 : INFO : topic #4 (0.178): 0.012*"use" + 0.011*"product" + 0.010*"software" + 0.007*"program" + 0.006*"version" + 0.006*"time" + 0.006*"work" + 0.006*"new" + 0.005*"lens" + 0.005*"like"
2018-12-18 14:56:37,853 : INFO : topic diff=0.234663, rho=0.334132
2018-12-18 14:56:38,734 : INFO : PROGRESS: pass 2, at document #10000/11914
2018-12-18 14:56:39,346 : INFO : optimized alpha [0.06336792, 0.11437441, 0.06227016, 0.07945616, 0.180678, 0.07118713, 0.098185554, 0.13508083, 0.17687304, 0.10372514]
2018-12-18 14:56:39,350 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:39,397 : INFO : topic #2 (0.062): 0.005*"oil" + 0.004*"quot" + 0.004*"bar" + 0.003*"bars" + 0.003*"flavor" + 0.003*"energy" + 0.002*"taste" + 0.002*"chocolate" + 0.002*"like" + 0.002*"layout"
2018-12-18 14:56:39,399 : INFO : topic #0 (0.063): 0.005*"quot" + 0.004*"match" + 0.004*"talk" + 0.003*"jack" + 0.003*"man" + 0.003*"life" + 0.003*"new" + 0.002*"young" + 0.002*"boys" + 0.002*"girls"
2018-12-18 14:56:39,400 : INFO : topic #7 (0.135): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"author" + 0.005*"like" + 0.005*"reading" + 0.005*"people" + 0.004*"good" + 0.004*"time" + 0.004*"life"
2018-12-18 14:56:39,402 : INFO : topic #8 (0.177): 0.030*"camera" + 0.010*"good" + 0.010*"great" + 0.010*"use" + 0.009*"pictures" + 0.008*"quality" + 0.008*"like" + 0.008*"bought" + 0.007*"time" + 0.006*"lens"
2018-12-18 14:56:39,403 : INFO : topic #4 (0.181): 0.011*"use" + 0.011*"product" + 0.009*"software" + 0.007*"program" + 0.007*"time" + 0.006*"version" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:39,404 : INFO : topic diff=0.202241, rho=0.334132
2018-12-18 14:56:41,227 : INFO : -8.933 per-word bound, 488.7 perplexity estimate based on a held-out corpus of 1914 documents with 106900 words
2018-12-18 14:56:41,227 : INFO : PROGRESS: pass 2, at document #11914/11914
2018-12-18 14:56:41,795 : INFO : optimized alpha [0.06358816, 0.11825916, 0.06264903, 0.07937897, 0.1859219, 0.07170674, 0.099306926, 0.1399331, 0.1828663, 0.10602817]
2018-12-18 14:56:41,798 : INFO : merging changes from 1914 documents into a model of 11914 documents
2018-12-18 14:56:41,845 : INFO : topic #2 (0.063): 0.006*"oil" + 0.004*"bar" + 0.004*"quot" + 0.004*"bars" + 0.003*"flaxseed" + 0.003*"flavor" + 0.003*"chocolate" + 0.002*"energy" + 0.002*"taste" + 0.002*"aspirin"
2018-12-18 14:56:41,847 : INFO : topic #0 (0.064): 0.004*"quot" + 0.003*"girls" + 0.003*"match" + 0.003*"talk" + 0.003*"life" + 0.003*"boys" + 0.003*"man" + 0.002*"young" + 0.002*"new" + 0.002*"lisbon"
2018-12-18 14:56:41,848 : INFO : topic #7 (0.140): 0.040*"book" + 0.013*"read" + 0.006*"books" + 0.005*"people" + 0.005*"like" + 0.005*"reading" + 0.005*"author" + 0.004*"time" + 0.004*"good" + 0.004*"life"
2018-12-18 14:56:41,850 : INFO : topic #8 (0.183): 0.030*"camera" + 0.010*"good" + 0.010*"great" + 0.010*"use" + 0.008*"pictures" + 0.008*"quality" + 0.008*"like" + 0.007*"bought" + 0.007*"lens" + 0.006*"time"
2018-12-18 14:56:41,851 : INFO : topic #4 (0.186): 0.011*"use" + 0.011*"product" + 0.010*"software" + 0.007*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:41,853 : INFO : topic diff=0.184814, rho=0.334132
2018-12-18 14:56:42,805 : INFO : PROGRESS: pass 3, at document #2000/11914
2018-12-18 14:56:43,428 : INFO : optimized alpha [0.0614248, 0.11650903, 0.060612462, 0.077745415, 0.18441208, 0.069895625, 0.09724629, 0.13767914, 0.18540838, 0.105441816]
2018-12-18 14:56:43,432 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:43,479 : INFO : topic #2 (0.061): 0.005*"oil" + 0.005*"quot" + 0.004*"organic" + 0.003*"bar" + 0.003*"chocolate" + 0.003*"bars" + 0.002*"taste" + 0.002*"flaxseed" + 0.002*"flavor" + 0.002*"energy"
2018-12-18 14:56:43,480 : INFO : topic #0 (0.061): 0.005*"quot" + 0.004*"girls" + 0.003*"lisbon" + 0.003*"boys" + 0.003*"trip" + 0.003*"life" + 0.003*"talk" + 0.002*"man" + 0.002*"match" + 0.002*"new"
2018-12-18 14:56:43,482 : INFO : topic #7 (0.138): 0.039*"book" + 0.012*"read" + 0.006*"books" + 0.005*"people" + 0.005*"author" + 0.005*"like" + 0.005*"reading" + 0.004*"good" + 0.004*"time" + 0.004*"life"
2018-12-18 14:56:43,484 : INFO : topic #4 (0.184): 0.012*"use" + 0.011*"product" + 0.010*"software" + 0.007*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.005*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:43,485 : INFO : topic #8 (0.185): 0.029*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"use" + 0.009*"pictures" + 0.008*"like" + 0.008*"quality" + 0.008*"lens" + 0.007*"bought" + 0.006*"time"
2018-12-18 14:56:43,487 : INFO : topic diff=0.168672, rho=0.316910
2018-12-18 14:56:44,348 : INFO : PROGRESS: pass 3, at document #4000/11914
2018-12-18 14:56:44,922 : INFO : optimized alpha [0.0608762, 0.11660235, 0.059773117, 0.07732493, 0.18629074, 0.06936401, 0.09643756, 0.13877718, 0.18722421, 0.10628112]
2018-12-18 14:56:44,927 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:44,972 : INFO : topic #2 (0.060): 0.004*"quot" + 0.004*"oil" + 0.004*"bar" + 0.003*"bars" + 0.003*"chocolate" + 0.003*"organic" + 0.002*"taste" + 0.002*"energy" + 0.002*"flavor" + 0.002*"century"
2018-12-18 14:56:44,973 : INFO : topic #0 (0.061): 0.005*"quot" + 0.004*"pluto" + 0.004*"girls" + 0.003*"jack" + 0.003*"life" + 0.002*"new" + 0.002*"man" + 0.002*"young" + 0.002*"harry" + 0.002*"lisbon"
2018-12-18 14:56:44,975 : INFO : topic #7 (0.139): 0.041*"book" + 0.013*"read" + 0.006*"books" + 0.005*"author" + 0.005*"people" + 0.005*"like" + 0.005*"reading" + 0.004*"good" + 0.004*"time" + 0.004*"life"
2018-12-18 14:56:44,976 : INFO : topic #4 (0.186): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.007*"version" + 0.007*"program" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"easy"
2018-12-18 14:56:44,977 : INFO : topic #8 (0.187): 0.029*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"use" + 0.008*"pictures" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.007*"lens" + 0.006*"time"
2018-12-18 14:56:44,978 : INFO : topic diff=0.155727, rho=0.316910
2018-12-18 14:56:45,873 : INFO : PROGRESS: pass 3, at document #6000/11914
2018-12-18 14:56:46,473 : INFO : optimized alpha [0.06046999, 0.11833395, 0.05928892, 0.07708426, 0.18821934, 0.069245584, 0.0969573, 0.13971975, 0.19200556, 0.10613277]
2018-12-18 14:56:46,477 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:46,522 : INFO : topic #2 (0.059): 0.004*"quot" + 0.004*"bar" + 0.004*"oil" + 0.003*"bars" + 0.003*"chocolate" + 0.003*"taste" + 0.002*"organic" + 0.002*"flavor" + 0.002*"protein" + 0.002*"century"
2018-12-18 14:56:46,524 : INFO : topic #0 (0.060): 0.006*"talk" + 0.004*"quot" + 0.003*"girls" + 0.003*"pluto" + 0.003*"life" + 0.002*"jack" + 0.002*"man" + 0.002*"new" + 0.002*"boys" + 0.002*"young"
2018-12-18 14:56:46,525 : INFO : topic #7 (0.140): 0.042*"book" + 0.013*"read" + 0.006*"books" + 0.005*"author" + 0.005*"people" + 0.005*"like" + 0.005*"reading" + 0.004*"good" + 0.004*"life" + 0.004*"written"
2018-12-18 14:56:46,526 : INFO : topic #4 (0.188): 0.011*"use" + 0.011*"product" + 0.011*"software" + 0.007*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.005*"support"
2018-12-18 14:56:46,528 : INFO : topic #8 (0.192): 0.030*"camera" + 0.010*"good" + 0.010*"great" + 0.009*"use" + 0.009*"pictures" + 0.008*"like" + 0.008*"bought" + 0.008*"quality" + 0.008*"lens" + 0.006*"time"
2018-12-18 14:56:46,529 : INFO : topic diff=0.147591, rho=0.316910
2018-12-18 14:56:47,468 : INFO : PROGRESS: pass 3, at document #8000/11914
2018-12-18 14:56:48,120 : INFO : optimized alpha [0.060520984, 0.12029553, 0.059623644, 0.07665714, 0.19110547, 0.069238216, 0.09773496, 0.1427711, 0.19572613, 0.10791825]
2018-12-18 14:56:48,124 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:48,174 : INFO : topic #2 (0.060): 0.004*"oil" + 0.003*"bars" + 0.003*"quot" + 0.003*"bar" + 0.003*"simon" + 0.002*"chocolate" + 0.002*"taste" + 0.002*"energy" + 0.002*"flavor" + 0.002*"flaxseed"
2018-12-18 14:56:48,175 : INFO : topic #0 (0.061): 0.005*"talk" + 0.004*"quot" + 0.003*"girls" + 0.003*"match" + 0.003*"life" + 0.003*"man" + 0.003*"boys" + 0.002*"young" + 0.002*"new" + 0.002*"pluto"
2018-12-18 14:56:48,177 : INFO : topic #7 (0.143): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"like" + 0.005*"people" + 0.005*"author" + 0.005*"reading" + 0.004*"good" + 0.004*"life" + 0.004*"time"
2018-12-18 14:56:48,178 : INFO : topic #4 (0.191): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.005*"support"
2018-12-18 14:56:48,179 : INFO : topic #8 (0.196): 0.029*"camera" + 0.010*"great" + 0.010*"good" + 0.010*"use" + 0.009*"pictures" + 0.009*"lens" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time"
2018-12-18 14:56:48,181 : INFO : topic diff=0.152778, rho=0.316910
2018-12-18 14:56:49,093 : INFO : PROGRESS: pass 3, at document #10000/11914
2018-12-18 14:56:49,720 : INFO : optimized alpha [0.060992148, 0.122971565, 0.060147185, 0.07726761, 0.19306761, 0.06994274, 0.099832535, 0.14456677, 0.20131853, 0.109953575]
2018-12-18 14:56:49,723 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:49,768 : INFO : topic #2 (0.060): 0.005*"oil" + 0.004*"bar" + 0.003*"bars" + 0.003*"flavor" + 0.003*"quot" + 0.003*"energy" + 0.003*"taste" + 0.002*"chocolate" + 0.002*"layout" + 0.002*"simon"
2018-12-18 14:56:49,770 : INFO : topic #0 (0.061): 0.004*"match" + 0.004*"talk" + 0.003*"quot" + 0.003*"jack" + 0.003*"life" + 0.003*"man" + 0.002*"young" + 0.002*"girls" + 0.002*"new" + 0.002*"boys"
2018-12-18 14:56:49,771 : INFO : topic #7 (0.145): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"people" + 0.005*"author" + 0.005*"reading" + 0.005*"like" + 0.004*"good" + 0.004*"life" + 0.004*"time"
2018-12-18 14:56:49,773 : INFO : topic #4 (0.193): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"time" + 0.007*"version" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.005*"support"
2018-12-18 14:56:49,774 : INFO : topic #8 (0.201): 0.028*"camera" + 0.010*"good" + 0.010*"use" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"quality" + 0.008*"like" + 0.008*"bought" + 0.006*"time"
2018-12-18 14:56:49,775 : INFO : topic diff=0.132728, rho=0.316910
2018-12-18 14:56:51,591 : INFO : -8.884 per-word bound, 472.4 perplexity estimate based on a held-out corpus of 1914 documents with 106900 words
2018-12-18 14:56:51,591 : INFO : PROGRESS: pass 3, at document #11914/11914
2018-12-18 14:56:52,161 : INFO : optimized alpha [0.06136091, 0.12692146, 0.0607173, 0.07741995, 0.19764979, 0.07052825, 0.10087189, 0.1487561, 0.20603725, 0.112185344]
2018-12-18 14:56:52,165 : INFO : merging changes from 1914 documents into a model of 11914 documents
2018-12-18 14:56:52,211 : INFO : topic #2 (0.061): 0.006*"oil" + 0.005*"bar" + 0.004*"bars" + 0.003*"flaxseed" + 0.003*"flavor" + 0.003*"energy" + 0.003*"chocolate" + 0.003*"quot" + 0.003*"aspirin" + 0.003*"taste"
2018-12-18 14:56:52,212 : INFO : topic #0 (0.061): 0.004*"girls" + 0.003*"match" + 0.003*"talk" + 0.003*"quot" + 0.003*"boys" + 0.003*"life" + 0.003*"young" + 0.002*"lisbon" + 0.002*"jack" + 0.002*"trip"
2018-12-18 14:56:52,213 : INFO : topic #7 (0.149): 0.040*"book" + 0.013*"read" + 0.006*"books" + 0.005*"people" + 0.005*"reading" + 0.005*"author" + 0.005*"like" + 0.004*"time" + 0.004*"good" + 0.004*"life"
2018-12-18 14:56:52,215 : INFO : topic #4 (0.198): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:52,217 : INFO : topic #8 (0.206): 0.028*"camera" + 0.010*"good" + 0.010*"use" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"quality" + 0.008*"like" + 0.007*"bought" + 0.006*"time"
2018-12-18 14:56:52,218 : INFO : topic diff=0.124297, rho=0.316910
2018-12-18 14:56:53,151 : INFO : PROGRESS: pass 4, at document #2000/11914
2018-12-18 14:56:53,755 : INFO : optimized alpha [0.059583467, 0.12535296, 0.059051506, 0.07613088, 0.19572115, 0.06891588, 0.09910477, 0.14632666, 0.2079373, 0.11137755]
2018-12-18 14:56:53,759 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:53,804 : INFO : topic #2 (0.059): 0.006*"oil" + 0.004*"organic" + 0.004*"bar" + 0.004*"quot" + 0.003*"chocolate" + 0.003*"bars" + 0.002*"taste" + 0.002*"energy" + 0.002*"flaxseed" + 0.002*"flavor"
2018-12-18 14:56:53,806 : INFO : topic #0 (0.060): 0.004*"girls" + 0.004*"lisbon" + 0.004*"quot" + 0.003*"trip" + 0.003*"boys" + 0.003*"talk" + 0.003*"match" + 0.003*"life" + 0.002*"young" + 0.002*"jack"
2018-12-18 14:56:53,808 : INFO : topic #7 (0.146): 0.039*"book" + 0.012*"read" + 0.006*"books" + 0.005*"people" + 0.005*"author" + 0.005*"like" + 0.005*"reading" + 0.004*"time" + 0.004*"good" + 0.004*"life"
2018-12-18 14:56:53,809 : INFO : topic #4 (0.196): 0.012*"product" + 0.011*"use" + 0.011*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:53,810 : INFO : topic #8 (0.208): 0.028*"camera" + 0.010*"good" + 0.010*"use" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"like" + 0.008*"quality" + 0.007*"bought" + 0.006*"time"
2018-12-18 14:56:53,812 : INFO : topic diff=0.121894, rho=0.302102
2018-12-18 14:56:54,673 : INFO : PROGRESS: pass 4, at document #4000/11914
2018-12-18 14:56:55,208 : INFO : optimized alpha [0.05927282, 0.12558211, 0.058459375, 0.07583372, 0.19716793, 0.06849651, 0.09843996, 0.14702544, 0.2086132, 0.11191535]
2018-12-18 14:56:55,212 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:55,258 : INFO : topic #2 (0.058): 0.004*"oil" + 0.004*"bar" + 0.003*"bars" + 0.003*"chocolate" + 0.003*"quot" + 0.003*"organic" + 0.003*"energy" + 0.003*"taste" + 0.002*"flavor" + 0.002*"sugar"
2018-12-18 14:56:55,259 : INFO : topic #0 (0.059): 0.005*"pluto" + 0.004*"girls" + 0.004*"quot" + 0.003*"jack" + 0.003*"life" + 0.002*"talk" + 0.002*"young" + 0.002*"lisbon" + 0.002*"boys" + 0.002*"trip"
2018-12-18 14:56:55,261 : INFO : topic #7 (0.147): 0.041*"book" + 0.013*"read" + 0.006*"books" + 0.005*"people" + 0.005*"author" + 0.005*"like" + 0.005*"reading" + 0.004*"good" + 0.004*"time" + 0.004*"life"
2018-12-18 14:56:55,262 : INFO : topic #4 (0.197): 0.012*"product" + 0.011*"use" + 0.010*"software" + 0.008*"version" + 0.007*"program" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.004*"support"
2018-12-18 14:56:55,265 : INFO : topic #8 (0.209): 0.028*"camera" + 0.010*"good" + 0.010*"use" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time"
2018-12-18 14:56:55,267 : INFO : topic diff=0.119259, rho=0.302102
2018-12-18 14:56:56,116 : INFO : PROGRESS: pass 4, at document #6000/11914
2018-12-18 14:56:56,666 : INFO : optimized alpha [0.059031352, 0.12719044, 0.05815531, 0.07570896, 0.19876207, 0.06848211, 0.098980345, 0.14748886, 0.21217017, 0.11158238]
2018-12-18 14:56:56,670 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:56,713 : INFO : topic #2 (0.058): 0.004*"bar" + 0.004*"oil" + 0.004*"bars" + 0.003*"quot" + 0.003*"chocolate" + 0.003*"taste" + 0.002*"organic" + 0.002*"flavor" + 0.002*"protein" + 0.002*"sugar"
2018-12-18 14:56:56,714 : INFO : topic #0 (0.059): 0.006*"talk" + 0.004*"girls" + 0.003*"pluto" + 0.003*"quot" + 0.003*"life" + 0.003*"jack" + 0.002*"boys" + 0.002*"young" + 0.002*"larry" + 0.002*"interview"
2018-12-18 14:56:56,716 : INFO : topic #7 (0.147): 0.041*"book" + 0.013*"read" + 0.006*"books" + 0.005*"author" + 0.005*"people" + 0.005*"like" + 0.005*"reading" + 0.004*"good" + 0.004*"life" + 0.004*"written"
2018-12-18 14:56:56,717 : INFO : topic #4 (0.199): 0.011*"product" + 0.011*"use" + 0.011*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"new" + 0.006*"work" + 0.005*"like" + 0.005*"support"
2018-12-18 14:56:56,719 : INFO : topic #8 (0.212): 0.029*"camera" + 0.010*"good" + 0.010*"great" + 0.010*"use" + 0.009*"pictures" + 0.009*"lens" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time"
2018-12-18 14:56:56,721 : INFO : topic diff=0.113383, rho=0.302102
2018-12-18 14:56:57,629 : INFO : PROGRESS: pass 4, at document #8000/11914
2018-12-18 14:56:58,205 : INFO : optimized alpha [0.059157416, 0.12901017, 0.058562417, 0.0754325, 0.201494, 0.06851614, 0.09967826, 0.15024628, 0.21519847, 0.11322299]
2018-12-18 14:56:58,209 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:58,253 : INFO : topic #2 (0.059): 0.005*"oil" + 0.003*"bars" + 0.003*"bar" + 0.003*"simon" + 0.003*"taste" + 0.003*"chocolate" + 0.002*"energy" + 0.002*"flavor" + 0.002*"flaxseed" + 0.002*"quot"
2018-12-18 14:56:58,254 : INFO : topic #0 (0.059): 0.005*"talk" + 0.003*"match" + 0.003*"girls" + 0.003*"quot" + 0.003*"life" + 0.003*"boys" + 0.003*"young" + 0.002*"pluto" + 0.002*"man" + 0.002*"interview"
2018-12-18 14:56:58,256 : INFO : topic #7 (0.150): 0.040*"book" + 0.012*"read" + 0.006*"books" + 0.005*"people" + 0.005*"like" + 0.005*"author" + 0.005*"reading" + 0.004*"good" + 0.004*"life" + 0.004*"time"
2018-12-18 14:56:58,257 : INFO : topic #4 (0.201): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.005*"support"
2018-12-18 14:56:58,259 : INFO : topic #8 (0.215): 0.028*"camera" + 0.010*"use" + 0.010*"great" + 0.010*"good" + 0.010*"lens" + 0.009*"pictures" + 0.008*"like" + 0.008*"quality" + 0.008*"bought" + 0.006*"time"
2018-12-18 14:56:58,261 : INFO : topic diff=0.120880, rho=0.302102
2018-12-18 14:56:59,139 : INFO : PROGRESS: pass 4, at document #10000/11914
2018-12-18 14:56:59,680 : INFO : optimized alpha [0.05974569, 0.13139239, 0.059190955, 0.076095715, 0.20303233, 0.069254205, 0.1017349, 0.15186909, 0.21969807, 0.11506421]
2018-12-18 14:56:59,684 : INFO : merging changes from 2000 documents into a model of 11914 documents
2018-12-18 14:56:59,728 : INFO : topic #2 (0.059): 0.005*"oil" + 0.004*"bar" + 0.003*"bars" + 0.003*"flavor" + 0.003*"energy" + 0.003*"taste" + 0.002*"chocolate" + 0.002*"quot" + 0.002*"simon" + 0.002*"layout"
2018-12-18 14:56:59,729 : INFO : topic #0 (0.060): 0.004*"match" + 0.004*"talk" + 0.003*"jack" + 0.003*"quot" + 0.003*"life" + 0.003*"young" + 0.002*"girls" + 0.002*"boys" + 0.002*"interview" + 0.002*"man"
2018-12-18 14:56:59,731 : INFO : topic #7 (0.152): 0.039*"book" + 0.012*"read" + 0.006*"books" + 0.005*"people" + 0.005*"author" + 0.005*"reading" + 0.005*"like" + 0.004*"life" + 0.004*"good" + 0.004*"time"
2018-12-18 14:56:59,734 : INFO : topic #4 (0.203): 0.011*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"time" + 0.007*"version" + 0.006*"work" + 0.006*"new" + 0.005*"support" + 0.005*"like"
2018-12-18 14:56:59,735 : INFO : topic #8 (0.220): 0.028*"camera" + 0.010*"use" + 0.010*"good" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"quality" + 0.008*"like" + 0.007*"bought" + 0.006*"time"
2018-12-18 14:56:59,737 : INFO : topic diff=0.106338, rho=0.302102
2018-12-18 14:57:01,543 : INFO : -8.859 per-word bound, 464.3 perplexity estimate based on a held-out corpus of 1914 documents with 106900 words
2018-12-18 14:57:01,544 : INFO : PROGRESS: pass 4, at document #11914/11914
2018-12-18 14:57:02,087 : INFO : optimized alpha [0.060151167, 0.13510872, 0.059808377, 0.076338075, 0.2068379, 0.06986798, 0.10278567, 0.15559638, 0.22334093, 0.11710954]
2018-12-18 14:57:02,091 : INFO : merging changes from 1914 documents into a model of 11914 documents
2018-12-18 14:57:02,138 : INFO : topic #2 (0.060): 0.006*"oil" + 0.005*"bar" + 0.004*"bars" + 0.003*"flavor" + 0.003*"flaxseed" + 0.003*"energy" + 0.003*"chocolate" + 0.003*"taste" + 0.003*"aspirin" + 0.002*"ingredients"
2018-12-18 14:57:02,139 : INFO : topic #0 (0.060): 0.004*"girls" + 0.004*"match" + 0.003*"talk" + 0.003*"boys" + 0.003*"life" + 0.003*"young" + 0.003*"lisbon" + 0.003*"jack" + 0.002*"trip" + 0.002*"quot"
2018-12-18 14:57:02,141 : INFO : topic #7 (0.156): 0.040*"book" + 0.013*"read" + 0.006*"books" + 0.005*"people" + 0.005*"reading" + 0.005*"author" + 0.005*"like" + 0.004*"time" + 0.004*"life" + 0.004*"good"
2018-12-18 14:57:02,143 : INFO : topic #4 (0.207): 0.012*"product" + 0.011*"use" + 0.010*"software" + 0.008*"program" + 0.007*"version" + 0.007*"time" + 0.006*"work" + 0.006*"new" + 0.005*"like" + 0.005*"support"
2018-12-18 14:57:02,145 : INFO : topic #8 (0.223): 0.028*"camera" + 0.010*"good" + 0.010*"use" + 0.010*"great" + 0.009*"lens" + 0.008*"pictures" + 0.008*"quality" + 0.008*"like" + 0.007*"bought" + 0.006*"time"
2018-12-18 14:57:02,147 : INFO : topic diff=0.101024, rho=0.302102

Inspecting topics

The method show_topic(t, n) returns the word distribution of topic t as a list of (word, probability) pairs, sorted by decreasing probability. Only the n most probable words are returned; if n is omitted, as in the call below, it defaults to 10.

In [5]:
model.show_topic(5)
Out[5]:
[('film', 0.029852536),
 ('movie', 0.010055234),
 ('films', 0.004800593),
 ('horror', 0.00475024),
 ('story', 0.0038416996),
 ('scene', 0.0034877707),
 ('action', 0.0033171456),
 ('like', 0.0032384025),
 ('dvd', 0.003099864),
 ('scenes', 0.0028694542)]
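
Note that show_topic only returns the head of the distribution. As a small sanity check (plain Python, with the numbers copied by hand from Out[5] above rather than read from the model), we can add up the listed probabilities to see how much of the topic's probability mass the top 10 words cover:

```python
# Top-10 (word, probability) pairs copied from Out[5] above.
top_words = [
    ('film', 0.029852536),
    ('movie', 0.010055234),
    ('films', 0.004800593),
    ('horror', 0.00475024),
    ('story', 0.0038416996),
    ('scene', 0.0034877707),
    ('action', 0.0033171456),
    ('like', 0.0032384025),
    ('dvd', 0.003099864),
    ('scenes', 0.0028694542),
]

# Sum the probabilities of the listed words only.
top_mass = sum(prob for _, prob in top_words)
print('top-10 words cover {:.1%} of the topic'.format(top_mass))
```

In this run the ten words account for roughly 7% of the mass; the rest is spread thinly over the remaining vocabulary, which is why the individual probabilities look so small.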

Next, we print the 10 most probable words of every topic, leaving out the probabilities.

In most cases, you will get some topics that correspond nicely to the product categories. However, the inference method used by gensim (stochastic variational inference) has a certain degree of randomness, which means that you won't get exactly the same topics each time you run the software.

In [6]:
for topic_id in range(model.num_topics):
    topk = model.show_topic(topic_id, 10)
    topk_words = [ w for w, _ in topk ]
    
    print('{}: {}'.format(topic_id, ' '.join(topk_words)))
0: girls match talk boys life young lisbon jack trip quot
1: movie like film story good great love time people dvd
2: oil bar bars flavor flaxseed energy chocolate taste aspirin ingredients
3: video tax year program return dvd state screen route format
4: product use software program version time work new like support
5: film movie films horror story scene action like dvd scenes
6: like game kids fun good play children movie love old
7: book read books people reading author like time life good
8: camera good use great lens pictures quality like bought time
9: album music like songs song quot good great sound best
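
Not every topic has to map to a distinct product category. One rough way to spot near-duplicate topics is to count how many top words two topics share. The sketch below does this in plain Python on the word lists copied from the output above; it is an informal check made up for this illustration, not a gensim feature or a standard evaluation metric.

```python
from itertools import combinations

# Top-10 word lists copied from the output of In [6] above.
topics = {
    0: 'girls match talk boys life young lisbon jack trip quot',
    1: 'movie like film story good great love time people dvd',
    2: 'oil bar bars flavor flaxseed energy chocolate taste aspirin ingredients',
    3: 'video tax year program return dvd state screen route format',
    4: 'product use software program version time work new like support',
    5: 'film movie films horror story scene action like dvd scenes',
    6: 'like game kids fun good play children movie love old',
    7: 'book read books people reading author like time life good',
    8: 'camera good use great lens pictures quality like bought time',
    9: 'album music like songs song quot good great sound best',
}
top_sets = {tid: set(words.split()) for tid, words in topics.items()}


def shared_words(pair):
    """Return the top words that both topics in the pair have in common."""
    a, b = pair
    return top_sets[a] & top_sets[b]


# Find the pair of topics with the largest top-word overlap.
best_pair = max(combinations(sorted(top_sets), 2),
                key=lambda pair: len(shared_words(pair)))
shared = shared_words(best_pair)
print(best_pair, sorted(shared))
```

For the run shown here, the most similar pair is topics 1 and 5, which are both about movies. Generic words such as "like" and "good" appear in many topics, so small overlaps should not be over-interpreted.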

Predicting the topics for a document

If you have a new document, you can use the trained model to estimate the topic proportions for it.

This is done in two steps: first, the document is converted into a bag-of-words vector (a sparse list of (word id, count) pairs), and then the inference is carried out on that vector.
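
To make the first step concrete, here is a minimal plain-Python sketch of what a bag-of-words conversion does. The helper to_bow and the tiny token2id mapping are hypothetical stand-ins made up for this illustration; in the notebook itself, the conversion is done by the doc2bow method of the dictionary stored in model.id2word, and the real word ids will differ.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Count known tokens and return sorted (word id, count) pairs.

    Tokens missing from the vocabulary are silently skipped.
    """
    counts = Counter(token2id[tok] for tok in tokens if tok in token2id)
    return sorted(counts.items())


# Hypothetical three-word vocabulary, for illustration only.
token2id = {'book': 0, 'software': 1, 'windows': 2}

doc = 'this book describes windows software'.split()
bow = to_bow(doc, token2id)
print(bow)
```

Words that are not in the vocabulary ("this" and "describes" in this toy mapping) contribute nothing to the vector, so a document consisting only of unknown words would yield an empty list.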

In [7]:
doc = 'this book describes windows software'.split()

doc_vector = model.id2word.doc2bow(doc)
doc_topics = model[doc_vector]

The variable doc_topics now holds the predicted topic distribution as a list of (topic id, probability) pairs. In most cases, there will be one or more dominant topics, and small probabilities for the rest of the topics.

For instance, for the document "this book describes Windows software", we will typically find that it is a mix of a book-related topic and a software-related topic. (Compare with the topic list you got above.) Again, the exact result will vary between executions because the inference involves randomness.

In [8]:
doc_topics
Out[8]:
[(0, 0.011552104),
 (1, 0.025947856),
 (2, 0.01148627),
 (3, 0.01466086),
 (4, 0.42382663),
 (5, 0.013418236),
 (6, 0.019740112),
 (7, 0.41398397),
 (8, 0.042892892),
 (9, 0.022491027)]
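
To pick out the dominant topics programmatically, we can sort the pairs by probability and keep those above a cutoff. The sketch below works on the values copied from Out[8]; the cutoff of 0.1 is an arbitrary choice made for this illustration, not a gensim default.

```python
# (topic id, probability) pairs copied from Out[8] above.
doc_topics = [
    (0, 0.011552104), (1, 0.025947856), (2, 0.01148627), (3, 0.01466086),
    (4, 0.42382663), (5, 0.013418236), (6, 0.019740112), (7, 0.41398397),
    (8, 0.042892892), (9, 0.022491027),
]

threshold = 0.1  # arbitrary cutoff, assumed for this example

# Most probable topics first.
ranked = sorted(doc_topics, key=lambda pair: pair[1], reverse=True)
dominant = [(tid, prob) for tid, prob in ranked if prob >= threshold]

for tid, prob in dominant:
    print('topic {}: {:.2f}'.format(tid, prob))

# The proportions form a probability distribution over the topics.
total = sum(prob for _, prob in doc_topics)
```

In this run the two dominant topics are 4 and 7, which match the software topic and the book topic in the list printed earlier, and together they account for more than 80% of the document.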