From Gautam Muralidhar <gautam.s.muralid...@gmail.com>
Subject Re: Spark related question
Date Mon, 01 Feb 2016 05:19:21 GMT
Hi Liang,

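Step 4 gives you the per-topic word distribution, i.e., the probability of word 'w' given
topic 'k'. Every topic is a distribution over words, and this step gives you that distribution
for each of the topics. So the prob column is not the frequency of the word in a single
document: for a given topic, the probabilities over the whole vocabulary sum to 1, and the
table shows the words of topic 1 in order of decreasing probability.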
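This thread does not say exactly how MADlib computes these values. A common estimator when LDA is trained with collapsed Gibbs sampling counts how many tokens of word w were assigned to topic k and smooths the count with a symmetric Dirichlet prior beta: P(w | k) = (n_kw + beta) / (n_k + V * beta), where n_k is the number of tokens assigned to topic k and V is the vocabulary size. The sketch below assumes that estimator. The function name `topic_word_distribution`, the (word_id, topic_id) input format, word ids running from 0 to V-1, and beta = 0.01 are all illustrative assumptions, not MADlib's API.

```python
from collections import Counter


def topic_word_distribution(assignments, vocab_size, beta=0.01):
    """Estimate P(word | topic) from per-token topic assignments (hypothetical helper).

    assignments: (word_id, topic_id) pairs, one per token, e.g. the final state of
                 a Gibbs sampler (assumed input format).
    vocab_size:  number of distinct word ids V; ids are assumed to be 0..V-1.
    beta:        symmetric Dirichlet smoothing prior (assumed value).
    Returns {topic_id: {word_id: probability}} covering every word id.
    """
    assignments = list(assignments)
    n_kw = Counter(assignments)                       # tokens of word w assigned to topic k
    n_k = Counter(topic for _, topic in assignments)  # all tokens assigned to topic k
    phi = {}
    for k, total in n_k.items():
        denom = total + vocab_size * beta
        # Smoothed share of topic k's tokens that are word w; sums to 1 over all w.
        phi[k] = {w: (n_kw[(w, k)] + beta) / denom for w in range(vocab_size)}
    return phi


# Hypothetical toy corpus: 5 word ids, tokens already assigned to topics 0 and 1.
tokens = [(0, 0), (0, 0), (1, 0), (2, 1), (3, 1), (3, 1), (4, 1)]
phi = topic_word_distribution(tokens, vocab_size=5, beta=0.01)

# List topic 0's words by decreasing probability, like the table in step 4.
for word_id, prob in sorted(phi[0].items(), key=lambda kv: kv[1], reverse=True):
    print(0, word_id, round(prob, 4))
```

Under this estimator the probability depends on how tokens across the whole corpus were assigned to topics, not on how often the word occurs in any one document: a word that is frequent in a document but mostly assigned to another topic still gets a low probability in this topic.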

P.S.: The subject line says "Spark related question"; I assume it was copied from a different
thread by mistake.

> On Jan 31, 2016, at 7:10 PM, Liang Quan <quanliang@gatech.edu> wrote:
> To whom it may concern,
> I'm a new subscriber to MADlib. First, please allow me to extend my appreciation for what
> you guys have accomplished. MADlib has a very user-friendly and accessible interface for
> entry-level users. In addition, I have a question regarding the LDA function example in the
> link below:
> How is the probability of each word in the table below calculated by the LDA function in
> Step 4? Is it the frequency at which the word appears in the document, or something else?
> Your reply is much appreciated, thanks.
>  topicid | wordid |        prob        |       word
> ---------+--------+--------------------+-------------------
>        1 |     69 |  0.181900726392252 | of
>        1 |     52 | 0.0608353510895884 | is
>        1 |     65 | 0.0608353510895884 | models
>        1 |     30 | 0.0305690072639225 | corpora
>        1 |      1 | 0.0305690072639225 | 1960s
>        1 |     57 | 0.0305690072639225 | latent
>        1 |     35 | 0.0305690072639225 | diverse
>        1 |     81 | 0.0305690072639225 | semantic
>        1 |     19 | 0.0305690072639225 | between
>        1 |     75 | 0.0305690072639225 | pitchers
>        1 |     43 | 0.0305690072639225 | for
>        1 |      6 | 0.0305690072639225 | also
>        1 |     40 | 0.0305690072639225 | favor
>        1 |     47 | 0.0305690072639225 | had
>        1 |     28 | 0.0305690072639225 | computational
