madlib-user mailing list archives

From: Frank McQuillan <fmcquil...@pivotal.io>
Subject: Chi squared independence test question
Date: Wed, 24 Feb 2016 00:38:59 GMT
Afra Ahmad <afra@patientiq.io>, Jimmy Skuros <jimmy@patientiq.io>, and
Matt Gitelis <matt@patientiq.io>

sent me a question about the chi-squared independence test, which I have
pasted below along with my response:

"...regarding an issue we encountered with the "*Chi-squared independence
test*" in Madlib.  We are huge fans of Madlib but are having trouble
implementing this one test.  Can you please confirm that the documentation
below is correct (from most recent docs here:
http://doc.madlib.net/latest/group__grp__stats__tests.html)?

Also, what are we supposed to do to calculate the expected values?  Any
pointers would be greatly appreciated!

Thanks,
Matt"

From Frank:

"The MADlib software is correct; only the documentation is wrong. I have
already fixed the docs and submitted a pull request. The JIRA issue is
https://issues.apache.org/jira/browse/MADLIB-895

The correct query for the chi-squared independence test is attached.

How to calculate the expected values:

The chi-squared independence test actually uses the chi-squared
goodness-of-fit function. The expected values need to be computed in SQL
and passed to the goodness-of-fit function. In MADlib, the expected value
for each element of the input matrix is computed as the sum of that
element's row multiplied by the sum of its column. For example, the
expected value for element (2,1) is (sum of row 2) * (sum of column 1)."
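For checking results outside the database, here is a minimal Python sketch of the calculation described above. It is not the attached SQL query and not MADlib code. The helper names `expected_counts` and `chi2_gof_statistic` and the example table are hypothetical. The textbook expected value is (row total * column total) / N, where N is the grand total. The sketch assumes that the goodness-of-fit statistic rescales the expected values so they sum to the observed total, which is what would make the row-times-column formula (without dividing by N) give the same result. Treat that rescaling step as an assumption, not a description of MADlib internals.

```python
from typing import List, Sequence


def expected_counts(observed: List[List[float]]) -> List[List[float]]:
    """Expected value per cell as described above:
    row total * column total (no division by the grand total N)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c for c in col_totals] for r in row_totals]


def chi2_gof_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum((O - E)^2 / E).

    ASSUMPTION: expected values are first rescaled so they sum to the
    observed total, which makes row*col equivalent to row*col/N.
    """
    n = sum(observed)
    total_expected = sum(expected)
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * n / total_expected  # rescale to the observed total
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat


# Hypothetical 2x2 contingency table (rows = groups, columns = outcomes).
table = [[10, 20], [30, 40]]
expected = expected_counts(table)

# Flatten to one (observed, expected) pair per cell, i.e. one input row per
# matrix element, which is how an aggregate over a table would see them.
observed_flat = [o for row in table for o in row]
expected_flat = [e for row in expected for e in row]

stat = chi2_gof_statistic(observed_flat, expected_flat)
df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)

print(expected)            # [[1200, 1800], [2800, 4200]]
print(round(stat, 4), df)  # 0.7937 1
```

Passing the textbook expected values `[12, 18, 28, 42]` gives the same statistic, because the rescaling removes any constant factor. For an independent cross-check, `scipy.stats.chi2_contingency(table, correction=False)` computes the uncorrected Pearson statistic for the same table. Without `correction=False`, a 2x2 table gets Yates' correction.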
