Afra Ahmad <afra@patientiq.io> &
Jimmy Skuros <jimmy@patientiq.io> &
Matt Gitelis <matt@patientiq.io>
sent me a question about the chi-squared independence test, which I have
pasted below, along with my response:
"...regarding an issue we encountered with the "*Chi-squared independence
test*" in Madlib. We are huge fans of Madlib but are having trouble
implementing this one test. Can you please confirm that the documentation
below is correct (from most recent docs here:
http://doc.madlib.net/latest/group__grp__stats__tests.html)?
Also, what are we supposed to do to calculate the expected values? Any
pointers would be greatly appreciated!
Thanks,
Matt"
From Frank:
"The MADlib software is correct; only the docs are wrong. I have already
fixed them and opened a pull request. The JIRA is
https://issues.apache.org/jira/browse/MADLIB-895
The correct query for the chi-squared independence test is attached.
How to calculate the expected value:
The chi-squared independence test actually uses the chi-squared
goodness-of-fit function.
The expected value needs to be computed in the SQL and passed
to the goodness-of-fit function. For MADlib, the expected value of each
element of the input matrix is computed as
(sum of its row) * (sum of its column). For example, the expected value
for element (2,1) would be (sum of row 2) * (sum of column 1)."
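To make the arithmetic concrete, here is a small Python sketch of the expected-value calculation Frank describes. It is only an illustration, not the attached SQL query and not MADlib code: the table contents and the function names are made up for this example. It also shows the textbook form of the expected count, which divides the row-sum * column-sum product by the grand total. My assumption (worth checking against the MADlib docs) is that the goodness-of-fit function rescales the expected values it receives, which would be why the division can be left out of the SQL.

```python
# Hypothetical 2x3 contingency table of observed counts (illustration only).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

def expected_products(table):
    """Unnormalized expected value per cell: (row sum) * (column sum)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return [[r * c for c in col_sums] for r in row_sums]

def expected_counts(table):
    """Textbook expected count per cell: (row sum) * (column sum) / grand total."""
    total = sum(sum(row) for row in table)
    return [[p / total for p in row] for row in expected_products(table)]

def chi2_statistic(table):
    """Pearson chi-squared statistic using the textbook expected counts."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

products = expected_products(observed)
counts = expected_counts(observed)
stat = chi2_statistic(observed)
# Degrees of freedom for an independence test: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Element (2,1) in the 1-based notation above is products[1][0] here.
print(products[1][0], counts[1][0], stat, dof)
```

For this table, row 2 sums to 60 and column 1 sums to 30, so element (2,1) gets the product 1800, which corresponds to a textbook expected count of 1800 / 120 = 15.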
