madlib-user mailing list archives

From Frank McQuillan <>
Subject Chi squared independence test question
Date Wed, 24 Feb 2016 00:38:59 GMT
Afra Ahmad, Jimmy Skuros, and Matt Gitelis sent me a question about the
Chi-squared independence test, which I have cut and pasted below, along
with my response:

"...regarding an issue we encountered with the "*Chi-squared independence
test*" in Madlib.  We are huge fans of Madlib but are having trouble
implementing this one test.  Can you please confirm that the documentation
below is correct (from most recent docs here:

Also, what are we supposed to do to calculate the expected values?  Any
pointers would be greatly appreciated!"


From Frank:

"The MADlib software is correct; only the docs are wrong.  I have already
fixed them and opened a pull request.  The JIRA is

The correct query for the Chi-squared independence test is attached.

How to calculate expected value:

The Chi-squared independence test actually uses the Chi-squared
goodness-of-fit function. The expected value needs to be computed in the
SQL and passed to the goodness-of-fit function. For each element of the
input matrix, the expected value for MADlib is computed as the sum of its
row * the sum of its column. For example, the expected value for element
(2,1) would be the sum of row 2 * the sum of column 1."
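
As a rough illustration of the arithmetic described above (this is not
MADlib code), the following Python sketch computes the row sum * column sum
product for a made-up 2x3 table. It also divides each product by the grand
total, which gives the textbook expected count for an independence test.
The assumption here is that the goodness-of-fit function rescales the
unnormalized products to the observed total, so that passing the raw
products yields the same statistic. The table values and all variable names
are hypothetical.

```python
# Illustrative sketch only; the contingency table below is made up.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_sums = [sum(row) for row in observed]        # one total per row
col_sums = [sum(col) for col in zip(*observed)]  # one total per column
total = sum(row_sums)                            # grand total of all cells

# Unnormalized expected value, as described in the message:
# (sum of the element's row) * (sum of the element's column).
raw_expected = [[r * c for c in col_sums] for r in row_sums]

# Textbook expected counts: the same products divided by the grand total.
# Assumption: the goodness-of-fit function performs an equivalent rescaling.
expected = [[v / total for v in row] for row in raw_expected]

# Pearson chi-squared statistic and degrees of freedom for independence.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(row_sums) - 1) * (len(col_sums) - 1)

# Element (2,1) in 1-based indexing is raw_expected[1][0] here.
print(raw_expected[1][0], expected[1][0], chi2, df)
```

In this sketch, element (2,1) uses the sum of row 2 (60) and the sum of
column 1 (30), matching the example in the message.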
