Hi Nandish,

Thanks for looking into this. I just created a new issue on JIRA about this: https://issues.apache.org/jira/browse/MADLIB-1067

Mauricio

On Thu, Feb 9, 2017 at 10:29 PM, Frank McQuillan <fmcquillan@pivotal.io> wrote:

Mauricio,

Is the time difference that you observed material? i.e., is that an important difference for your use case?

Frank

On Thu, Feb 9, 2017 at 11:07 PM, Nandish Jayaram <njayaram@pivotal.io> wrote:

Hi Mauricio,

I briefly looked through the code, and it seems like the dot product in cosine_similarity is based on what is there in the Eigen library. The dot product in array_dot seems to be using a native implementation of the same. Apparently, the dot product in Eigen is faster than the native implementation. Looks like it might be a good idea to move array_dot to the Eigen-based dot product as well!

NJ

On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer <mauricioscheffer@gmail.com> wrote:

Hi,

I just started evaluating MADlib, and one of the first things I tried is how it performs for dot product and cosine similarity.

So first I set up some test data (1000000 rows of 150-element float8[]).

Then I ran array_dot and cosine_similarity on it:

    select * from (
      select cosine_similarity -- or array_dot
        (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
      from vectors
    ) x
    order by c desc
    limit 10

On my machine, cosine_similarity takes 1.3s while array_dot takes 3s, which is rather unexpected... I would have expected a dot product to be much faster than calculating cosine similarity.

Can anyone shed some light on this?

Thanks,
Mauricio
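
For context on why the timing looks surprising: cosine similarity is a dot product divided by the product of the two vector norms, so it does strictly more arithmetic than a plain dot product. A minimal sketch of that relationship, assuming MADlib is installed and its schema is on the search_path (as the unqualified calls in the query suggest). The two literal arrays are made up for illustration and are not from the original test data.

```sql
-- Sketch only: assumes array_dot and cosine_similarity resolve via the
-- search_path. The vectors a and b below are illustrative, not test data.
select
    -- cosine similarity written out in terms of dot products:
    --   dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))
    array_dot(a, b) / (sqrt(array_dot(a, a)) * sqrt(array_dot(b, b))) as cosine_from_dot,
    -- the built-in function, for comparison
    cosine_similarity(a, b) as cosine_builtin
from (
    select array[1, 2, 3]::float8[] as a,
           array[4, 5, 6]::float8[] as b
) t;
```

The two columns should agree up to floating-point rounding. Since the built-in needs the same dot product plus two norms, array_dot being the slower of the two points at an implementation difference rather than at the math.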