Hi Nandish,

Thanks for looking into this. I've just created a JIRA issue to track it: https://issues.apache.org/jira/browse/MADLIB-1067

Frank: at first we're looking into using MADlib for cosine similarity only. If that goes well, we might use it for other operations that will need good dot-product performance.

Cheers,
Mauricio

On Thu, Feb 9, 2017 at 10:29 PM, Frank McQuillan <fmcquillan@pivotal.io> wrote:
Mauricio,

Is the time difference that you observed material? i.e., is that an important difference for your use case?

Frank

On Thu, Feb 9, 2017 at 11:07 PM, Nandish Jayaram <njayaram@pivotal.io> wrote:
Hi Mauricio,

I briefly looked through the code, and it seems the dot product in cosine_similarity is based on the Eigen library, while the dot product in array_dot uses a native implementation. Apparently the Eigen dot product is faster than the native one. It might be a good idea to move array_dot to the Eigen-based dot product as well!
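
In the meantime, one way to exercise both code paths on the same input and confirm they agree numerically would be something along these lines (a rough sketch; it assumes the MADlib functions are on the search_path and reuses the table/column names from your query):

-- cosine_similarity (Eigen code path) vs. the same value rebuilt from
-- array_dot (native code path); the two columns should match for each row
select cosine_similarity(a_vector, q.v) as eigen_path,
       array_dot(a_vector, q.v)
         / (sqrt(array_dot(a_vector, a_vector)) * sqrt(array_dot(q.v, q.v))) as native_path
from vectors,
     (select array_agg(random()::float8) as v from generate_series(1, 150)) as q
limit 10;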

NJ

On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer <mauricioscheffer@gmail.com> wrote:
Hi,

I just started evaluating MADlib, and one of the first things I checked was how it performs on dot products and cosine similarity.

So first I set up some test data (1,000,000 rows of 150-element float8[]).
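Something along these lines, in plain PostgreSQL (the table and column names are the ones used in the query below):

-- 1,000,000 rows, each holding a 150-element random vector
create table vectors as
select row_id, array_agg(random()::float8) as a_vector
from generate_series(1, 1000000) as row_id,
     generate_series(1, 150)
group by row_id;
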
Then I ran array_dot and cosine_similarity on it:

select * from (
  select cosine_similarity -- or array_dot
    (a_vector, (select array_agg(random()::float8) from generate_series(1, 150))) as c
  from vectors
) x
order by c desc
limit 10;

On my machine, cosine_similarity takes 1.3s while array_dot takes 3s, which is rather unexpected: since cosine similarity has to compute the dot product plus both vector norms, I would have expected the plain dot product to be much faster.
Can anyone shed some light on this?

Thanks,
Mauricio