Hi Nandish,

Thanks for looking into this. I just created a new issue on JIRA about this: https://issues.apache.org/jira/browse/MADLIB-1067

Frank: at first we're looking into using MADlib for cosine similarity only. If that goes well we might use it for other operations that will need good performance in dot products.

Cheers,
Mauricio

--
Mauricio

On Thu, Feb 9, 2017 at 10:29 PM, Frank McQuillan wrote:
Mauricio,

Is the time difference that you observed material? i.e., is that an important difference for your use case?

Frank

On Thu, Feb 9, 2017 at 11:07 PM, Nandish Jayaram wrote:
Hi Mauricio,

I briefly looked through the code and it seems like the dot product in cosine_similarity is based on what is there in the Eigen library.
The dot product in array_dot seems to be using a native implementation of the same. Apparently, the dot product in Eigen is faster than
the native implementation. Looks like it might be a good idea to move array_dot to the Eigen-based dot product as well!

NJ

On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer wrote:
Hi,

I just started evaluating MADlib and one of the first things I tried is how it performs for dot product and cosine similarity.

So first I set up some test data (1000000 rows of 150-element float8[]).
Then I ran array_dot and cosine_similarity on it:

select * from (
  select cosine_similarity -- or array_dot
    (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
    from vectors
) x
order by c desc
limit 10

On my machine, cosine_similarity takes 1.3s while array_dot takes 3s, which is rather unexpected... I would have expected a dot product to be much faster than calculating cosine similarity.
Can anyone shed some light on this?
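
To make that expectation concrete: cosine similarity is the dot product of the two vectors divided by the product of their norms, so with the same dot-product routine it does strictly more arithmetic than a plain dot product. Below is a minimal Python sketch of that relationship. The helper names are made up for illustration, and this is not how MADlib implements either function.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity written in terms of three dot products."""
    norm_a = math.sqrt(dot(a, a))  # Euclidean norm of a
    norm_b = math.sqrt(dot(b, b))  # Euclidean norm of b
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)
```

Written this way, one cosine_similarity call costs about three dot products plus two square roots, which is why a slower array_dot suggests a difference in the underlying dot-product implementation rather than in the math.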

Thanks,
Mauricio
