Mauricio,

Is the time difference that you observed material? That is, is it an important difference for your use case?

Frank

On Thu, Feb 9, 2017 at 11:07 PM, Nandish Jayaram wrote:
Hi Mauricio,

I briefly looked through the code, and it seems the dot product in cosine_similarity is based on the implementation in the Eigen library.
The dot product in array_dot seems to use a native implementation of the same operation. Apparently, the dot product in Eigen is faster than
the native implementation. It looks like it might be a good idea to move array_dot to the Eigen-based dot product as well!

NJ

On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer wrote:
Hi,

I just started evaluating MADlib, and one of the first things I tried was how it performs for dot product and cosine similarity.

So first I set up some test data (1000000 rows of 150-element float8[]).
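
For anyone reproducing this, a minimal sketch of such a setup might look like the following. The table name vectors and the column a_vector are taken from the query further down; the id column and the cross-join approach are assumptions for illustration, not necessarily how the original data was built. Note that generate_series(1, 150) yields exactly 150 values, whereas generate_series(0, 150) yields 151.

```sql
-- Hypothetical setup: 1000000 rows, each holding a 150-element float8[] of random values.
-- random() is volatile, so it is evaluated once per joined row, giving each
-- vector its own values. The cross join produces 150 million intermediate rows,
-- so this can take a while.
create table vectors as
select r as id,                                  -- id is an assumed helper column
       array_agg(random()::float8) as a_vector   -- one 150-element array per id
from generate_series(1, 1000000) as r,
     generate_series(1, 150) as e
group by r;
```

A quick sanity check on the result is select array_length(a_vector, 1) from vectors limit 1, which should return 150.
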
Then I ran array_dot and cosine_similarity on it:
select * from (
  select cosine_similarity -- or array_dot
    (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
    from vectors
) x
order by c desc
limit 10

On my machine, cosine_similarity takes 1.3s while array_dot takes 3s, which is rather unexpected. I would have expected a dot product to be much faster than calculating cosine similarity.
Can anyone shed some light on this?
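
To make the expectation concrete: cosine similarity is the dot product divided by the product of the two vector norms, so on paper it does strictly more arithmetic than the dot product alone. The plain-SQL sketch below shows both quantities side by side. It is only a reference formulation, not MADlib's implementation, the two literal arrays are made-up illustration values, and the multi-argument unnest assumes PostgreSQL 9.4 or later.

```sql
-- Reference formulation of dot product and cosine similarity in plain SQL.
-- unnest with two arrays pairs up elements position by position.
select sum(x * y) as dot_product,
       sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y))) as cosine_sim
from unnest(array[1, 2, 3]::float8[],
            array[4, 5, 6]::float8[]) as t(x, y);
```

For these inputs the dot product is 1*4 + 2*5 + 3*6 = 32, and the cosine similarity reuses that same sum in its numerator.
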

Thanks,
Mauricio
