madlib-user mailing list archives

From Nandish Jayaram <njaya...@pivotal.io>
Subject Re: Performance of array_dot vs cosine_similarity
Date Thu, 09 Feb 2017 22:07:17 GMT
Hi Mauricio,

I briefly looked through the code, and it seems that cosine_similarity
computes its dot product with the Eigen library, while array_dot uses a
native implementation. Apparently the Eigen dot product is faster than the
native one, which would explain the timings you are seeing. It might be a
good idea to move array_dot to the Eigen-based dot product as well!
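
For reference, here is a minimal Python sketch of why the timings look backwards. It is not MADlib's code, and the function names are only illustrative. Cosine similarity needs the same dot product plus two norms, so with equally efficient implementations it should cost more than a dot product alone, not less:

```python
import math

def dot(a, b):
    """Plain dot product: one multiply-add per element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Three dot-product-sized passes in total, versus one for dot() alone.
    return dot(a, b) / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(dot(a, b))  # 32.0
print(cosine_similarity(a, b))
```

So a gap in the other direction most likely comes from how each function is implemented rather than from the math itself.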

NJ

On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer <
mauricioscheffer@gmail.com> wrote:

> Hi,
>
> I just started evaluating MADlib and one of the first things I tried is
> how it performs for dot product and cosine similarity.
>
> So first I set up some test data (1000000 rows of 150-element float8[]).
> Then I ran array_dot and cosine_similarity on it:
>
> select * from (
>   select cosine_similarity -- or array_dot
>     (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
>     from vectors
> ) x
> order by c desc
> limit 10
>
> On my machine, cosine_similarity takes 1.3s while array_dot takes 3s,
> which is rather unexpected... I would have expected a dot product to be
> much faster than calculating cosine similarity.
> Can anyone shed some light on this?
>
> Thanks,
> Mauricio
>
