madlib-user mailing list archives

From Frank McQuillan <fmcquil...@pivotal.io>
Subject Re: Performance of array_dot vs cosine_similarity
Date Thu, 09 Feb 2017 18:36:38 GMT
Let me look into that a bit more.  In the interim...

Here is a related blog post by Satoshi on cosine similarity performance in
MADlib (translated from Japanese):
https://translate.google.com/translate?sl=ja&tl=en&js=y&prev=_t&hl=ja&ie=UTF-8&u=http%3A%2F%2Fpgsqldeepdive.blogspot.jp%2F2017%2F01%2Fconsine-similarity-performance.html&edit-text=&act=url

That post came out of an earlier thread on this mailing list that you may
have seen:
https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201701.mbox/%3CCAA8sozcTMAOsGdCKTS2Whvx7c%2B08aNbRLPkijrvP6sZZsZie_g%40mail.gmail.com%3E


On Thu, Feb 9, 2017 at 7:19 PM, Mauricio Scheffer <
mauricioscheffer@gmail.com> wrote:

> Hi,
>
> I just started evaluating MADlib and one of the first things I tried is
> how it performs for dot product and cosine similarity.
>
> So first I set up some test data (1,000,000 rows of 150-element float8[]).
> Then I ran array_dot and cosine_similarity on it:
>
> select * from (
>   select cosine_similarity -- or array_dot
>     (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
>     from vectors
> ) x
> order by c desc
> limit 10
>
> On my machine, cosine_similarity takes 1.3s while array_dot takes 3s,
> which is rather unexpected... I would have expected a dot product to be
> much faster than calculating cosine similarity.
> Can anyone shed some light on this?
>
> Thanks,
> Mauricio
>
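
For context on why the timings above look surprising, here is a minimal Python sketch of the textbook definitions. It is not MADlib's implementation; the function names simply mirror the MADlib ones for readability, and nothing here says anything about how MADlib computes them internally. By definition, cosine similarity is a dot product plus two norms and a division, so on arithmetic alone it should cost more than array_dot, not less. One side note on the query: generate_series(0, 150) is inclusive at both ends and yields 151 values, so the random query vector has 151 elements rather than 150.

```python
import math


def array_dot(a, b):
    """Plain dot product: one multiply-add per element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    norm_a = math.sqrt(array_dot(a, a))
    norm_b = math.sqrt(array_dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return array_dot(a, b) / (norm_a * norm_b)
```

In this sketch cosine_similarity performs three passes of the same kind that array_dot performs once, which is the intuition behind expecting array_dot to be the faster of the two.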
