madlib-user mailing list archives

From Mauricio Scheffer <mauricioschef...@gmail.com>
Subject Performance of array_dot vs cosine_similarity
Date Thu, 09 Feb 2017 18:19:50 GMT
Hi,

I just started evaluating MADlib, and one of the first things I tried was
to measure how it performs for dot product and cosine similarity.

First I set up some test data (1000000 rows of 150-element float8[]).
Then I ran array_dot and cosine_similarity on it:

select * from (
  select cosine_similarity -- or array_dot
    (a_vector, (select array_agg(random()::float8) from generate_series(0, 150))) c
    from vectors
) x
order by c desc
limit 10;
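
For anyone wanting to reproduce this, a table like the one described above could be generated roughly as follows. This is only a sketch: the id column and the cross join over generate_series are assumptions, not necessarily the setup used for the timings in this message. Note that generate_series(1, 150) yields exactly 150 elements, whereas generate_series(0, 150) yields 151.

```sql
-- Hypothetical setup; the table and column names (vectors, a_vector)
-- follow the query above, the id column is an assumption.
-- The cross join produces 150 rows per id, so random() is evaluated
-- once per element and every row gets its own 150-element vector.
CREATE TABLE vectors AS
SELECT i AS id,
       array_agg(random()::float8) AS a_vector
FROM generate_series(1, 1000000) AS i,
     generate_series(1, 150) AS j
GROUP BY i;

-- Sanity check: every vector should have exactly 150 elements.
SELECT min(array_length(a_vector, 1)) AS min_len,
       max(array_length(a_vector, 1)) AS max_len
FROM vectors;
```

Building the table this way aggregates 150 million intermediate rows, so it may take a while; it is meant to show the shape of the data rather than the fastest way to load it.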

On my machine, cosine_similarity takes 1.3 s while array_dot takes 3 s, which
is rather unexpected. I would have expected a dot product to be much
faster than calculating cosine similarity.
Can anyone shed some light on this?
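
In case it helps with reproducing the comparison, here is a rough sketch of timing both functions against the same fixed query vector, so that the random vector is not regenerated between runs. The temp table query_vector and its column q are hypothetical names introduced for illustration; array_dot and cosine_similarity are called unqualified as in the query above, which assumes the MADlib schema is on the search_path.

```sql
-- Hypothetical helper table holding one fixed 150-element query vector.
CREATE TEMP TABLE query_vector AS
SELECT array_agg(random()::float8) AS q
FROM generate_series(1, 150);

-- Time the dot product; EXPLAIN ANALYZE reports actual execution time.
EXPLAIN ANALYZE
SELECT array_dot(v.a_vector, qv.q) AS c
FROM vectors v
CROSS JOIN query_vector qv
ORDER BY c DESC
LIMIT 10;

-- Time cosine similarity against the same query vector.
EXPLAIN ANALYZE
SELECT cosine_similarity(v.a_vector, qv.q) AS c
FROM vectors v
CROSS JOIN query_vector qv
ORDER BY c DESC
LIMIT 10;
```

Running each statement a few times and comparing the reported execution times would show whether the difference holds once caching effects settle.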

Thanks,
Mauricio
