madlib-user mailing list archives

From Frank McQuillan <fmcquil...@pivotal.io>
Subject Re: Performance of array_dot vs cosine_similarity
Date Thu, 09 Feb 2017 22:29:22 GMT
Mauricio,

Is the time difference that you observed material, i.e., is it an
important difference for your use case?

Frank

On Thu, Feb 9, 2017 at 11:07 PM, Nandish Jayaram <njayaram@pivotal.io>
wrote:

> Hi Mauricio,
>
> I briefly looked through the code, and it seems that the dot product in
> cosine_similarity is based on the Eigen library, while the dot product in
> array_dot uses a native implementation. Apparently, the Eigen dot product
> is faster than the native one. It might be a good idea to move array_dot
> to an Eigen-based dot product as well!
>
> NJ
>
> On Thu, Feb 9, 2017 at 10:19 AM, Mauricio Scheffer <
> mauricioscheffer@gmail.com> wrote:
>
>> Hi,
>>
>> I just started evaluating MADlib and one of the first things I tried is
>> how it performs for dot product and cosine similarity.
>>
>> So first I set up some test data (1000000 rows of 150-element float8[]).
>> Then I ran array_dot and cosine_similarity on it:
>>
>> select * from (
>>   select cosine_similarity( -- or array_dot
>>            a_vector,
>>            -- generate_series(1, 150) yields 150 values, matching a_vector
>>            (select array_agg(random()::float8) from generate_series(1, 150))
>>          ) as c
>>   from vectors
>> ) x
>> order by c desc
>> limit 10;
>>
>> On my machine, cosine_similarity takes 1.3s while array_dot takes 3s,
>> which is rather unexpected... I would have expected a dot product to be
>> much faster than calculating cosine similarity.
>> Can anyone shed some light on this?
>>
>> Thanks,
>> Mauricio
>>
>
>
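
For reference, here is a minimal sketch in plain Python (not MADlib or Eigen code) of why one would expect a dot product to cost less than cosine similarity: cosine similarity needs the same dot product plus two vector norms and a division. The function names are hypothetical and chosen only for illustration. The timings reported in this thread depend on how each function is implemented (an Eigen-based versus a native dot product, as noted above), not on the arithmetic alone.

```python
import math
import random


def dot(a, b):
    """Plain dot product: one multiply and one add per element."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    # Two random 150-element vectors, the same size as in the test data above.
    random.seed(0)
    a = [random.random() for _ in range(150)]
    b = [random.random() for _ in range(150)]
    print("dot product:      ", dot(a, b))
    print("cosine similarity:", cosine_similarity(a, b))
```

In this sketch cosine_similarity makes three passes over the data (one dot product and two norms) where dot makes one, which is why the reverse ordering in the measured timings points at the implementations rather than the math.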
