phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Samarth Jain <samarth.j...@gmail.com>
Subject Re: Slow query help
Date Fri, 16 Mar 2018 18:08:42 GMT
A less resource intensive approach would be to use approx count distinct -
https://phoenix.apache.org/language/functions.html#approx_count_distinct

You would still need the secondary index though, as James suggested, if you
want it to run fast.

On Fri, Mar 16, 2018 at 10:26 AM Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Thanks for the tip James. I didn't know that syntax for doing the count on
> a distinct value!
> This version is able to end, the first one wasn't able to finish even
> giving a huge amount of memory to HBase (the cardinality of SOMEFIELD is
> very big indeed).
>
> Thanks a lot,
> Flavio
>
> On Fri, Mar 16, 2018 at 5:45 PM, James Taylor <jamestaylor@apache.org>
> wrote:
>
>> Hi Flavio,
>> You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID)
>> to speed that up. You can write it more simply as SELECT COUNT(DISTINCT
>> SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL.
>>
>> Otherwise, you'll end up doing a full table scan (and use a fair amount
>> of memory depending on the cardinality of SOMEFIELD). The above with a
>> secondary index would skip to the distinct values instead.
>>
>> Thanks,
>> James
>>
>> On Fri, Mar 16, 2018 at 8:30 AM, Flavio Pompermaier <pompermaier@okkam.it
>> > wrote:
>>
>>> Hi to all,
>>> I'm running a query like this one on my Phoenix 4.13 (on CDH 5.11.2):
>>>
>>> SELECT COUNT(*) FROM (
>>>   SELECT DISTINCT(SOMEFIELD)
>>>   FROM TEST.MYTABLE
>>>   WHERE VALID = TRUE AND SOMEFIELD IS NOT NULL
>>> )
>>>
>>> Unfortunately the query timeouts (timeout is 10 min).... Any suggestion
>>> about how to tune my installation?
>>>
>>> Best,
>>> Flavi
>>>
>>
>

Mime
View raw message