phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abhishek Gupta <abhila...@gmail.com>
Subject Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries
Date Wed, 22 Aug 2018 11:39:29 GMT
The thing is that RVC will not work for case where the SQL query does
aggregation on truncated primary key columns
eg. SELECT pk1,
        TO_CHAR(CONVERT_TZ(TO_DATE(time_col,'yyyyMMddHHmm'),
'UTC','Asia/Kolkata'),'yyyyMMdd') as truncated_time_sample,
        SUM(col1) as agg_value,
     FROM ... GROUP BY pk1, truncated_time_sample..LIMIT 1000 OFFSET 1000

This is the reason to pick LIMIT OFFSET for pagination.

>From your answer it seems for OFFSET queries, all the previous pages rows
are scanned aggregated to rows and rejected till offset. So in such a
scenario wouldn't those rows in block cache help during the pagination
duration.

Thanks,
Abhishek

On Wed, Aug 22, 2018 at 12:07 AM Thomas D'Silva <tdsilva@salesforce.com>
wrote:

> When you do an OFFSET Phoenix will scan rows and filter them out until it
> reaches the offset count which can end up being very costly for large
> offsets.
> If you can use a RVC where the order matches the PK of the data table or
> index the start key of the scan will be set based on the RVC, which is much
> more efficient
> (see http://phoenix.apache.org/paged.html).
>
> On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta <abhilater@gmail.com>
> wrote:
>
>> Hi,
>>
>> Could you help me understand how LIMIT OFFSET queries work under the hood
>> in Phoenix, is the filtering out of rows done in heap or is there some sort
>> of optimisation where it can skip at disk level.
>> My idea about posting this question was to understand if the rows from
>> paste pages of the query in block cache can optimize the subsequent page
>> call that would use the cache hits for the filter out rows and not seek at
>> disk.
>>
>> Thanks,
>> Abhishek
>>
>> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <tdsilva@salesforce.com>
>> wrote:
>>
>>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries,
>>> since you will be reading and filtering out lots of rows on the server?
>>> I guess using the block cache for RVC queries might help depending on
>>> how many rows you read per query, you should be able to easily test this
>>> out.
>>>
>>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <abhilater@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I am working on a use case where SQL aggregated queries are made such
>>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>>
>>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>>> if it would be more performant to instead enable BLOCKCACHE on the
>>>> table and pass NO_CACHE hint for RVC queries because it uses
>>>> non-LIMIT-OFFSET scans and not pass NO_CACHE for the LIMIT-OFFSET
>>>> queries so that for the subsequent page calls can leverage prior page data
>>>> in block cache.
>>>>
>>>> Thanks,
>>>> Abhishek
>>>>
>>>
>>>
>

Mime
View raw message