phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas D'Silva" <tdsi...@salesforce.com>
Subject Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries
Date Thu, 23 Aug 2018 16:47:17 GMT
If the offset is very large you will end up scanning and filtering many
rows most of which won't be accessed again.
You need to test and see the performance of the NO_CACHE in your particular
use case to see if its helps.

On Wed, Aug 22, 2018 at 4:39 AM, Abhishek Gupta <abhilater@gmail.com> wrote:

> The thing is that RVC will not work for case where the SQL query does
> aggregation on truncated primary key columns
> eg. SELECT pk1,
>         TO_CHAR(CONVERT_TZ(TO_DATE(time_col,'yyyyMMddHHmm'),
> 'UTC','Asia/Kolkata'),'yyyyMMdd') as truncated_time_sample,
>         SUM(col1) as agg_value,
>      FROM ... GROUP BY pk1, truncated_time_sample..LIMIT 1000 OFFSET 1000
>
> This is the reason to pick LIMIT OFFSET for pagination.
>
> From your answer it seems for OFFSET queries, all the previous pages rows
> are scanned aggregated to rows and rejected till offset. So in such a
> scenario wouldn't those rows in block cache help during the pagination
> duration.
>
> Thanks,
> Abhishek
>
> On Wed, Aug 22, 2018 at 12:07 AM Thomas D'Silva <tdsilva@salesforce.com>
> wrote:
>
>> When you do an OFFSET Phoenix will scan rows and filter them out until it
>> reaches the offset count which can end up being very costly for large
>> offsets.
>> If you can use a RVC where the order matches the PK of the data table or
>> index the start key of the scan will be set based on the RVC, which is much
>> more efficient
>> (see http://phoenix.apache.org/paged.html).
>>
>> On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta <abhilater@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Could you help me understand how LIMIT OFFSET queries work under the
>>> hood in Phoenix, is the filtering out of rows done in heap or is there some
>>> sort of optimisation where it can skip at disk level.
>>> My idea about posting this question was to understand if the rows from
>>> paste pages of the query in block cache can optimize the subsequent page
>>> call that would use the cache hits for the filter out rows and not seek at
>>> disk.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva <tdsilva@salesforce.com>
>>> wrote:
>>>
>>>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries,
>>>> since you will be reading and filtering out lots of rows on the server?
>>>> I guess using the block cache for RVC queries might help depending on
>>>> how many rows you read per query, you should be able to easily test this
>>>> out.
>>>>
>>>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta <abhilater@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am working on a use case where SQL aggregated queries are made such
>>>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>>>
>>>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>>>> if it would be more performant to instead enable BLOCKCACHE on the
>>>>> table and pass NO_CACHE hint for RVC queries because it uses
>>>>> non-LIMIT-OFFSET scans and not pass NO_CACHE for the LIMIT-OFFSET
>>>>> queries so that for the subsequent page calls can leverage prior page
data
>>>>> in block cache.
>>>>>
>>>>> Thanks,
>>>>> Abhishek
>>>>>
>>>>
>>>>
>>

Mime
View raw message