phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gerber, Bryan W" <Bryan.Ger...@pnnl.gov>
Subject RE: Query Hints on Functional Index
Date Wed, 01 Jul 2015 18:31:50 GMT
PHOENIX-2094 created.

Covering the queries isn’t a viable option right now, the table is 5TB already and we have
multiple indexes where we are trying to optimize case-insensitive queries. Fortunately we
can brute-force the plan with the JOIN & subselect to get what we need near-term.

Bryan G.

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Tuesday, June 30, 2015 6:56 PM
To: user
Subject: Re: Query Hints on Functional Index

Thanks for the detail, Bryan. Looks like a bug (as you've concluded) in the hinting mechanism
wrt functional indexes. Would you mind filing a JIRA? FWIW, I tried the following and when
only columns contained in the functional index are used, the functional index is used as desired
(see below).

Might be worth trying to include those 30+ cols in the functional index. Your query performance
will be much better. Have you tried that?

Thanks,
James

0: jdbc:phoenix:localhost> EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM
LOG WHERE LOWER(RQ)='/jquery';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) = '/jquery' |
+------------------------------------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:localhost> EXPLAIN SELECT F FROM LOG WHERE LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz']
|
|     SERVER FILTER BY FIRST KEY ONLY      |
+------------------------------------------+


On Tue, Jun 30, 2015 at 1:54 PM, Gerber, Bryan W <Bryan.Gerber@pnnl.gov<mailto:Bryan.Gerber@pnnl.gov>>
wrote:
I am trying to replace case-insensitive query using JOIN/Subselect with one using a query
hint. This has worked well for our fixed-case fields. So far I have been unable to convince
Phoenix to use a functional index on a SELECT * query.  Is this a bug, or is it intended for
functional indexes to ignore hints?

This is the query syntax we are trying to replace using query hints (Avg time 0.25s):

EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ) LIKE '/jquery%')
AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R);
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     PARALLEL INNER-JOIN TABLE 0          |
|         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX [0,'/jquery']
- [0,'/jquerz'] |
|             SERVER FILTER BY FIRST KEY ONLY |
|     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) |
+------------------------------------------+

This is closer to the query we want - Hint on the non-functional index generates an expected,
but non-optimal due to the ILIKE/function on RQ (avg time 0.78s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
|             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
|     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709, $707.$710, $707.$711))
|
+------------------------------------------+

This is what we REALLY want it to do, but the hint is ignored for the functional index (Avg
2.57s)
 EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
+------------------------------------------+

Test table has 2.9 million records; production table is many orders of magnitude larger. Table
is actually much wider than sample schema below (30+ cols) so INCLUDE() isn't viable.  Users
want all the columns, all the time.
Here’s a simplified schema for the table:
CREATE TABLE IF NOT EXISTS LOG
(
    TS VARCHAR NOT NULL,
    f VARCHAR NOT NULL,
    r INTEGER NOT NULL,
    sa VARCHAR,
    da VARCHAR,
    rq VARCHAR
CONSTRAINT pkey PRIMARY KEY (TS, f, r)
) TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq)) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Bryan G.


Mime
View raw message