phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Taylor <jamestay...@apache.org>
Subject Re: Query Hints on Functional Index
Date Wed, 01 Jul 2015 01:55:44 GMT
Thanks for the detail, Bryan. Looks like a bug (as you've concluded) in the
hinting mechanism wrt functional indexes. Would you mind filing a JIRA?
FWIW, I tried the following and when only columns contained in the
functional index are used, the functional index is used as desired (see
below).

Might be worth trying to include those 30+ cols in the functional index.
Your query performance will be much better. Have you tried that?

Thanks,
James

0: jdbc:phoenix:localhost> EXPLAIN SELECT /*+ INDEX(LOG
LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ)='/jquery';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) = '/jquery' |
+------------------------------------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:localhost> EXPLAIN SELECT F FROM LOG WHERE LOWER(RQ) LIKE
'/jquery%';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX
[0,'/jquery'] - [0,'/jquerz'] |
|     SERVER FILTER BY FIRST KEY ONLY      |
+------------------------------------------+


On Tue, Jun 30, 2015 at 1:54 PM, Gerber, Bryan W <Bryan.Gerber@pnnl.gov>
wrote:

>  I am trying to replace case-insensitive query using JOIN/Subselect with
> one using a query hint. This has worked well for our fixed-case fields. So
> far I have been unable to convince Phoenix to use a functional index on a
> SELECT * query.  Is this a bug, or is it intended for functional indexes to
> ignore hints?
>
>
>
> This is the query syntax we are trying to replace using query hints (Avg
> time 0.25s):
>
>
>
> EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE
> LOWER(RQ) LIKE '/jquery%') AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND
> LOG.R=A.R);
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     PARALLEL INNER-JOIN TABLE 0          |
>
> |         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER
> LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz'] |
>
> |             SERVER FILTER BY FIRST KEY ONLY |
>
> |     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F,
> A.R)) |
>
> +------------------------------------------+
>
>
>
> This is closer to the query we want - Hint on the non-functional index
> generates an expected, but non-optimal due to the ILIKE/function on RQ (avg
> time 0.78s)
>
> EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE
> '/jquery%';
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     SKIP-SCAN-JOIN TABLE 0               |
>
> |         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
>
> |             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
>
> |     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN
> (($707.$709, $707.$710, $707.$711)) |
>
> +------------------------------------------+
>
>
>
> This is what we REALLY want it to do, but the hint is ignored for the
> functional index (Avg 2.57s)
>
>  EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE
> LOWER(RQ) LIKE '/jquery%';
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
>
> +------------------------------------------+
>
>
>
> Test table has 2.9 million records; production table is many orders of
> magnitude larger. Table is actually much wider than sample schema below
> (30+ cols) so INCLUDE() isn't viable.  Users want all the columns, all the
> time.
>
> Here’s a simplified schema for the table:
>
> CREATE TABLE IF NOT EXISTS LOG
>
> (
>
>     TS VARCHAR NOT NULL,
>
>     f VARCHAR NOT NULL,
>
>     r INTEGER NOT NULL,
>
>     sa VARCHAR,
>
>     da VARCHAR,
>
>     rq VARCHAR
>
> CONSTRAINT pkey PRIMARY KEY (TS, f, r)
>
> )
> TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
>
>
> CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq))
> TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq)
> TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> Bryan G.
>
>
>

Mime
View raw message