phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maryann Xue <maryann....@gmail.com>
Subject Re: Query performance question
Date Fri, 12 Dec 2014 16:07:55 GMT
Hi Ralph,

Thanks for the question!
According to the "explain" result you got, the optimization worked exactly
as expected with this query:

"DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID)" means a skip-scan
instead of a full-scan over BULK_TABLE will be executed at runtime based on
the values of "file_id" it got from the inner query.

So I need to know a few more things:
1. What is the Primary Key definition of your BULK_TABLE?
2. How many (approximately) distinct "file_id" values are there in the
BULK_TABLE? (If you don't know for sure, you can just run a query to find
out).
3. How long does it take to run a full-scan query on BULK_TABLE, like
"select * from BULK_TABLE"?
4. How long does it take to run a full-scan join query on BULK_TABLE, like
"select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as
SS on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum"?


Thanks,
Maryann


On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov>
wrote:
>
>  Hi,
>
>  Thanks for all your help thus far with Phoenix.
>
>  I am trying to understand the best way to construct a query that returns
> all the fields from a table but still takes advantage of a single field
> secondary index.   I have a table with upwards of 50 fields and do not wish
> to index them all but the use case exists to return them all.
>
>  My general approach is to first select the records I want using an
> indexed field then use the returned pk values to get the entire record in
> the form of a subselect or join.
>
>  The initial select executes very fast, sub-second , returning close to
> 3000 records.  When used as a subselect or join the entire query takes very
> long (over 15min)  or does not return.   Based on the processing plans it
> appears there is a lot more going on than just a simple look-up of the
> values returned in the subselect.  Is there a way to do this using Phoenix
> SQL syntax?
>
>  Any suggestions are appreciated.
>
>  Initial indexed query (very fast):
>
>  SELECT file_id,recnum
>
> FROM BULK_TABLE
>
> WHERE saddr IN (ip1,ip2,ip3))
>
>
>  file_id and recnum make up the primary key
>
>
>  Plan:
>
> | CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX
> [0,1,000,004,076] - [9,1,000,142,114] |
>
> | CLIENT MERGE SORT |
>
>
>   Used as a subselect (times out):
>
>   SELECT * FROM BULK_TABLE
>
> WHERE (file_id,recnum) IN(SELECT file_id,recnum
>
>               FROM BULK_TABLE
>
>       WHERE saddr IN (ip1,ip2,ip3));
>
> Plan:
>
> | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
>
> | CLIENT MERGE SORT |
>
> |     PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |
>
> |         CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
> OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114] |
>
> |             SERVER AGGREGATE INTO DISTINCT ROWS BY [RECNUM, FILE_ID] |
>
> |         CLIENT MERGE SORT |
>
> |     DYNAMIC SERVER FILTER BY FILE_ID IN ($1.$3) |
>
>
>  Another approach using using a join instead:
>
>
>  SELECT *
>
> FROM BULK_TABLE
>
> JOIN
>
>     (SELECT file_id, recnum
>
>      FROM BULK_TABLE
>
>      WHERE saddr in (ip1,ip2,ip3)) AS SS
>
> ON BULK_TABLE.file_id = SS.file_id AND BULK_TABLE.recnum = SS.recnum;
>
>
>  Runs faster but still can take about 15min
>
>
>  Plan:
>
>
>  | CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
>
> | CLIENT MERGE SORT |
>
> |     PARALLEL INNER-JOIN TABLE 0 |
>
> |         CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS
> OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076] - [9,1,000,142,114] |
>
> |         CLIENT MERGE SORT |
>
> |     DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) |
>
>
>
>   Is there a more efficient way to run a query such as this?
>
>  Thanks!
>  Ralph
>
>
>
>
>
>

-- 
Thanks,
Maryann

Mime
View raw message