phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Query performance question
Date Thu, 11 Dec 2014 23:28:54 GMT
Hi,

Thanks for all your help thus far with Phoenix.

I am trying to understand the best way to construct a query that returns all the fields from
a table while still taking advantage of a secondary index on a single field. I have a table
with upwards of 50 fields and do not wish to index them all, but the use case exists to
return them all.

My general approach is to first select the primary-key values of the records I want using an
indexed field, then use those values to fetch the entire records via a subselect or join.

The initial select executes very fast (sub-second), returning close to 3,000 records. When
used as a subselect or join, however, the entire query takes very long (over 15 minutes) or
does not return at all. Based on the explain plans, it appears there is a lot more going on
than a simple look-up of the values returned by the subselect. Is there a way to do this
using Phoenix SQL syntax?

Any suggestions are appreciated.

Initial indexed query (very fast):

SELECT file_id, recnum
FROM BULK_TABLE
WHERE saddr IN (ip1, ip2, ip3);

file_id and recnum together make up the primary key.
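For reference, the relevant part of the schema looks roughly like this (column types are
illustrative, and the real table has around 50 columns):

CREATE TABLE BULK_TABLE (
    file_id VARCHAR NOT NULL,
    recnum  BIGINT  NOT NULL,
    saddr   VARCHAR,
    -- ... roughly 50 more columns ...
    CONSTRAINT pk PRIMARY KEY (file_id, recnum)
);

CREATE INDEX BULK_TABLE_SADDR_IDX ON BULK_TABLE (saddr);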

Plan:
| CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076]
- [9,1,000,142,114] |
| CLIENT MERGE SORT |

Used as a subselect (times out):

SELECT * FROM BULK_TABLE
WHERE (file_id, recnum) IN
    (SELECT file_id, recnum
     FROM BULK_TABLE
     WHERE saddr IN (ip1, ip2, ip3));
Plan:
| CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
| CLIENT MERGE SORT |
|     PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076]
- [9,1,000,142,114] |
|             SERVER AGGREGATE INTO DISTINCT ROWS BY [RECNUM, FILE_ID] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY FILE_ID IN ($1.$3) |

Another approach, using a join instead:

SELECT *
FROM BULK_TABLE
JOIN
    (SELECT file_id, recnum
     FROM BULK_TABLE
     WHERE saddr in (ip1,ip2,ip3)) AS SS
ON BULK_TABLE.file_id = SS.file_id AND BULK_TABLE.recnum = SS.recnum;

This runs faster but can still take about 15 minutes.

Plan:

| CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
| CLIENT MERGE SORT |
|     PARALLEL INNER-JOIN TABLE 0 |
|         CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX [0,1,000,004,076]
- [9,1,000,142,114] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) |


Is there a more efficient way to run a query such as this?
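One alternative I am aware of is a covered index, where INCLUDE copies the non-indexed
columns into the index so it alone can satisfy SELECT *. With ~50 columns that effectively
duplicates the whole table, which I was hoping to avoid (col1, col2 below are placeholders
for the actual non-PK column names):

CREATE INDEX BULK_TABLE_SADDR_COV_IDX ON BULK_TABLE (saddr)
    INCLUDE (col1, col2 /* ... all remaining non-PK columns ... */);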

Thanks!
Ralph