phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hieu Nguyen <h...@box.com>
Subject Filtering on top of hash join fails
Date Tue, 09 Jul 2019 04:49:17 GMT
Hi,

I am attempting to join two relations where the LHS has many more rows than
the RHS (LHS = >5m, RHS = 70k).  The relations are joined on the PK of the
LHS with a unique foreign key from the RHS, so that rows from the RHS are
matched one-to-one with the LHS.  I found that the Foreign Key to Primary
Key Join Optimization (https://phoenix.apache.org/joins.html) works well if
there are no additional filters applied to the LHS, returning in 1-2
seconds where a full scan against LHS would take about 5 seconds.  However,
adding even a non-selective filter to the LHS (e.g. col IS NOT NULL on a
column with non-null values) will inexplicably cause the query to take long
enough to time out the hash join cache (30 seconds).  I would have expected
that the additional filter would have been applied on top of the skip-scan
against the LHS, thus only needing to examine about 70k rows of the LHS and
returning fairly quickly in about the same amount of time.  The hash join
cache usage should be the same for the failing query as the successful 1-2
second query because the RHS relation hasn't changed.

The easiest way to reproduce is to run "./performance.py localhost 5000000"
to create a table with 5m rows, then issue a query to join the table on
itself.  In reality, we'd have two different tables for LHS and RHS, but
these repro steps still illustrate the issue.

For reference, this is the CREATE TABLE statement generated by the
performance.py script:
CREATE TABLE IF NOT EXISTS PERFORMANCE_5000000 (
HOST CHAR(2) NOT NULL,
DOMAIN VARCHAR NOT NULL,
FEATURE VARCHAR NOT NULL,
DATE DATE NOT NULL,
USAGE.CORE BIGINT,
USAGE.DB BIGINT,
STATS.ACTIVE_VISITOR INTEGER
CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE))
SPLIT ON ('CSGoogle','CSSalesforce','EUApple','EUGoogle','EUSalesforce',
'NAApple','NAGoogle','NASalesforce');

// returns in 1.6 seconds
SELECT PERFORMANCE_5000000.*
FROM PERFORMANCE_5000000  // LHS table has 5m rows
JOIN (
SELECT HOST, DOMAIN, FEATURE, DATE FROM PERFORMANCE_5000000 WHERE HOST =
'EU' AND DOMAIN = 'Salesforce.com' AND DATE <= TO_DATE('2019-07-16') // RHS
relation matches about 70k rows
) AS I ON (
PERFORMANCE_5000000.HOST = I.HOST AND
PERFORMANCE_5000000.DOMAIN = I.DOMAIN AND
PERFORMANCE_5000000.FEATURE = I.FEATURE AND
PERFORMANCE_5000000.DATE = I.DATE
)
LIMIT 100;

// Explain
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|
               PLAN
                                 | EST_BYTES_READ  | EST_ROWS_READ  |
 EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 9-CHUNK 1191563 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN
FULL SCAN OVER PERFORMANCE_5000000
                                     | 314572800       | 1191563        |
1562628632646  |
| CLIENT 100 ROW LIMIT

                                  | 314572800       | 1191563        |
1562628632646  |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

                                  | 314572800       | 1191563        |
1562628632646  |
|         CLIENT 1-CHUNK 1191563 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND
ROBIN RANGE SCAN OVER PERFORMANCE_5000000 ['EU','Salesforce.com']
                                  | 314572800       | 1191563        |
1562628632646  |
|             SERVER FILTER BY FIRST KEY ONLY AND DATE <= DATE '2019-07-16
00:00:00.000'
                                  | 314572800       | 1191563        |
1562628632646  |
|     DYNAMIC SERVER FILTER BY (PERFORMANCE_5000000.HOST,
PERFORMANCE_5000000.DOMAIN, PERFORMANCE_5000000.FEATURE,
PERFORMANCE_5000000.DATE) IN ((I.HOST, I.DOMAIN, I.FEATURE, I.DATE))  |
314572800       | 1191563        | 1562628632646  |
|     JOIN-SCANNER 100 ROW LIMIT

                                  | 314572800       | 1191563        |
1562628632646  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+

// Fails after 30 sec with "Hash Join cache not found joinId:
993866207794942179. The cache might have expired and have been removed."
SELECT PERFORMANCE_5000000.*
FROM PERFORMANCE_5000000
JOIN (
SELECT HOST, DOMAIN, FEATURE, DATE FROM PERFORMANCE_5000000 WHERE HOST =
'EU' AND DOMAIN = 'Salesforce.com' AND DATE <= TO_DATE('2019-07-16')
) AS I ON (
PERFORMANCE_5000000.HOST = I.HOST AND
PERFORMANCE_5000000.DOMAIN = I.DOMAIN AND
PERFORMANCE_5000000.FEATURE = I.FEATURE AND
PERFORMANCE_5000000.DATE = I.DATE
)
WHERE CORE IS NOT NULL // Filter applied to LHS. Expecting for it to filter
the ~70k rows found via skip-scan
LIMIT 100;

// Explain (only difference is the additional SERVER FILTER BY USAGE.CORE
IS NOT NULL at line 2)
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|
               PLAN
                                 | EST_BYTES_READ  | EST_ROWS_READ  |
 EST_INFO_TS   |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 9-CHUNK 1191563 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN
FULL SCAN OVER PERFORMANCE_5000000
                                     | 314572800       | 1191563        |
1562628632646  |
|     SERVER FILTER BY USAGE.CORE IS NOT NULL

                                 | 314572800       | 1191563        |
1562628632646  |
| CLIENT 100 ROW LIMIT

                                  | 314572800       | 1191563        |
1562628632646  |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

                                  | 314572800       | 1191563        |
1562628632646  |
|         CLIENT 1-CHUNK 1191563 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND
ROBIN RANGE SCAN OVER PERFORMANCE_5000000 ['EU','Salesforce.com']
                                  | 314572800       | 1191563        |
1562628632646  |
|             SERVER FILTER BY FIRST KEY ONLY AND DATE <= DATE '2019-07-16
00:00:00.000'
                                  | 314572800       | 1191563        |
1562628632646  |
|     DYNAMIC SERVER FILTER BY (PERFORMANCE_5000000.HOST,
PERFORMANCE_5000000.DOMAIN, PERFORMANCE_5000000.FEATURE,
PERFORMANCE_5000000.DATE) IN ((I.HOST, I.DOMAIN, I.FEATURE, I.DATE))  |
314572800       | 1191563        | 1562628632646  |
|     JOIN-SCANNER 100 ROW LIMIT

                                  | 314572800       | 1191563        |
1562628632646  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+

Interestingly, if I change the DATE filter to be more selective (so that
RHS relation has only 200 rows), then both queries return quickly (< 1sec)
with no change in the EXPLAIN plan.  Thus, the RHS needs to be large enough
to trigger this issue.

I tried using the USE_SORT_MERGE_JOIN hint, which made the queries succeed.
But it also made queries much slower.  It makes sense to me why this
happens - because both relations (large and small) need to be streamed to
the client for merging, and as far as I can tell, the sort-merge join does
not take advantage of the skip-scan optimization on the LHS that the
regular hash-join does.  So this probably won't work for our
latency-sensitive use-case.

Any possible explanations on the drastic difference in query execution time
by just adding a filter to the larger relation in a hash join?

Thanks for your help,
-Hieu

Mime
View raw message