phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerald Sangudi <gsang...@23andme.com>
Subject Re: SORT_MERGE_JOIN on non-leading key: server-side sorting
Date Fri, 18 May 2018 00:20:46 GMT
Thanks Maryann and James.

I'll verify the RC as soon as it's available.

Gerald


On Thu, May 17, 2018 at 5:01 PM, James Taylor <jamestaylor@apache.org>
wrote:

> Hi Gerald,
> The fix for PHOENIX-4508 will appear in the 4.14.0 release which we're
> working on now. We should have a second RC up shortly that you can use to
> verify. The fix isn't in 4.13 since it was checked in after the release.
>
> Thanks,
> James
>
> On Thu, May 17, 2018 at 4:44 PM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
>> Hey, this is a simple experiment I did based on an existing test case.
>> Look for "query3". The test verifies that there is no "SERVER SORT" in the
>> entire query plan (both salted and unsalted tables have been covered):
>>
>>
>> @Test
>> public void testBug4508() throws Exception {
>>     Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>>     Connection conn = DriverManager.getConnection(getUrl(), props);
>>     props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
>>     props.setProperty("TenantId", "010");
>>     Connection conn010 = DriverManager.getConnection(getUrl(), props);
>>     try {
>>         // Salted tables
>>         String peopleTable = generateUniqueName();
>>         String myTable = generateUniqueName();
>>         conn.createStatement().execute("CREATE TABLE " + peopleTable + " (\n" +
>>                 "PERSON_ID VARCHAR NOT NULL,\n" +
>>                 "NAME VARCHAR\n" +
>>                 "CONSTRAINT PK_TEST_PEOPLE PRIMARY KEY (PERSON_ID)) SALT_BUCKETS
= 3");
>>         conn.createStatement().execute("CREATE TABLE " + myTable + " (\n" +
>>                 "LOCALID VARCHAR NOT NULL,\n" +
>>                 "DSID VARCHAR(255) NOT NULL, \n" +
>>                 "EID CHAR(40),\n" +
>>                 "HAS_CANDIDATES BOOLEAN\n" +
>>                 "CONSTRAINT PK_MYTABLE PRIMARY KEY (LOCALID, DSID)) SALT_BUCKETS
= 3");
>>         verifyQueryPlanAndResultForBug4508(conn, peopleTable, myTable);
>>
>>         // Salted multi-tenant tables
>>         String peopleTable2 = generateUniqueName();
>>         String myTable2 = generateUniqueName();
>>         conn.createStatement().execute("CREATE TABLE " + peopleTable2 + " (\n" +
>>                 "TENANT_ID VARCHAR NOT NULL,\n" +
>>                 "PERSON_ID VARCHAR NOT NULL,\n" +
>>                 "NAME VARCHAR\n" +
>>                 "CONSTRAINT PK_TEST_PEOPLE PRIMARY KEY (TENANT_ID, PERSON_ID))\n"
+
>>                 "SALT_BUCKETS = 3, MULTI_TENANT=true");
>>         conn.createStatement().execute("CREATE TABLE " + myTable2 + " (\n" +
>>                 "TENANT_ID VARCHAR NOT NULL,\n" +
>>                 "LOCALID VARCHAR NOT NULL,\n" +
>>                 "DSID VARCHAR(255) NOT NULL, \n" +
>>                 "EID CHAR(40),\n" +
>>                 "HAS_CANDIDATES BOOLEAN\n" +
>>                 "CONSTRAINT PK_MYTABLE PRIMARY KEY (TENANT_ID, LOCALID, DSID))\n"
+
>>                 "SALT_BUCKETS = 3, MULTI_TENANT=true");
>>         verifyQueryPlanAndResultForBug4508(conn010, peopleTable2, myTable2);
>>     } finally {
>>         conn.close();
>>         conn010.close();
>>     }
>> }
>>
>>
>> private static void verifyQueryPlanAndResultForBug4508(
>> Connection conn, String peopleTable, String myTable) throws Exception {
>> PreparedStatement peopleTableUpsertStmt = conn.prepareStatement(
>> "UPSERT INTO " + peopleTable + " VALUES(?, ?)");
>> peopleTableUpsertStmt.setString(1, "X001");
>> peopleTableUpsertStmt.setString(2, "Marcus");
>> peopleTableUpsertStmt.execute();
>> peopleTableUpsertStmt.setString(1, "X002");
>> peopleTableUpsertStmt.setString(2, "Jenny");
>> peopleTableUpsertStmt.execute();
>> peopleTableUpsertStmt.setString(1, "X003");
>> peopleTableUpsertStmt.setString(2, "Seymour");
>> peopleTableUpsertStmt.execute();
>> conn.commit();
>>
>> PreparedStatement myTableUpsertStmt = conn.prepareStatement(
>> "UPSERT INTO " + myTable + " VALUES(?, ?, ?, ?)");
>> myTableUpsertStmt.setString(1, "X001");
>> myTableUpsertStmt.setString(2, "GROUP");
>> myTableUpsertStmt.setString(3, null);
>> myTableUpsertStmt.setBoolean(4, false);
>> myTableUpsertStmt.execute();
>> myTableUpsertStmt.setString(1, "X001");
>> myTableUpsertStmt.setString(2, "PEOPLE");
>> myTableUpsertStmt.setString(3, null);
>> myTableUpsertStmt.setBoolean(4, false);
>> myTableUpsertStmt.execute();
>> myTableUpsertStmt.setString(1, "X003");
>> myTableUpsertStmt.setString(2, "PEOPLE");
>> myTableUpsertStmt.setString(3, null);
>> myTableUpsertStmt.setBoolean(4, false);
>> myTableUpsertStmt.execute();
>> myTableUpsertStmt.setString(1, "X002");
>> myTableUpsertStmt.setString(2, "PEOPLE");
>> myTableUpsertStmt.setString(3, "Z990");
>> myTableUpsertStmt.setBoolean(4, false);
>> myTableUpsertStmt.execute();
>> conn.commit();
>>
>> String query1 = "SELECT /*+ USE_SORT_MERGE_JOIN*/ COUNT(*)\n" +
>> "FROM " + peopleTable + " ds JOIN " + myTable + " l\n" +
>> "ON ds.PERSON_ID = l.LOCALID\n" +
>> "WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE";
>> String query2 = "SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)\n" +
>> "FROM (SELECT LOCALID FROM " + myTable + "\n" +
>> "WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l\n" +
>> "JOIN " + peopleTable + " ds ON ds.PERSON_ID = l.LOCALID";
>> String query3 = "SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)\n" +
>> "FROM " + myTable + " t1 JOIN " + myTable + " t2\n" +
>> "ON t1.DSID = t2.DSID\n" +
>> "WHERE t1.LOCALID = 'X001' AND t2.LOCALID = 'X002'";
>>
>> for (String q : new String[]{query1, query2, query3}) {
>> ResultSet rs = conn.createStatement().executeQuery("explain " + q);
>> String plan = QueryUtil.getExplainPlan(rs);
>> assertFalse("Tables should not be sorted over their PKs:\n" + plan,
>> plan.contains("SERVER SORTED BY"));
>>
>> rs = conn.createStatement().executeQuery(q);
>> assertTrue(rs.next());
>> //assertEquals(2, rs.getInt(1));
>> assertFalse(rs.next());
>> }
>> }
>>
>>
>>
>> On Thu, May 17, 2018 at 3:48 PM, Gerald Sangudi <gsangudi@23andme.com>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> Would you mind sharing the EXPLAIN output you are getting?
>>>
>>> I'm not able to find a download for phoenix-4.13.2-HBase or
>>> 4.14.0-HBase. The *-cdh downloads do not work with Amazon EMR. I tried
>>> building against the 4.14.0-rc0 tag. This produced 4.14.0 for
>>> phoenix-core.jar, but all the other jars produced are 4.13.1, including the
>>> client jar. When I deploy these jars on EMR and run queries via sqlline, I
>>> get a NoClassDefFoundError.
>>>
>>> Are the plans to provide 4.13.2-HBase as a public download?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Wed, May 16, 2018 at 10:40 AM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>>
>>>> I checked again. Unfortunately this fix is included with 4.13.2 but not
>>>> 4.13.1. Would you mind upgrading your library to 4.13.2?
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Wed, May 16, 2018 at 9:41 AM, Maryann Xue <maryann.xue@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry for the late response. Yes, sure, I will try it right away.
>>>>>
>>>>> On Wed, May 16, 2018 at 9:40 AM Gerald Sangudi <gsangudi@23andme.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Maryann,
>>>>>>
>>>>>> Following up again -- I ran EXPLAIN on 4.13.1 and saw the same
>>>>>> behavior, a sort on the RHS. Is it possible for you to try it on
4.13.1?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Fri, May 11, 2018 at 10:52 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Hi Maryann,
>>>>>>>
>>>>>>> Thanks for verifying against latest. However, I did not detect
the
>>>>>>> fix in Phoenix 4.13.1. AWS EMR currently provides Phoenix 4.13.0.
I
>>>>>>> manually upgraded an AWS EMR cluster to Phoenix 4.13.1 and rebooted
the
>>>>>>> cluster. The EXPLAIN plan still shows the same issue.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>> On Wed, May 9, 2018 at 11:51 AM, Maryann Xue <maryann.xue@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Gerald,
>>>>>>>>
>>>>>>>> I have verified against latest Phoenix code that this problem
has
>>>>>>>> been fixed. I have also checked Phoenix 4.13 release tags.
Looks like all
>>>>>>>> versions of 4.13 packages now include that fix. Would you
mind getting the
>>>>>>>> latest Phoenix-4.13 package and testing it again? Thank you!
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maryann
>>>>>>>>
>>>>>>>> On Tue, May 8, 2018 at 12:24 PM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm running Phoenix 4.13 on AWS EMR and getting the following
>>>>>>>>> EXPLAIN plan:
>>>>>>>>>
>>>>>>>>> Table:
>>>>>>>>>
>>>>>>>>> *CREATE TABLE salted (       keyA BIGINT NOT NULL,  
    keyB
>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT
pk PRIMARY KEY (keyA,
>>>>>>>>> keyB))SALT_BUCKETS = 64;*
>>>>>>>>> EXPLAIN:
>>>>>>>>>
>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */COUNT(*) cFROM
salted t1
>>>>>>>>> JOIN salted t2ON (t1.keyB = t2.keyB)WHERE t1.keyA = 10AND
t2.keyA = 20;*
>>>>>>>>>
>>>>>>>>> *+------------------------------------------------------------------------------+-----------------+-+|
>>>>>>>>>                                     PLAN   | EST_BYTES_READ
|
>>>>>>>>> |+------------------------------------------------------------------------------+-----------------+-+|
>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES
>>>>>>>>>                                               | null
| ||     CLIENT
>>>>>>>>> 64-CHUNK PARALLEL 64-WAY RANGE SCAN OVER SALTED [0,10]
- [63,10]  | null |
>>>>>>>>> ||         SERVER FILTER BY FIRST KEY ONLY
>>>>>>>>>                                      | null | ||    
CLIENT MERGE SORT
>>>>>>>>>                                                     
  | null | || AND
>>>>>>>>> (SKIP MERGE)                                        
                    |
>>>>>>>>> null | ||     CLIENT 64-CHUNK PARALLEL 64-WAY RANGE SCAN
OVER SALTED [0,20]
>>>>>>>>> - [63,20]  | null | ||         SERVER FILTER BY FIRST
KEY ONLY
>>>>>>>>>                                      | null | ||    
    SERVER SORTED BY
>>>>>>>>> [T2.KEYB]                                           |
null | ||     CLIENT
>>>>>>>>> MERGE SORT                                          
             | null |
>>>>>>>>> || CLIENT AGGREGATE INTO SINGLE ROW
>>>>>>>>>                                             | null |
>>>>>>>>> |+------------------------------------------------------------------------------+-----------------+-+*
>>>>>>>>>
>>>>>>>>> In the EXPLAIN output, I see "SERVER SORTED BY [T2.KEYB]".
Is this
>>>>>>>>> sort necessary? For both JOIN terms T1 and T2, the value
of keyA, the
>>>>>>>>> leading part of the primary key, is fixed. Furthermore,
there is no
>>>>>>>>> corresponding sort of T1.KEYB.
>>>>>>>>>
>>>>>>>>> When I EXPLAIN the same query on a non-salted table,
neither
>>>>>>>>> T1.KEYB nor T2.KEYB is re-sorted. I'm hoping the sort
is unnecessary. If
>>>>>>>>> so, is there an open ticket? I would be happy to file
a ticket and to
>>>>>>>>> contribute to a fix. I would appreciate any guidance.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Mime
View raw message