phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From William Shen <wills...@marinsoftware.com>
Subject Re: "upsert select" with "limit" clause
Date Wed, 12 Dec 2018 23:51:02 GMT
Vincent,
Do we expect to see the same behavior with SELECT?

I observed the following. Not sure what about applying the limit is adding
to the time... especially since there is only one row, much less than the
actual LIMIT.

SELECT tb1."updBy" FROM "prod"."ADGROUPS" tb1 WHERE
("cstId","cltId","pubId","accId","cpgnId") IN
((612266,61623806,7,1736995,343308425));

+--------+

| updBy  |

+--------+

| null   |

+--------+

1 row selected (0.144 seconds)

 SELECT tb1."updBy" FROM "prod"."ADGROUPS" tb1 WHERE
("cstId","cltId","pubId","accId","cpgnId") IN
((612266,61623806,7,1736995,343308425)) LIMIT 10000;

+--------+

| updBy  |

+--------+

| null   |

+--------+

1 row selected (0.571 seconds)


EXPLAIN SELECT tb1."updBy" FROM "prod"."ADGROUPS" tb1 WHERE
("cstId","cltId","pubId","accId","cpgnId") IN
((612266,61623806,7,1736995,343308425));

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+

|
            PLAN
                            | EST_BYTES_READ  | EST_ROWS_RE |

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+

| CLIENT 255-CHUNK 0 ROWS 0 BYTES PARALLEL 255-WAY ROUND ROBIN RANGE SCAN
OVER prod.ADGROUPS [0,612266,61623806,7,1736995,343308425] -
[254,612266,61623806,7,1736995,343308425]  | 0               | 0           |

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-------------+

1 row selected (0.042 seconds)


 EXPLAIN SELECT tb1."updBy" FROM "prod"."ADGROUPS" tb1 WHERE
("cstId","cltId","pubId","accId","cpgnId") IN
((612266,61623806,7,1736995,343308425)) LIMIT 10000;

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-------------+

|
    PLAN
          | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_T |

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-------------+

| CLIENT 255-CHUNK SERIAL 255-WAY ROUND ROBIN RANGE SCAN OVER prod.ADGROUPS
[0,612266,61623806,7,1736995,343308425] -
[254,612266,61623806,7,1736995,343308425]  | 80520000        | 10000
  | 15446585013 |

|     SERVER 10000 ROW LIMIT

          | 80520000        | 10000          | 15446585013 |

| CLIENT 10000 ROW LIMIT

          | 80520000        | 10000          | 15446585013 |

+----------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-------------+

3 rows selected (0.117 seconds)

On Wed, Dec 12, 2018 at 3:34 PM Vincent Poon <vincentpoon@apache.org> wrote:

> I think it's done client-side if you have LIMIT.  If you have e.g. LIMIT
> 1000 , it would be incorrect for each regionserver to upsert 100, if you
> have more than one regionserver.  So instead results are sent back to the
> client, where the LIMIT is applied and then written back to the server in
> the UPSERT.
>
> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnlijob@gmail.com> wrote:
>
>> Hi Vincent,
>>
>>
>>
>> The table creation statement is similar to below. We have about 200
>> fields. Table is mutable and don’t have any index on the table.
>>
>>
>>
>> CREATE TABLE IF NOT EXISTS us_population (
>>
>>       state CHAR(2) NOT NULL,
>>
>>       city VARCHAR,
>>
>>       population BIGINT,
>>
>>       …
>>
>>       CONSTRAINT my_pk PRIMARY KEY (state));
>>
>>
>>
>> Thanks,
>>
>> Shawn
>>
>>
>>
>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentpoon@apache.org wrote:
>>
>>> For #2, can you provide the table definition and the statement used?
>>> e.g. Is the table immutable, or does it have indexes?
>>>
>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <shawnlijob@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> 1.       Want to check what is underlying running for limit clause
>>>> used in the following Upsert statement (is it involving any coprocessor
>>>> working behind?):
>>>>
>>>>
>>>>
>>>> *                                  upsert into table2 select * from
>>>> table1 limit 3000000; * (table 1 and table 2 have same schema)
>>>>
>>>>
>>>>
>>>>               The above statement is running a lot slower than without
>>>> “limit”  clause as shown in following, even the above statement upsert
less
>>>> data:
>>>>
>>>>
>>>>
>>>> *                                upsert into table2 select * from
>>>> table1;*
>>>>
>>>>
>>>>
>>>> 2.       We also observe memory usable is pretty high without the
>>>> limit clause (8gb vs 2gb), sometimes for big table it can reach 20gb
>>>> without using limit clause.  According to phoenix website description for
>>>> upsert select “If auto commit is on, and both a) the target table matches
>>>> the source table, and b) the select performs no aggregation, then the
>>>> population of the target table will be done completely on the server-side
>>>> (with constraint violations logged, but otherwise ignored).”
>>>>
>>>>
>>>>
>>>>                My question is If everything is done on server-side, how
>>>> come we have such high memory usage on the client machine?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Shawn
>>>>
>>>

Mime
View raw message