phoenix-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jaanai Zhang <cloud.pos...@gmail.com>
Subject Re: "upsert select" with "limit" clause
Date Thu, 13 Dec 2018 03:25:43 GMT
Shawn,


For the upsert without limit,  which will read the source table and write
the target tables on the server side.  I think the higher memory usage is
caused by using scan cache and memstore under the higher throughput.

----------------------------------------
   Jaanai Zhang
   Best regards!



Shawn Li <shawnlijob@gmail.com> 于2018年12月13日周四 上午10:13写道:

> Hi Vincent,
>
> So you describe limit will sent result to client side and then write to
> server, this might explain why upsert with limit is slower than without
> limit. But looks like it can't explain the memory usage? The memory usage
> on client machine is 8gb (without "limit") vs 2gb (with limit), sometime
> upsert without "limit" can even reach 20gb for big table.
>
> Thanks,
> Shawn
>
> On Wed, Dec 12, 2018 at 6:34 PM Vincent Poon <vincentpoon@apache.org>
> wrote:
>
>> I think it's done client-side if you have LIMIT.  If you have e.g. LIMIT
>> 1000 , it would be incorrect for each regionserver to upsert 100, if you
>> have more than one regionserver.  So instead results are sent back to the
>> client, where the LIMIT is applied and then written back to the server in
>> the UPSERT.
>>
>> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnlijob@gmail.com> wrote:
>>
>>> Hi Vincent,
>>>
>>>
>>>
>>> The table creation statement is similar to below. We have about 200
>>> fields. Table is mutable and don’t have any index on the table.
>>>
>>>
>>>
>>> CREATE TABLE IF NOT EXISTS us_population (
>>>
>>>       state CHAR(2) NOT NULL,
>>>
>>>       city VARCHAR,
>>>
>>>       population BIGINT,
>>>
>>>       …
>>>
>>>       CONSTRAINT my_pk PRIMARY KEY (state));
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Shawn
>>>
>>>
>>>
>>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentpoon@apache.org wrote:
>>>
>>>> For #2, can you provide the table definition and the statement used?
>>>> e.g. Is the table immutable, or does it have indexes?
>>>>
>>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <shawnlijob@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> 1.       Want to check what is underlying running for limit clause
>>>>> used in the following Upsert statement (is it involving any coprocessor
>>>>> working behind?):
>>>>>
>>>>>
>>>>>
>>>>> *                                  upsert into table2 select * from
>>>>> table1 limit 3000000; * (table 1 and table 2 have same schema)
>>>>>
>>>>>
>>>>>
>>>>>               The above statement is running a lot slower than without
>>>>> “limit”  clause as shown in following, even the above statement upsert
less
>>>>> data:
>>>>>
>>>>>
>>>>>
>>>>> *                                upsert into table2 select * from
>>>>> table1;*
>>>>>
>>>>>
>>>>>
>>>>> 2.       We also observe memory usable is pretty high without the
>>>>> limit clause (8gb vs 2gb), sometimes for big table it can reach 20gb
>>>>> without using limit clause.  According to phoenix website description
for
>>>>> upsert select “If auto commit is on, and both a) the target table matches
>>>>> the source table, and b) the select performs no aggregation, then the
>>>>> population of the target table will be done completely on the server-side
>>>>> (with constraint violations logged, but otherwise ignored).”
>>>>>
>>>>>
>>>>>
>>>>>                My question is If everything is done on server-side,
>>>>> how come we have such high memory usage on the client machine?
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Shawn
>>>>>
>>>>

Mime
View raw message