phoenix-user mailing list archives

From Jaanai Zhang <cloud.pos...@gmail.com>
Subject Re: "upsert select" with "limit" clause
Date Thu, 13 Dec 2018 08:15:19 GMT
Shawn,

The UPSERT SELECT will run in a coprocessor on the server side only if it
has no LIMIT clause, queries only one table, performs no aggregation, uses
no sequences, and auto commit is on. Please check your SQL ... and you can
also check whether some resources have not been released.
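As a concrete illustration of those conditions, the two statement shapes discussed in this thread differ only in the LIMIT clause (table1 and table2 are the tables from the original question; the comments summarize the conditions above):

```sql
-- table1 and table2 have identical schemas. With auto commit on, no
-- LIMIT, no aggregation, and no sequences, this should run entirely in a
-- server-side coprocessor:
UPSERT INTO table2 SELECT * FROM table1;

-- Adding a LIMIT breaks one of those conditions, so rows are routed
-- through the client instead:
UPSERT INTO table2 SELECT * FROM table1 LIMIT 3000000;
```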

----------------------------------------
   Jaanai Zhang
   Best regards!



Shawn Li <shawnlijob@gmail.com> wrote on Thu, Dec 13, 2018 at 12:10 PM:

> Hi Jaanai,
>
> Thanks for sharing your thoughts. The behavior you describe is correct on
> the HBase region server side: memory usage for the block cache and
> memstore will be high under such high throughput. But our Phoenix client
> runs on a gateway machine (no HBase region server or any other HBase
> service on it), so we are not sure how to explain such high memory usage
> for upsert select without the "limit" clause. The memory usage looks as if
> all select results were sent to the client machine, cached in the client
> machine's memory, and then inserted back into the target table. That is
> not the behavior that should happen: since the table schemas are exactly
> the same, all of this should be done on the server side. By the way, this
> happens on both Phoenix 4.7 and Phoenix 4.14.
>
>
> Thanks,
> Shawn
>
> On Wed, Dec 12, 2018 at 10:26 PM Jaanai Zhang <cloud.poster@gmail.com>
> wrote:
>
>> Shawn,
>>
>>
>> An upsert without a limit reads the source table and writes the target
>> table on the server side. I think the higher memory usage is caused by
>> the scan cache and memstore under the higher throughput.
>>
>> ----------------------------------------
>>    Jaanai Zhang
>>    Best regards!
>>
>>
>>
>> Shawn Li <shawnlijob@gmail.com> wrote on Thu, Dec 13, 2018 at 10:13 AM:
>>
>>> Hi Vincent,
>>>
>>> So you are describing that with a limit, results are sent to the client
>>> side and then written back to the server; that might explain why upsert
>>> with limit is slower than without it. But it doesn't seem to explain the
>>> memory usage: memory usage on the client machine is 8gb (without
>>> "limit") vs 2gb (with limit), and upsert without "limit" can even reach
>>> 20gb for a big table.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> On Wed, Dec 12, 2018 at 6:34 PM Vincent Poon <vincentpoon@apache.org>
>>> wrote:
>>>
>>>> I think it's done client-side if you have LIMIT.  If you have e.g.
>>>> LIMIT 1000, it would be incorrect for each regionserver to upsert 1000
>>>> rows if you have more than one regionserver.  So instead, results are
>>>> sent back to the client, where the LIMIT is applied, and then written
>>>> back to the server in the UPSERT.
>>>>
>>>> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnlijob@gmail.com> wrote:
>>>>
>>>>> Hi Vincent,
>>>>>
>>>>>
>>>>>
>>>>> The table creation statement is similar to below. We have about 200
>>>>> fields. Table is mutable and don’t have any index on the table.
>>>>>
>>>>>
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS us_population (
>>>>>
>>>>>       state CHAR(2) NOT NULL,
>>>>>
>>>>>       city VARCHAR,
>>>>>
>>>>>       population BIGINT,
>>>>>
>>>>>       …
>>>>>
>>>>>       CONSTRAINT my_pk PRIMARY KEY (state));
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Shawn
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentpoon@apache.org>
>>>>> wrote:
>>>>>
>>>>>> For #2, can you provide the table definition and the statement used?
>>>>>> e.g. Is the table immutable, or does it have indexes?
>>>>>>
>>>>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <shawnlijob@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 1.       We want to check what runs underneath the limit clause
>>>>>>> used in the following upsert statement (does it involve any
>>>>>>> coprocessor working behind the scenes?):
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                                  upsert into table2 select * from
>>>>>>> table1 limit 3000000;  (table 1 and table 2 have the same schema)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               The above statement runs a lot slower than the
>>>>>>> statement without the "limit" clause shown below, even though it
>>>>>>> upserts less data:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                                upsert into table2 select * from
>>>>>>> table1;
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2.       We also observe that memory usage is pretty high without
>>>>>>> the limit clause (8gb vs 2gb); for a big table it can sometimes
>>>>>>> reach 20gb without the limit clause.  According to the Phoenix
>>>>>>> website's description of upsert select: "If auto commit is on, and
>>>>>>> both a) the target table matches the source table, and b) the
>>>>>>> select performs no aggregation, then the population of the target
>>>>>>> table will be done completely on the server-side (with constraint
>>>>>>> violations logged, but otherwise ignored)."
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                My question is: if everything is done on the
>>>>>>> server side, how come we have such high memory usage on the client
>>>>>>> machine?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Shawn
>>>>>>>
>>>>>>
