phoenix-user mailing list archives

From jeremy p <athomewithagroove...@gmail.com>
Subject Re: How to do true batch updates in Phoenix
Date Fri, 21 Aug 2015 17:50:26 GMT
Thank you!  Samarth's solution looks like it'll work for me.  One question:
you mentioned that the Phoenix client keeps uncommitted rows in memory
until they're sent over to HBase.  When we call conn.commit(), does that
send the rows over to HBase immediately?

--Jeremy

On Wed, Aug 19, 2015 at 7:19 PM, ALEX K <alex.kamil@gmail.com> wrote:

> I'm using the same solution Samarth suggested (commit batching); it
> brings the latency per single-row upsert down from 50 ms to 5 ms
> (averaged after batching).
>
> On Wed, Aug 19, 2015 at 7:11 PM, Samarth Jain <samarth.jain@gmail.com>
> wrote:
>
>> You can do this via Phoenix with something like this:
>>
>> try (Connection conn = DriverManager.getConnection(url)) {
>>     conn.setAutoCommit(false);
>>     int batchSize = 0;
>>     // number of rows you want to commit per batch; change this value
>>     // according to your needs
>>     int commitSize = 1000;
>>     while (there are records to upsert) {
>>         stmt.executeUpdate();
>>         batchSize++;
>>         if (batchSize % commitSize == 0) {
>>             conn.commit();
>>         }
>>     }
>>     conn.commit(); // commit the last batch of records
>> }
>>
>> You don't want commitSize to be too large, since the Phoenix client keeps
>> the uncommitted rows in memory until they are sent over to HBase.
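>>
>> For reference, the stmt above would typically be a PreparedStatement for a
>> Phoenix UPSERT.  A minimal, self-contained sketch of the whole pattern might
>> look like this (the URL, the table name MY_TABLE, and its ID/VAL columns are
>> just placeholders for illustration):
>>
>> import java.sql.Connection;
>> import java.sql.DriverManager;
>> import java.sql.PreparedStatement;
>>
>> public class PhoenixBatchUpsert {
>>     public static void main(String[] args) throws Exception {
>>         String url = "jdbc:phoenix:localhost";      // assumes a local ZooKeeper quorum
>>         try (Connection conn = DriverManager.getConnection(url);
>>              PreparedStatement stmt = conn.prepareStatement(
>>                      "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
>>             conn.setAutoCommit(false);
>>             int batchSize = 0;
>>             int commitSize = 1000;                   // rows per commit; tune to your workload
>>             for (long i = 0; i < 100_000; i++) {     // stands in for "records to upsert"
>>                 stmt.setLong(1, i);
>>                 stmt.setString(2, "value-" + i);
>>                 stmt.executeUpdate();                // buffered client-side, not yet in HBase
>>                 if (++batchSize % commitSize == 0) {
>>                     conn.commit();                   // sends this batch of mutations to HBase
>>                 }
>>             }
>>             conn.commit();                           // flush the final, possibly partial, batch
>>         }
>>     }
>> }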
>>
>>
>>
>> On Wed, Aug 19, 2015 at 3:05 PM, Serega Sheypak <serega.sheypak@gmail.com>
>> wrote:
>>
>>> I would suggest using
>>>
>>> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
>>> instead of a list of puts, and sharing the BufferedMutator across threads
>>> (it's thread-safe). I reduced my response time from 30-40 ms to 4 ms by
>>> using BufferedMutator. It also sends mutations asynchronously. :)
>>>
>>> I hit the same problem: I can't force Phoenix to buffer upserts on the
>>> client side and then send them to HBase in small batches.
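>>>
>>> A rough sketch of that pattern against the plain HBase client API might
>>> look like this (MY_TABLE, the cf column family, and the q qualifier are
>>> placeholder names; note that writing this way bypasses Phoenix's own
>>> encoding, so it only fits if you manage the row format yourself):
>>>
>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>> import org.apache.hadoop.hbase.TableName;
>>> import org.apache.hadoop.hbase.client.BufferedMutator;
>>> import org.apache.hadoop.hbase.client.Connection;
>>> import org.apache.hadoop.hbase.client.ConnectionFactory;
>>> import org.apache.hadoop.hbase.client.Put;
>>> import org.apache.hadoop.hbase.util.Bytes;
>>>
>>> public class BufferedMutatorExample {
>>>     public static void main(String[] args) throws Exception {
>>>         try (Connection conn =
>>>                 ConnectionFactory.createConnection(HBaseConfiguration.create());
>>>              BufferedMutator mutator =
>>>                 conn.getBufferedMutator(TableName.valueOf("MY_TABLE"))) {
>>>             for (int i = 0; i < 10_000; i++) {
>>>                 Put put = new Put(Bytes.toBytes("row-" + i));
>>>                 put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
>>>                         Bytes.toBytes("value-" + i));
>>>                 mutator.mutate(put);  // buffered; flushed in the background as the buffer fills
>>>             }
>>>             mutator.flush();          // push whatever is still sitting in the buffer
>>>         }
>>>     }
>>> }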
>>>
>>> 2015-08-19 19:40 GMT+02:00 jeremy p <athomewithagroovebox@gmail.com>:
>>>
>>>> Hello all,
>>>>
>>>> I need to do true batch updates to a Phoenix table.  By this, I mean
>>>> sending a bunch of updates to HBase as part of a single request.  The
>>>> HBase API offers this behavior with the Table.put(List<Put> puts)
>>>> method.  I noticed that PhoenixStatement exposes an executeBatch()
>>>> method; however, it just executes the batched statements one by one.
>>>> This will not deliver the performance that the HBase API provides
>>>> through its batch put method.
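>>>>
>>>> For reference, that batch-put pattern looks roughly like this in the
>>>> plain HBase client API (MY_TABLE, cf, and q are placeholder names):
>>>>
>>>> import java.util.ArrayList;
>>>> import java.util.List;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.TableName;
>>>> import org.apache.hadoop.hbase.client.Connection;
>>>> import org.apache.hadoop.hbase.client.ConnectionFactory;
>>>> import org.apache.hadoop.hbase.client.Put;
>>>> import org.apache.hadoop.hbase.client.Table;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>
>>>> public class BatchPutExample {
>>>>     public static void main(String[] args) throws Exception {
>>>>         try (Connection conn =
>>>>                 ConnectionFactory.createConnection(HBaseConfiguration.create());
>>>>              Table table = conn.getTable(TableName.valueOf("MY_TABLE"))) {
>>>>             List<Put> puts = new ArrayList<>();
>>>>             for (int i = 0; i < 1000; i++) {
>>>>                 Put put = new Put(Bytes.toBytes("row-" + i));
>>>>                 put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
>>>>                         Bytes.toBytes("v" + i));
>>>>                 puts.add(put);
>>>>             }
>>>>             table.put(puts); // all puts go out in bulk rather than one RPC per row
>>>>         }
>>>>     }
>>>> }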
>>>>
>>>> What is the best way for me to do true batch updates to a Phoenix
>>>> table?  I need to do this programmatically, so I cannot use the command
>>>> line bulk insert utility.
>>>>
>>>> --Jeremy
>>>>
>>>
>>>
>>
>
