phoenix-user mailing list archives

From ALEX K <alex.ka...@gmail.com>
Subject Re: How to do true batch updates in Phoenix
Date Wed, 19 Aug 2015 23:19:47 GMT
I'm using the same solution Samarth suggested (commit batching); it
brings per-row upsert latency down from 50 ms to 5 ms (averaged over
the batch).
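For reference, the commit-batching approach described in this thread can be sketched as below. This is a minimal sketch, not a definitive implementation: the JDBC URL, the `MY_TABLE` schema, and the row format are hypothetical, and the commit-trigger check is pulled into a small helper so the batching logic is easy to see.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PhoenixBatchUpsert {

    // True when the number of rows buffered so far is an exact multiple
    // of commitSize, i.e. a full batch is ready to be flushed.
    static boolean shouldCommit(int rowsBuffered, int commitSize) {
        return rowsBuffered % commitSize == 0;
    }

    // Sketch of batched upserts: turn off autocommit, buffer rows on the
    // client, and commit every commitSize rows. Table and columns are
    // hypothetical placeholders.
    static void upsertAll(String url, Iterable<int[]> rows, int commitSize)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(
                     "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            int buffered = 0;
            for (int[] row : rows) {
                stmt.setInt(1, row[0]);
                stmt.setInt(2, row[1]);
                stmt.executeUpdate(); // buffered client-side, not yet sent
                buffered++;
                if (shouldCommit(buffered, commitSize)) {
                    conn.commit(); // flush this batch to HBase
                }
            }
            conn.commit(); // flush the final, possibly partial, batch
        }
    }
}
```

Keeping `commitSize` moderate matters because, as noted below, the Phoenix client holds all uncommitted rows in memory until the commit.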

On Wed, Aug 19, 2015 at 7:11 PM, Samarth Jain <samarth.jain@gmail.com>
wrote:

> You can do this via Phoenix with something like the following:
>
> try (Connection conn = DriverManager.getConnection(url)) {
>     conn.setAutoCommit(false);
>     PreparedStatement stmt = conn.prepareStatement(upsertSql); // your UPSERT statement
>     int batchSize = 0;
>     int commitSize = 1000; // number of rows you want to commit per batch.
>                            // Change this value according to your needs.
>     while (there are records to upsert) {
>         // bind the next record's values to stmt here
>         stmt.executeUpdate();
>         batchSize++;
>         if (batchSize % commitSize == 0) {
>             conn.commit();
>         }
>     }
>     conn.commit(); // commit the last batch of records
> }
>
> You don't want commitSize to be too large, since the Phoenix client keeps
> the uncommitted rows in memory till they are sent over to HBase.
>
>
>
> On Wed, Aug 19, 2015 at 3:05 PM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
>> I would suggest using
>>
>> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
>> instead of a list of Puts, and sharing the BufferedMutator across threads
>> (it's thread-safe). I reduced my response time from 30-40 ms to 4 ms by
>> using a BufferedMutator. It also sends mutations asynchronously. :)
>>
>> I hit the same problem: I couldn't force Phoenix to buffer upserts on the
>> client side and then send them to HBase in small batches.
>>
>> 2015-08-19 19:40 GMT+02:00 jeremy p <athomewithagroovebox@gmail.com>:
>>
>>> Hello all,
>>>
>>> I need to do true batch updates to a Phoenix table.  By this, I mean
>>> sending a bunch of updates to HBase as part of a single request.  The HBase
>>> API offers this behavior with the Table.put(List<Put> puts) method.  I
>>> noticed that PhoenixStatement exposes an executeBatch() method; however,
>>> that method just executes the batched statements one by one, so it will not
>>> deliver the performance that the HBase API offers through its batch put
>>> method.
>>>
>>> What is the best way to do true batch updates to a Phoenix table?  I
>>> need to do this programmatically, so I cannot use the command-line bulk
>>> insert utility.
>>>
>>> --Jeremy
>>>
>>
>>
>
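For anyone following the BufferedMutator suggestion above, a minimal sketch of its use (against the plain HBase client API, not Phoenix) is below. The table name `MY_TABLE`, column family `cf`, and qualifier `q` are hypothetical; the write-buffer size shown is an illustrative choice, not a recommendation.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedMutatorSketch {

    // Illustrative helper: convert a buffer size in MB to bytes for
    // BufferedMutatorParams.writeBufferSize().
    static long mbToBytes(int mb) {
        return mb * 1024L * 1024L;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        BufferedMutatorParams params =
                new BufferedMutatorParams(TableName.valueOf("MY_TABLE"))
                        .writeBufferSize(mbToBytes(4)); // illustrative 4 MB buffer
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(params)) {
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                    Bytes.toBytes("value"));
            mutator.mutate(put); // buffered client-side; sent asynchronously
            mutator.flush();     // force any remaining buffered mutations out
        }
    }
}
```

Because a single BufferedMutator instance is thread-safe, it can be shared across worker threads, which is what produced the 30-40 ms to 4 ms improvement reported above.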
