phoenix-user mailing list archives

From Hef <hef.onl...@gmail.com>
Subject Re: How to speed up write performance
Date Fri, 08 Sep 2017 06:57:24 GMT
Hi James,
I have read over the Tuning Guide and tried some of your suggestions: #3,
#5, and #6. Since the data is mutable and is read and written frequently, I
did not try #1, #2, or #4.
The schema is simple:

create table if not exists test_data (
  id VARCHAR(32),
  sid VARCHAR(32),
  uid UNSIGNED_LONG,
  xid UNSIGNED_INT,
  ts UNSIGNED_LONG,
  CONSTRAINT id primary key(id)
) SPLIT ON
('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');


The indexes are created with:

create LOCAL index data_ts on test_data(ts);
create LOCAL index data_sid_ts on test_data(sid, ts);

On my 10-RegionServer cluster, write throughput rose to 1k rows/s with the
tuning above.
I also tried writing with multiple threads. With 20 threads writing
concurrently, throughput reaches about 5k rows/s, but it does not grow any
further no matter how many more threads I add.
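
For reference, each writer thread runs a loop roughly like this (a
simplified sketch; the Row holder, the 1000-row commit batch, and the
"zk-host" ZooKeeper quorum are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    class BatchWriter {
        static final class Row {          // one input record
            String id, sid; long uid, ts; int xid;
        }

        static void writeRows(List<Row> rows) throws SQLException {
            try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:zk-host");
                 PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO test_data (id, sid, uid, xid, ts)"
                         + " VALUES (?, ?, ?, ?, ?)")) {
                conn.setAutoCommit(false);   // buffer mutations client-side
                int pending = 0;
                for (Row r : rows) {         // this thread's share of the input
                    ps.setString(1, r.id);
                    ps.setString(2, r.sid);
                    ps.setLong(3, r.uid);
                    ps.setInt(4, r.xid);
                    ps.setLong(5, r.ts);
                    ps.executeUpdate();      // queued locally, not yet sent
                    if (++pending == 1000) {
                        conn.commit();       // flush one batch to HBase
                        pending = 0;
                    }
                }
                conn.commit();               // flush the tail
            }
        }
    }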

I didn't try writing through the thin client to the Query Server; I doubt
there would be any boost.



On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestaylor@apache.org> wrote:

> Hi Hef,
> Have you had a chance to read our Tuning Guide [1] yet? There's a lot of
> good, general guidance there. There are some optimizations for write
> performance that depend on how you expect/allow your data and schema to
> change:
> 1) Is your data write-once? Make sure to declare your table with the
> IMMUTABLE_ROWS=true property [2]. That lowers the overhead of secondary
> indexes, since it's no longer necessary to read the data row (to get the
> old value) prior to writing it. (See the example after this list.)
> 2) Does your schema only change in an append-only manner? For example, are
> columns only added, but never removed? If so, you can declare your table as
> APPEND_ONLY_SCHEMA as described here [2].
> 3) Does your schema never change, or only change rarely at known times? If
> so, you can declare an UPDATE_CACHE_FREQUENCY property as described here
> [2] to reduce the RPC traffic.
> 4) Can you bulk load data [3] and then add or rebuild the index afterwards?
> (See the command sketch after this list.)
> 5) Have you investigated using local indexes [4]? They're optimized for
> write speed since they ensure that the index data is on the same region
> server as the data (i.e. all writes are local to the region server, no
> cross region server calls, but there's some overhead at read time).
> 6) Have you considered not using secondary indexes and just letting your
> less common queries be slower?
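>
> For example (the table name and numbers below are just illustrative; see
> [2] for the full list of options), the properties in #1-#3 can be declared
> together at creation time, or set later with ALTER TABLE:
>
>   CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v UNSIGNED_LONG)
>       IMMUTABLE_ROWS=true, APPEND_ONLY_SCHEMA=true,
>       UPDATE_CACHE_FREQUENCY=900000;
>
>   -- e.g. for #3 alone, on an existing table:
>   ALTER TABLE my_table SET UPDATE_CACHE_FREQUENCY=900000;
>
> where 900000 ms means clients re-check table metadata at most every 15
> minutes. The bulk load in #4 runs as a MapReduce job, roughly:
>
>   hadoop jar phoenix-<version>-client.jar \
>       org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>       --table TEST_DATA --input /data/test_data.csv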
>
> Keep in mind, with secondary indexes, you're essentially writing your data
> twice. You'll need to expect that your write performance will drop. As
> usual, there's a set of tradeoffs that you need to understand and choose
> according to your requirements.
>
> Thanks,
> James
>
> [1] https://phoenix.apache.org/tuning_guide.html
> [2] https://phoenix.apache.org/language/index.html#options
> [3] https://phoenix.apache.org/bulk_dataload.html
> [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
>
> On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <elserj@apache.org> wrote:
>
>> 500 writes/second seems very low to me. On my wimpy laptop, I can easily
>> see over 10K writes/second depending on the schema.
>>
>> The first check is to make sure that you have autocommit disabled.
>> Otherwise, every update you make via JDBC will trigger an HBase RPC.
>> Batching of RPCs to HBase is key to optimal performance via Phoenix.
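>>
>> In JDBC terms the pattern is roughly:
>>
>>   conn.setAutoCommit(false);
>>   // ... many UPSERT executions, buffered client-side ...
>>   conn.commit();  // sent to HBase as one batch of mutations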
>>
>> Regarding #2: unless you have intimate knowledge of how Phoenix writes
>> data to HBase, do not investigate this approach.
>>
>>
>> On 9/5/17 5:56 AM, Hef wrote:
>>
>>> Hi guys,
>>> I'm evaluating using Phoenix to replace MySQL for better scalability.
>>> The version I'm evaluating is 4.11-HBase-1.2, with some dependencies
>>> modified to match the CDH 5.9 distribution we are using.
>>>
>>> The problem I'm having is that write performance to Phoenix over JDBC is
>>> poor: only 500 writes/second, while our incoming data rate is almost
>>> 50,000 records/second. My questions are:
>>> 1. Is 500 writes/second a normal rate? How fast can you get in your
>>> production environment?
>>> 2. Can I write directly into HBase with the native mutation API and read
>>> back through Phoenix? That could be fast, but the secondary indexes don't
>>> seem to be maintained automatically in that case.
>>>
>>> Regards,
>>> Hef
>>>
>>
>
