Here's another good tuning resource that covers HBase too:
http://phoenix.apache.org/presentations/TuningForOLTP.pdf
On Fri, Sep 8, 2017 at 8:30 AM, Josh Elser <elserj@apache.org> wrote:
> Hef -- do your split points actually correspond with the distribution on
> values of your `id` column? You can tell this pretty easily looking at the
> number of requests per region for your data table on the HBase UI.
>
> And yes, PQS will not increase the performance (as it is adding "more
> work" to accomplish the same thing the thick driver accomplishes itself).
>
> 5K/s updates with two indexes seems OK to me for a laptop/VM. Maybe you
> need to increase the number of handlers you give HBase? What are the
> hardware characteristics of the system that you're running HBase on?
>
> On 9/8/17 2:57 AM, Hef wrote:
>
>> Hi James,
>> I have read over the Tuning Guide, and tried some of your suggestions:
>> #3, #5, #6. Since the date is mutable, and read/write frequently, I did not
>> try #1, #2, #4.
>> The schema is simple as such:
>>
>> /create table if not exists test_data (/
>> / id VARCHAR(32),/
>> / sid VARCHAR(32),/
>> / uid UNSIGNED_LONG,/
>> / xid UNSIGNED_INT,/
>> / ts UNSIGNED_LONG/
>> / CONSTRAINT id primary key(id)/
>> /) SPLIT ON ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e'
>> ,'f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z')
>> ;/
>>
>>
>> The indexes are created with:
>>
>> /create LOCAL index data_ts on test_data(ts);/
>> /create LOCAL index data_sid_ts on test_data(sid, ts);/
>>
>> On my 10 RegionServer cluster, write performance raised to 1k/s with the
>> tuning above.
>> I also tried using multi-thread for write, with 20 threads write
>> concurrently, the performance can raise to 5k/s, but won't raise any more
>> how ever threads increased.
>>
>> I didn't try write with thin-client to QueryServer, I guess there won't
>> be any boost.
>>
>>
>>
>> On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestaylor@apache.org
>> <mailto:jamestaylor@apache.org>> wrote:
>>
>> Hi Hef,
>> Have you had a chance to read our Tuning Guide [1] yet? There's a
>> lot of good, general guidance there. There are some optimizations
>> for write performance that depend on how you expect/allow your data
>> and schema to change:
>> 1) Is your data write-once? Make sure to declare your table with the
>> IMMUTABLE_ROWS=true property[2]. That will lower the overhead of a
>> secondary index as it's not necessary to read the data row (to get
>> the old value) prior to writing it when there are secondary indexes.
>> 2) Does your schema only change in an append-only manner? For
>> example, are columns only added, but never removed? If so, you can
>> declare your table as APPEND_ONLY_SCHEMA as described here [2].
>> 3) Does your schema never or rarely change at know times? If so, you
>> can declare an UPDATE_CACHE_FREQUENCY property as described here [2]
>> to reduce the RPC traffic.
>> 4) Can you bulk load data [3] and then add or rebuild the index
>> afterwards?
>> 5) Have you investigated using local indexes [4]? They're optimized
>> for write speed since they ensure that the index data is on the same
>> region server as the data (i.e. all writes are local to the region
>> server, no cross region server calls, but there's some overhead at
>> read time).
>> 6) Have you considered not using secondary indexes and just letting
>> your less common queries be slower?
>>
>> Keep in mind, with secondary indexes, you're essentially writing
>> your data twice. You'll need to expect that your write performance
>> will drop. As usual, there's a set of tradeoffs that you need to
>> understand and choose according to your requirements.
>>
>> Thanks,
>> James
>>
>> [1] https://phoenix.apache.org/tuning_guide.html
>> <https://phoenix.apache.org/tuning_guide.html>
>> [2] https://phoenix.apache.org/language/index.html#options
>> <https://phoenix.apache.org/language/index.html#options>
>> [3] https://phoenix.apache.org/bulk_dataload.html
>> <https://phoenix.apache.org/bulk_dataload.html>
>> [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
>> <https://phoenix.apache.org/secondary_indexing.html#Local_Indexes>
>>
>> On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <elserj@apache.org
>> <mailto:elserj@apache.org>> wrote:
>>
>> 500writes/seconds seems very low to me. On my wimpy laptop, I
>> can easily see over 10K writes/second depending on the schema.
>>
>> The first check is to make sure that you have autocommit
>> disabled. Otherwise, every update you make via JDBC will trigger
>> an HBase RPC. Batching of RPCs to HBase is key to optimal
>> performance via Phoenix.
>>
>> Regarding #2, unless you have intimate knowledge with how
>> Phoenix writes data to HBase, do not investigate this approach.
>>
>>
>> On 9/5/17 5:56 AM, Hef wrote:
>>
>> Hi guys,
>> I'm evaluating using Phoenix to replace MySQL for better
>> scalability.
>> The version I'm evaluating is 4.11-HBase-1.2, with some
>> dependencies modified to match CDH5.9 which we are using.
>>
>> The problem I'm having is the write performance to Phoenix
>> from JDBC is too poor, only 500writes/second, while our
>> data's throughput is almost 50,000/s. My questions are:
>> 1. If the 500/s TPS is normal speed? How fast can you
>> achieve in your production?
>> 2. Whether I can write directly into HBase with mutation
>> API, and read from Phoenix, that could be fast. But I don't
>> see the secondary index be created automatically in this case.
>>
>> Regards,
>> Hef
>>
>>
>>
>>
|