phoenix-user mailing list archives

From Anil Gupta <anilgupt...@gmail.com>
Subject Re: Hbase and Phoenix Performance improvement
Date Thu, 02 Jul 2015 03:22:41 GMT
Hi Nishant,

Refer to the HBase wiki for guidance on multiple column families. In my experience, don't try to have
more than 2-3 column families. Also, group the columns into column families on the basis of access pattern.

If you don't have an access pattern that lets you avoid reading a column family, you will not
gain any performance. So evaluate your access patterns before you create multiple column
families.
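
To make the access-pattern point concrete, here is a minimal sketch in Python. The family names (`hot`, `cold`) and the column-to-family mapping are invented for illustration and are not taken from the table discussed in this thread. It only models which families a query would have to touch; it does not talk to HBase.

```python
# Hypothetical layout: which column family each column lives in.
# Family and column assignments are made up for illustration.
FAMILY_OF = {
    "column5": "hot",
    "column6": "hot",
    "column7": "cold",
    "column8": "cold",
}


def families_read(query_columns):
    """Return the set of column families a query has to read."""
    return {FAMILY_OF[col] for col in query_columns}


# A query that only needs hot columns can skip the cold family entirely.
narrow_query = ["column5"]
# A query that mixes hot and cold columns reads both families,
# so splitting the columns buys it nothing.
wide_query = ["column5", "column7"]

print(sorted(families_read(narrow_query)))
print(sorted(families_read(wide_query)))
```

If every real query looks like `wide_query`, the split gives no benefit, which is the case described above.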

Sent from my iPhone

> On Jul 1, 2015, at 6:33 PM, Nishant Patel <nishant.k.patel@gmail.com> wrote:
> 
> Thanks Puneet and James for your responses.
> 
> Date is not recommended as the first part of the rowkey. It will create issues during write operations. In a real production scenario we will have more data and more values for column1 and column2.
> 
> Will try other things today. Let's see how much I can achieve today.
> 
> Regards,
> Nishant
> 
>> On Wed, Jul 1, 2015 at 9:52 PM, James Taylor <jamestaylor@apache.org> wrote:
>> Also, try separating your columns into multiple column families to prevent having to scan past your 75+ column qualifiers for every query.
>> 
>>> On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha <puneet.kumar@pubmatic.com> wrote:
>>> Yes, salting will improve the scan performance. Try bucket counts of 5, 10 or 20, as I do not know the cluster details.
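
As a rough illustration of what salting does to the keys: a salt byte derived from a hash of the row key is prepended, so consecutive keys land in different buckets and a range scan can be split across them. The sketch below assumes 10 buckets and uses CRC32 purely as a stand-in hash; it is not the hash Phoenix uses, and `salted_key` is a hypothetical helper.

```python
import zlib

SALT_BUCKETS = 10  # one of the values suggested above (5, 10, 20)


def salted_key(rowkey: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the row key with a one-byte bucket number.

    CRC32 is only a stand-in hash chosen for this sketch;
    it is not the hash Phoenix uses internally.
    """
    bucket = zlib.crc32(rowkey) % buckets
    return bytes([bucket]) + rowkey


# 28 days x 10 rows of sample keys in the rowkey layout from this thread.
keys = [f"column1|column2|201505{day:02d}|{i}".encode("ascii")
        for day in range(1, 29) for i in range(10)]

# Count how many sample rows fall into each bucket.
counts = [0] * SALT_BUCKETS
for key in keys:
    counts[salted_key(key)[0]] += 1
print(counts)
```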
>>> 
>>>  
>>> 
>>> Increase scanner caching to 100000.
>>> 
>>>  
>>> 
>>> Check that SNAPPY is actually working. I believe you need to put the jars on the classpath as well.
>>> 
>>>  
>>> 
>>> Since the cardinality of the col1 and col2 fields is very small, use date as the first column. Also store the date as an integer.
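
A small sketch of why an integer date helps, assuming a plain 4-byte big-endian unsigned encoding. This is a generic encoding chosen for illustration, not necessarily the byte layout Phoenix uses for its INTEGER type. The date part of the key shrinks from 8 bytes to 4, and byte-wise ordering still matches date ordering.

```python
import struct


def date_as_string(yyyymmdd: int) -> bytes:
    """Date stored as text, e.g. b'20150504' (8 bytes)."""
    return str(yyyymmdd).encode("ascii")


def date_as_int(yyyymmdd: int) -> bytes:
    """Date stored as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", yyyymmdd)


dates = [20150704, 20141231, 20150504]

print(len(date_as_string(20150504)), len(date_as_int(20150504)))
# Sorting by the encoded bytes gives the same order as sorting by value,
# so range scans on the date still work.
print(sorted(dates, key=date_as_int) == sorted(dates))
```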
>>> 
>>>  
>>> 
>>> Try modifying the heap-related memory settings in hbase-site.xml.
>>> 
>>>  
>>> 
>>> Try naming the column qualifiers with single letters. Longer names consume space and take more time to scan.
>>> 
>>>  
>>> 
>>> Thanks
>>> 
>>> Puneet.
>>> 
>>>  
>>> 
>>>  
>>> 
>>> From: Nishant Patel [mailto:nishant.k.patel@gmail.com] 
>>> Sent: Wednesday, July 01, 2015 4:33 PM
>>> To: user@phoenix.apache.org
>>> Subject: Re: Hbase and Phoenix Performance improvement
>>> 
>>>  
>>> 
>>> Hi Puneet/Martin,
>>> 
>>> Thanks for your responses. Please see my answers below.
>>> 
>>> I have not specified any salt buckets. I have created a Phoenix view on an existing HBase table. Can I specify salt buckets for a Phoenix view?
>>> 
>>> After loading the HBase data I altered the table to use SNAPPY compression. Are you talking about any other compression?
>>> 
>>> I have set hbase.client.scanner.caching to 500. I also tried 1000 but did not see any performance improvement.
>>> 
>>> I am not using a production system. I inserted the data once and am not deleting anything, so there should not be a problem. There is no load on the HBase servers as I am just reading data right now.
>>> 
>>> A sample query is below.
>>> 
>>> SELECT column5, COUNT(1) ttr
>>> FROM table
>>> WHERE column1 = 'column1' AND column2 = 'column2'
>>>   AND date >= '20150504' AND date <= '20150704'
>>> GROUP BY column5;
>>> 
>>> I am scanning based on the WHERE condition. column1, column2 and date are part of my rowkey, so it should not perform a complete table scan. My rowkey design is as below:
>>> 
>>> column1|column2|date|unique_identifier
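
For what it's worth, the WHERE clause above maps onto a single contiguous key range under this rowkey design. A minimal sketch, assuming `|` is the separator (the thread only says it is a one-byte character), dates are fixed-width yyyyMMdd strings, and the separator never appears inside column1 or column2. `scan_range` is a hypothetical helper, not a Phoenix or HBase API.

```python
SEP = "|"  # assumed separator; the thread only says it is a one-byte character


def scan_range(column1, column2, start_date, end_date):
    """Return (start_row, stop_row) for the date range, stop_row exclusive."""
    prefix = f"{column1}{SEP}{column2}{SEP}"
    start_row = (prefix + start_date).encode("ascii")
    # Bump the separator by one so every key for end_date
    # still sorts below stop_row.
    stop_row = (prefix + end_date + chr(ord(SEP) + 1)).encode("ascii")
    return start_row, stop_row


start, stop = scan_range("column1", "column2", "20150504", "20150704")

inside = b"column1|column2|20150704|id-42"   # last day of the range
outside = b"column1|column2|20150705|id-1"   # one day past the range

print(start <= inside < stop)
print(start <= outside < stop)
```

Rows for other column1/column2 values sort outside this range, so the scan is bounded by the date window for one combination rather than by the whole table.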
>>> 
>>> Regards,
>>> 
>>> Nishant
>>> 
>>>  
>>> 
>>> On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <mpernollet@octo.com> wrote:
>>> 
>>> It sounds like you are scanning rather than getting rows based on a known row id. Am I wrong?
>>> 
>>> One thing I am currently trying is to keep indexed columns and "hot" content in one column family and leave "cold" content in another family. It speeds up scanning the table when you need to
>>> 
>>>  
>>> 
>>> On Wed, Jul 1, 2015 at 06:56, Nishant Patel <nishant.k.patel@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I am trying to measure performance for Hbase and Phoenix.
>>> 
>>> I have generated 1000 records per day for each combination of Column1 and Column2.
>>> 
>>> I have created 5 different values each for column1 and column2 and created data for 365 days. Total records generated: 5 * 5 * 365 * 1000 = 9125000.
>>> 
>>> I am writing 75+ qualifiers in one Column Family for each record.
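
A back-of-the-envelope sketch of how much data one query touches, using only the numbers given above. It assumes 1000 rows per day per (column1, column2) combination, treats 75 as a lower bound for the qualifier count, and approximates 1 and 3 months as 30 and 91 days (my approximation, not from the post).

```python
COMBINATIONS = 5 * 5     # column1 values x column2 values
DAYS = 365
ROWS_PER_DAY = 1000      # per (column1, column2) combination
QUALIFIERS = 75          # "75+" in the post, so this is a lower bound

total_rows = COMBINATIONS * DAYS * ROWS_PER_DAY


def rows_scanned(days):
    """Rows read by a query that fixes column1 and column2."""
    return days * ROWS_PER_DAY


def cells_scanned(days):
    """Cells read if every qualifier in the single family is scanned past."""
    return rows_scanned(days) * QUALIFIERS


print("total rows:", total_rows)
for label, days in [("1 month", 30), ("3 months", 91), ("12 months", 365)]:
    print(label, rows_scanned(days), cells_scanned(days))
```

So a 12-month query for one combination walks 365,000 rows and at least 27 million cells, versus 30,000 rows for one month.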
>>> 
>>>  
>>> 
>>> The rowkey design is as below: column1|column2|date(yyyyMMdd)|unique identifier. I have used a one-byte character as the rowkey separator. I have created a view in Phoenix on top of the HBase table.
>>> 
>>> All my queries contain column1, column2 and date as filter conditions.
>>> 
>>> If the date range is less than 1 month I get a response in less than 1 second. If the date range is 3/6/12 months then the response comes in seconds. Sometimes it takes 25+ seconds for a 12-month range.
>>> 
>>> My question is: is it possible to get a response from Phoenix in less than 1 second for the amount of data I have specified? If yes, what kind of tuning needs to be done? As of now I have not made any changes to HBase or Phoenix other than a proper rowkey design.
>>> 
>>> I am trying to verify whether Phoenix will suit our requirements or not.
>>> 
>>> --
>>> 
>>> Thanks,
>>> Nishant
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Regards,
>>> Nishant Patel
>>> 
> 
> 
> 
> -- 
> Regards,
> Nishant Patel
> 
