Also, try separating your columns into multiple column families to prevent having to scan past your 75+ column qualifiers for every query.
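For example, if the table were defined through Phoenix, the columns your queries actually read could live in one family and the rest in another (a sketch only; the table, family and column names here are placeholders):

    CREATE TABLE metrics (
        pk VARCHAR PRIMARY KEY,
        hot.column5 VARCHAR,    -- frequently queried columns in one family
        cold.column40 VARCHAR   -- rarely read columns in a second family
    );

A scan that touches only the hot.* columns then never has to read the cold family's store files.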

On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha <puneet.kumar@pubmatic.com> wrote:

Yes, salting will improve the scan performance. Try bucket counts of 5, 10 and 20 and compare, as I do not know your cluster details.
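For example, a minimal sketch assuming the table is created through Phoenix (table and column names are placeholders; the point is the SALT_BUCKETS clause):

    CREATE TABLE metrics (
        column1 VARCHAR NOT NULL,
        column2 VARCHAR NOT NULL,
        event_date INTEGER NOT NULL,
        uid VARCHAR NOT NULL,
        cf.column5 VARCHAR,
        CONSTRAINT pk PRIMARY KEY (column1, column2, event_date, uid)
    ) SALT_BUCKETS = 10;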

 

Increase scanner caching to 100000.

 

Check whether SNAPPY is actually working; I believe you also need to put the Snappy jars/native libraries on the classpath.
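For example, compression can also be requested in the Phoenix DDL when the table is created (a sketch; the table definition is a placeholder):

    CREATE TABLE metrics (
        pk VARCHAR PRIMARY KEY,
        cf.column5 VARCHAR
    ) COMPRESSION = 'SNAPPY';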

 

Since the cardinality of the col1 and col2 fields is very small, use the date as the first column. Also store the date as an integer.
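For example, a sketch with the date leading the key and stored as a number (table and column names are placeholders):

    CREATE TABLE metrics (
        event_date INTEGER NOT NULL,   -- yyyyMMdd stored as an integer
        column1 VARCHAR NOT NULL,
        column2 VARCHAR NOT NULL,
        uid VARCHAR NOT NULL,
        cf.column5 VARCHAR,
        CONSTRAINT pk PRIMARY KEY (event_date, column1, column2, uid)
    );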

 

Try tuning the heap-related memory settings in hbase-site.xml.

 

Try naming the column qualifiers with single letters. Long qualifier names consume space and take more time to scan.

 

Thanks

Puneet.

 

 

From: Nishant Patel [mailto:nishant.k.patel@gmail.com]
Sent: Wednesday, July 01, 2015 4:33 PM
To: user@phoenix.apache.org
Subject: Re: Hbase and Phoenix Performance improvement

 

Hi Puneet/Martin,

Thanks for your responses. Please see my answers below.

I have not specified any salt buckets. I have created a Phoenix view on an existing HBase table. Can I specify salt buckets for a Phoenix view?

After loading the HBase data I alter the table to use SNAPPY compression. Are you talking about any other compression?

I have set hbase.client.scanner.caching to 500. I tried with 1000 also but did not see any performance improvement.

I am not using a production system. I have inserted the data once and am not deleting it, so there should not be a problem. There is no load on the HBase servers, as I am just reading data right now.

Sample query is as below.

Select column5,count(1) ttr from table where column1='column1' and column2='column2' and date>='20150504' and date<='20150704' group by column5.

I am doing a scan based on the where condition. Column1, column2 and date are part of my rowkey, so it should not perform a complete table scan. My rowkey design is as below:

column1|column2|date|unique_identifier
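To confirm this, the query plan can be checked with EXPLAIN, e.g. (the table name here is a placeholder):

    EXPLAIN SELECT column5, COUNT(1) ttr
    FROM my_table
    WHERE column1 = 'column1' AND column2 = 'column2'
      AND date >= '20150504' AND date <= '20150704'
    GROUP BY column5;

The plan should show a RANGE SCAN over the rowkey rather than a FULL SCAN.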

Regards,

Nishant

 

On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <mpernollet@octo.com> wrote:

It sounds like you are scanning rather than getting rows based on a known row id. Am I wrong?

One thing I am currently trying is to keep indexed columns and "hot" content in one column family and leave "cold" content in another family. It speeds up scanning the table when you only need the hot columns.

 

On Wed, Jul 1, 2015 at 06:56, Nishant Patel <nishant.k.patel@gmail.com> wrote:

Hi,

I am trying to measure the performance of HBase and Phoenix.

I have generated 1000 records per day for each combination of Column1 and Column2.

I have created 5 different values each for column1 and column2 and generated data for 365 days. In total I have generated 5 * 5 * 365 * 1000 = 9,125,000 records.

I am writing 75+ qualifiers in one Column Family for each record.

 

Rowkey design is as below: column1|column2|date(yyyyMMdd)|unique identifier. I have used a one-byte character as the rowkey separator. I have created a view in Phoenix on top of the HBase table.

All my queries contain column1, column2 and date as filter conditions.

If the date range is less than 1 month I get a response in less than 1 second. If the date range is 3/6/12 months then the response takes seconds; sometimes it takes 25+ seconds for a 12-month range.

My question is: is it possible to get a response from Phoenix in less than 1 second for the amount of data I have specified? If yes, what kind of tuning needs to be done? As of now I have not made any changes to HBase or Phoenix except the rowkey design.

I am trying to verify whether Phoenix will suit our requirements or not.

--

Thanks,
Nishant




--

Regards,
Nishant Patel