phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Is there a way to specify split num or reducer num when creating phoenix table ?
Date Thu, 29 Aug 2019 12:18:21 GMT
Configuring salt buckets is not the same thing as pre-splitting a table. 
You should not be setting a crazy large number of buckets like you are.

If you want more parallelism in the MapReduce job, pre-split along 
date-boundaries, with the salt bucket taken into consideration (e.g. 
\x00_date, \x01_date, \x02_date).
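
For illustration, a rough sketch of the unsalted variant (the SPLIT ON
boundaries below are made-up placeholders, not tuned values): drop
SALT_BUCKETS and write the date boundaries straight into the DDL. If you
keep the salting, the split keys need the leading salt byte instead, as
above.

CREATE TABLE app.table (
    dt INTEGER NOT NULL,
    r1 INTEGER NOT NULL,
    r2 INTEGER NOT NULL,
    r3 INTEGER NOT NULL,
    r4 INTEGER NOT NULL,
    r5 INTEGER NOT NULL,
    r6 INTEGER NOT NULL,
    d1 DECIMAL(30,6),
    d2 DECIMAL(30,6),
    d3 DECIMAL(30,6),
    d4 DECIMAL(30,6),
    d5 DECIMAL(30,6),
    d6 DECIMAL(30,6)
    CONSTRAINT pk PRIMARY KEY (dt, r1, r2, r3, r4, r5, r6)
)
UPDATE_CACHE_FREQUENCY = 300000, COMPRESSION = 'SNAPPY'
-- Made-up boundaries, roughly one per two months of the 20180620..20190829
-- range; each split point becomes a region boundary.
SPLIT ON (20180901, 20181101, 20190101, 20190301, 20190501, 20190701);

With one region per date range, the bulk load then runs one reducer per
region instead of the 3 you are seeing now.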

HBase requires that a file to be bulk-loaded fit inside of a single 
region. A Reducer will only generate data for a single Region (as a 
Reducer can only generate one file). Create more regions, and you will 
get more parallelism.

On 8/29/19 4:11 AM, you Zhuang wrote:
> I have a chronological series of data. A data row looks like dt, r1, r2, r3, r4, r5, r6, d1, d2, d3, d4, d5 …
> 
> And dt is formatted as 20190829, increasing monotonically, e.g. 20190830, 20190831...
> 
> The query pattern is something like: select * from table where dt between 20180620 and 20190829 and r3 = ? and r6 = ?;
> 
> dt is mandatory, the remaining filter is some random combination of r1 to r6, and the selected columns are always all columns (*).
> 
> 
> I have made dt, r1, r2, … r6 the compound primary key. The CREATE TABLE clause is below:
> 
> CREATE TABLE app.table(
>   dt integer not null,
>   r1 integer not null,
>   r2 integer not null,
>   r3 integer not null,
>   r4 integer not null,
>   r5 integer not null,
>   r6 integer not null,
> 
>   d1 decimal(30,6),
>   d2 decimal(30,6),
>   d3 decimal(30,6),
>   d4 decimal(30,6),
>   d5 decimal(30,6),
>   d6 decimal(30,6)
> 
>   CONSTRAINT pk PRIMARY KEY (dt, r1, r2, r3, r4, r5, r6)
> ) SALT_BUCKETS = 3, UPDATE_CACHE_FREQUENCY = 300000, COMPRESSION = 'SNAPPY', SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy', MAX_FILESIZE = '5000000000';
> 
> I have 3 region servers, so I set SALT_BUCKETS = 3.
> 
> But when I initially load the table data with the CSV bulk load tool, dt ranges from 20180620 to 20190829 and the data size is about 1 TB.
> 
> The CSV bulk load MapReduce job shows 3 partitions for the reducers, and it always fails because there are so few partitions.
> 
> I increased SALT_BUCKETS to 512, but the max SALT_BUCKETS is 256; I set it to 256, and it still does not work.
> 
> 
> 
> I know I can SPLIT ON (…) when creating the table, but I don't know how to determine the points, and hundreds of points is scary.
> 
> 
> So, is there a way to specify the split count or reducer count when creating a Phoenix table?
> 
> I would appreciate any advice on tuning this scenario.
> 
