phoenix-user mailing list archives

From "sunfl@certusnet.com.cn" <su...@certusnet.com.cn>
Subject Re: MapReduce bulk load into Phoenix table
Date Tue, 13 Jan 2015 09:29:44 GMT
Hi, Constantin
You can try using Apache Spark to run the bulk-load job. As far as I know, bulk loading into
Phoenix or HBase can be affected by several factors, such as whether the WAL is enabled and the
number of split regions. Your HBase or Phoenix configuration parameters may also influence the
bulk-loading performance.
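
For example (just a sketch, assuming Phoenix 4.x, a hypothetical EXAMPLE table and a zk-host
quorum), both factors can be set in the table DDL: SALT_BUCKETS pre-splits the table into regions
and DISABLE_WAL skips the write-ahead log at the cost of durability:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class CreateExampleTable {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper quorum and table name, for illustration only.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host")) {
                conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS EXAMPLE ("
                    + " ID BIGINT NOT NULL PRIMARY KEY,"
                    + " NAME VARCHAR)"
                    // Pre-split into 16 salted regions; skip the WAL on writes.
                    + " SALT_BUCKETS = 16, DISABLE_WAL = true");
            }
        }
    }

Note that an HFile bulk load bypasses the normal write path anyway, so the WAL setting mainly
matters for the JDBC upsert path.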

If you can share more details about your specific data loading, I can help you with some
tuning work.

Thanks,
Sun.





CertusNet 

From: Ciureanu, Constantin (GfK)
Date: 2015-01-13 17:12
To: user@phoenix.apache.org
Subject: MapReduce bulk load into Phoenix table
Hello all,
 
Due to the slow speed of Phoenix JDBC (a single machine does roughly 1000-1500 rows/sec), I am
also reading up on loading data into Phoenix via MapReduce.
 
So far I understand that the Key + List<[Key,Value]> to be inserted into the HBase table
is obtained via a “dummy” Phoenix connection; those rows are then written into HFiles,
and after the MR job finishes the HFiles are bulk-loaded into HBase in the usual way.
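
Roughly like this, if I read the code correctly (just a sketch of my understanding, assuming
Phoenix 4.x, a hypothetical EXAMPLE table and zk-host quorum; untested):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Pair;
    import org.apache.phoenix.util.PhoenixRuntime;

    public class DummyConnectionSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
            conn.setAutoCommit(false);
            conn.createStatement().executeUpdate(
                "UPSERT INTO EXAMPLE (ID, NAME) VALUES (1, 'foo')");
            // Instead of committing, pull out the KeyValues the upsert produced ...
            Iterator<Pair<byte[], List<KeyValue>>> it =
                PhoenixRuntime.getUncommittedDataIterator(conn);
            while (it.hasNext()) {
                Pair<byte[], List<KeyValue>> row = it.next();
                // row.getFirst() is the row key, row.getSecond() the KeyValues
                // ... which the MR job writes into HFiles.
            }
            conn.rollback(); // discard the uncommitted rows; nothing reaches the server
            conn.close();
        }
    }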
 
My question: is there any better / faster approach? I assume this one cannot reach the maximum
speed for loading data into a Phoenix / HBase table.
   
Also, I would like to find a better / newer code sample than this one:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.phoenix/phoenix/4.0.0-incubating/org/apache/phoenix/mapreduce/CsvToKeyValueMapper.java#CsvToKeyValueMapper.loadPreUpsertProcessor%28org.apache.hadoop.conf.Configuration%29
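
As far as I can tell, that mapper is what the bundled CsvBulkLoadTool drives end to end. A minimal
driver sketch of invoking it programmatically (hypothetical table name, input path and ZooKeeper
quorum; option names may differ between Phoenix versions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

    public class BulkLoadDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Runs the MR job that writes HFiles, then bulk-loads them into HBase.
            int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), new String[] {
                "--table", "EXAMPLE",
                "--input", "/data/example.csv",
                "--zookeeper", "zk-host:2181"
            });
            System.exit(exitCode);
        }
    }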
 
Thank you,
   Constantin