phoenix-user mailing list archives

From Antonio Murgia <antonio.mur...@eng.it>
Subject bulk-upsert spark phoenix
Date Tue, 27 Sep 2016 14:43:18 GMT
Hi,

I would like to perform a bulk insert into HBase using Apache Phoenix
from Spark. I tried the Apache Phoenix Spark library but, as far as I
can tell from the code, it performs a JDBC batch of upserts (am I
right?). Instead, I want to perform a bulk load like the one described
in this blog post (https://zeyuanxy.github.io/HBase-Bulk-Loading/),
while taking advantage of the automatic conversion of Java/Scala types
to bytes.
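
For context, the JDBC-style upsert path mentioned above can be sketched
roughly as follows. This is only an illustration of the general pattern,
not the code of the Spark integration itself: the class name, method
names, and the commitEvery parameter are hypothetical, and it assumes a
java.sql.Connection already opened through the Phoenix JDBC driver. With
auto-commit off, Phoenix buffers upserts on the client and sends them on
commit, which is why this is row-by-row mutation traffic rather than an
HFile bulk load.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating upserts over JDBC; not Phoenix source code.
class UpsertBatchSketch {

    // Builds a parameterized statement such as
    // "UPSERT INTO T (ID, NAME) VALUES (?, ?)".
    static String buildUpsert(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Executes one upsert per row and commits every `commitEvery` rows.
    // Each row must hold one value per column, in column order.
    static void upsertAll(Connection conn, String table, List<String> columns,
                          Iterable<Object[]> rows, int commitEvery) throws SQLException {
        conn.setAutoCommit(false); // mutations are sent when commit() is called
        try (PreparedStatement ps = conn.prepareStatement(buildUpsert(table, columns))) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.executeUpdate();
                if (++pending % commitEvery == 0) {
                    conn.commit();
                }
            }
            conn.commit(); // flush any remaining rows
        }
    }
}
```

A bulk load, by contrast, would write sorted HFiles and hand them to
HBase directly, bypassing this write path.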

I'm currently using Phoenix 4.5.2, so I cannot use Hive to manipulate
the Phoenix table, and if possible I want to avoid spawning an MR job
that reads data from CSV
(https://phoenix.apache.org/bulk_dataload.html). Essentially, I just
want to do what the CSV loader does with MR, but programmatically from
Spark (since the data I want to persist is already loaded in memory).

Thank you all!

