phoenix-user mailing list archives

From: Neelesh <neele...@gmail.com>
Subject: Re: Spark & Phoenix data load
Date: Mon, 11 Apr 2016 05:19:28 GMT
Thanks Josh. I looked at the code as well and you are right. It would be
great to decouple the core bulk-load logic from CSV; that would make more
direct bulkload integrations possible. Hopefully I'll get to that one of
these days.
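
(For context, a minimal sketch of driving the existing CSV-coupled bulk
loader programmatically, using Phoenix's CsvBulkLoadTool through Hadoop's
ToolRunner; the table name and input path are illustrative placeholders:)

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.util.ToolRunner
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool

    // Runs the bulk load as a MapReduce job: parse the CSV input, write
    // HFiles, then hand them to HBase's bulk-import machinery.
    val exitCode = ToolRunner.run(
      HBaseConfiguration.create(),
      new CsvBulkLoadTool,
      Array("--table", "EXAMPLE_TABLE", "--input", "/data/example.csv")
    )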
On Apr 10, 2016 11:52 AM, "Josh Mahonin" <jmahonin@gmail.com> wrote:

Hi Neelesh,

The saveToPhoenix method uses the MapReduce PhoenixOutputFormat under the
hood, which is a wrapper over the JDBC driver. It's likely not as efficient
as the CSVBulkLoader, which writes HFiles directly, but it does improve on a
single-connection JDBC client because the writes are spread across multiple
Spark workers (depending on the number of partitions in the RDD/DataFrame).
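
For illustration, a minimal sketch of that code path using the phoenix-spark
RDD API; the table name, column list, and ZooKeeper quorum are placeholders:

    import org.apache.spark.SparkContext
    import org.apache.phoenix.spark._  // adds saveToPhoenix to RDDs of tuples

    val sc = new SparkContext("local", "phoenix-save-example")

    // Each partition becomes one task writing through PhoenixOutputFormat,
    // i.e. batched UPSERTs over the JDBC driver rather than HFiles.
    sc.parallelize(Seq((1L, "one"), (2L, "two"), (3L, "three")))
      .repartition(4)  // write parallelism follows the partition count
      .saveToPhoenix(
        "OUTPUT_TABLE",
        Seq("ID", "COL1"),
        zkUrl = Some("zk-host:2181")
      )

Since each partition maps to one writing task, repartitioning before the
save is the main lever for write parallelism.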

Regards,

Josh

On Sun, Apr 10, 2016 at 1:21 AM, Neelesh <neeleshs@gmail.com> wrote:

> Hi,
>   Does phoenix-spark's saveToPhoenix use the JDBC driver internally, or
> does it do something similar to CSVBulkLoader using HFiles?
>
> Thanks!
