Thanks, Josh. I looked at the code as well, and you're right. It would be great to decouple the core bulk loader logic from CSV; that would make more direct bulk load integrations possible. Hopefully I'll get to that one of these days.
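
Just to sketch the idea (the names below are purely hypothetical, nothing like them exists in Phoenix today): the HFile-writing core would accept any record-to-columns mapper, and the CSV parser would become just one implementation:

    // Hypothetical sketch only; none of these types exist in Phoenix.
    trait BulkLoadRecordMapper[T] {
      // Map one input record to Phoenix column values, in target column order.
      def toColumnValues(record: T): Seq[AnyRef]
    }

    // CSV becomes one mapper among many (JSON, Avro, RDD rows, ...).
    class CsvRecordMapper(delimiter: Char) extends BulkLoadRecordMapper[String] {
      def toColumnValues(line: String): Seq[AnyRef] =
        line.split(delimiter).toSeq
    }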

On Apr 10, 2016 11:52 AM, "Josh Mahonin" <jmahonin@gmail.com> wrote:
Hi Neelesh,

The saveToPhoenix method uses the MapReduce PhoenixOutputFormat under the hood, which is a wrapper over the JDBC driver. It's likely not as efficient as the CSVBulkLoader, which writes HFiles directly, but it does improve on a simple JDBC client because the writes are spread across multiple Spark workers (depending on the number of partitions in the RDD/DataFrame).
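
For reference, here's roughly what the RDD save path looks like (the table name, columns, and ZooKeeper quorum below are placeholders; see the phoenix-spark docs for details):

    import org.apache.spark.SparkContext
    import org.apache.phoenix.spark._  // adds saveToPhoenix to RDDs of tuples

    val sc = new SparkContext("local", "phoenix-save-example")
    val dataSet = List((1L, "1", 1), (2L, "2", 2), (3L, "3", 3))

    // Each partition gets its own PhoenixOutputFormat writer, so the
    // upserts run in parallel across workers rather than through a
    // single JDBC client.
    sc.parallelize(dataSet)
      .saveToPhoenix(
        "OUTPUT_TABLE",                      // placeholder target table
        Seq("ID", "COL1", "COL2"),           // columns to upsert
        zkUrl = Some("phoenix-server:2181")  // placeholder ZK quorum
      )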

Regards,

Josh

On Sun, Apr 10, 2016 at 1:21 AM, Neelesh <neeleshs@gmail.com> wrote:
Hi,
  Does phoenix-spark's saveToPhoenix use the JDBC driver internally, or does it do something similar to the CSVBulkLoader, writing HFiles directly?

Thanks!