phoenix-user mailing list archives

From Josh Mahonin <jmaho...@gmail.com>
Subject Re: PHOENIX SPARK - DataFrame for BulkLoad
Date Wed, 18 May 2016 14:29:55 GMT
Hi,

The Spark integration uses the Phoenix MapReduce framework, which under the
hood translates the DataFrame writes into UPSERTs spread across a number of
workers.

You should try out both methods and see which works best for your use case.
For what it's worth, we routinely do load / save operations using the Spark
integration on those data sizes.
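For reference, a minimal sketch of a DataFrame save through the phoenix-spark
connector looks something like the following (the table name and ZooKeeper
quorum are placeholders you'd swap for your own):

    import org.apache.spark.sql.SaveMode

    // df is an existing DataFrame whose columns match the Phoenix table schema.
    // "MY_TABLE" and "zkhost:2181" are placeholders for your table and ZK quorum.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)   // phoenix-spark requires Overwrite; rows are still UPSERTed
      .option("table", "MY_TABLE")
      .option("zkUrl", "zkhost:2181")
      .save()

Under the hood this is what gets spread across the workers as UPSERTs.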

Josh

On Tue, May 17, 2016 at 7:03 AM, Radha krishna <grkmca95@gmail.com> wrote:

> Hi
>
> I have the same scenario. Could you share your metrics: the column count
> per row, the number of SALT_BUCKETS, the compression technique you used,
> and how long it takes to load the complete data?
>
> My scenario: I need to load 1.9 billion records (approximately 20 files,
> each containing 100 million rows with 102 columns per row). Currently it
> takes 35 to 45 minutes to load one file.
>
>
>
> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
> mohanaugust@gmail.com> wrote:
>
>> I have 100 million records to insert into an HBase table (Phoenix) as the
>> result of a Spark job. I would like to know: if I convert them to a
>> DataFrame and save it, will that do a bulk load, or is that not an
>> efficient way to write data to a Phoenix HBase table?
>>
>> --
>> Thanks and Regards
>> Mohan
>>
>
>
>
> --
> Thanks & Regards
>    Radha krishna
