phoenix-user mailing list archives

From Mohanraj Ragupathiraj <mohanaug...@gmail.com>
Subject Re: PHOENIX SPARK - DataFrame for BulkLoad
Date Fri, 20 May 2016 06:54:01 GMT
Thank you very much. I will try it out and post an update.
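For anyone following along, here is a minimal sketch of the DataFrame save/load path Josh describes below. It assumes the phoenix-spark connector jar is on the classpath and that a Phoenix table OUTPUT_TABLE(ID BIGINT PRIMARY KEY, COL1 VARCHAR) already exists; the table name, the toy data, and the ZooKeeper quorum are placeholders, not the actual job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Minimal sketch of saving (and reading back) a DataFrame with the
// phoenix-spark connector. As noted below, the save runs as parallel
// UPSERTs through the Phoenix MapReduce framework, not an HFile bulk load.
// Table name, data, and ZooKeeper quorum are placeholders.
object PhoenixDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-dataframe-save"))
    val sqlContext = new SQLContext(sc)

    // Toy DataFrame standing in for the real 100-million-row result of the job.
    val df = sqlContext.createDataFrame(Seq((1L, "first"), (2L, "second")))
      .toDF("ID", "COL1")

    // Save: each partition is written to Phoenix with UPSERTs in parallel.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)            // mode used in the phoenix-spark examples
      .option("table", "OUTPUT_TABLE")     // target Phoenix table (placeholder)
      .option("zkUrl", "zkhost:2181")      // HBase ZooKeeper quorum (placeholder)
      .save()

    // Load: read the Phoenix table back as a DataFrame through the same connector.
    val loaded = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "OUTPUT_TABLE")
      .option("zkUrl", "zkhost:2181")
      .load()
    loaded.show()

    sc.stop()
  }
}

Whether this is fast enough compared with a classic HFile bulk load will depend on the cluster, which is why trying both approaches, as suggested below, makes sense.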

On Wed, May 18, 2016 at 10:29 PM, Josh Mahonin <jmahonin@gmail.com> wrote:

> Hi,
>
> The Spark integration uses the Phoenix MapReduce framework, which under
> the hood translates those to UPSERTs spread across a number of workers.
>
> You should try out both methods and see which works best for your use
> case. For what it's worth, we routinely do load / save operations using the
> Spark integration on those data sizes.
>
> Josh
>
> On Tue, May 17, 2016 at 7:03 AM, Radha krishna <grkmca95@gmail.com> wrote:
>
>> Hi
>>
>> I have the same scenario. Can you share your metrics: the column count
>> per row, the number of SALT_BUCKETS, the compression technique you used,
>> and how long it took to load the complete data?
>>
>> In my case I have to load 1.9 billion records (approximately 20 files,
>> each containing 100 million rows with 102 columns per row). Currently it
>> takes 35 to 45 minutes to load one file.
>>
>>
>>
>> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
>> mohanaugust@gmail.com> wrote:
>>
>>> I have 100 million records to insert into an HBase table (via Phoenix) as
>>> the result of a Spark job. If I convert the result to a DataFrame and save
>>> it, will that perform a bulk load, or is that not an efficient way to write
>>> data to a Phoenix HBase table?
>>>
>>> --
>>> Thanks and Regards
>>> Mohan
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>>    Radha krishna
>


-- 
Thanks and Regards
Mohan
VISA Pte Limited, Singapore.
