phoenix-user mailing list archives

From Pariksheet Barapatre <pbarapa...@gmail.com>
Subject Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore
Date Wed, 16 Mar 2016 16:00:58 GMT
Hi Vamsi,

How many rows are you expecting out of your transformation, and how often
does the job run?

If the row count is small (< ~100K, though this also depends on cluster
size), you can go ahead with the phoenix-spark plug-in and increase the batch
size to accommodate more rows; otherwise use CsvBulkLoadTool.
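
As a rough illustration of that rule of thumb, here is a minimal Python
sketch. The ~100K cutoff is only the ballpark figure mentioned above, and the
function names, jar name, table name and input path are all hypothetical
placeholders. The command it builds follows the usual
`hadoop jar ... org.apache.phoenix.mapreduce.CsvBulkLoadTool --table ... --input ...`
form and is only printed, not executed.

```python
import shlex

# Ballpark cutoff from this thread (~100K rows); an assumption to tune per cluster.
ROW_THRESHOLD = 100_000


def choose_load_path(expected_rows, threshold=ROW_THRESHOLD):
    """Pick a load path: small loads via phoenix-spark, larger ones via bulk load."""
    return "phoenix-spark" if expected_rows < threshold else "csv-bulk-load"


def bulk_load_command(client_jar, table, input_path):
    """Build (but do not run) the argument list for a CsvBulkLoadTool job."""
    return [
        "hadoop", "jar", client_jar,
        "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
        "--table", table,
        "--input", input_path,
    ]


if __name__ == "__main__":
    expected_rows = 5_000_000  # hypothetical row count for the transformed data
    path = choose_load_path(expected_rows)
    print(path)
    if path == "csv-bulk-load":
        # Jar, table and path below are placeholders, not values from this thread.
        cmd = bulk_load_command("phoenix-client.jar", "EXAMPLE", "/data/example.csv")
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The cutoff is kept as a parameter because the right value depends on cluster
size and on how often the job runs.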

Thanks
Pari

On 16 March 2016 at 20:03, Vamsi Krishna <vamsi.attluri@gmail.com> wrote:

> Thanks Gabriel & Ravi.
>
> I have a data processing job written in Spark-Scala.
> I join data from 2 data files (CSV files) and transform the resulting
> data. Finally, I load the transformed data into a Phoenix table using the
> Phoenix-Spark plugin.
> Since the Phoenix-Spark plugin goes through the regular HBase write path
> (writes to the WAL), I'm thinking of option 2 to reduce the job execution time.
>
> *Option 2:* Do data transformation in Spark and write the transformed
> data to a CSV file and use Phoenix CsvBulkLoadTool to load data into
> Phoenix table.
>
> Has anyone tried this kind of exercise? Any thoughts?
>
> Thanks,
> Vamsi Attluri
>
> On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamravikiran@gmail.com>
> wrote:
>
>> Hi Vamsi,
>>    The upserts through the Phoenix-Spark plugin definitely go through the WAL.
>>
>>
>> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.reid@gmail.com>
>> wrote:
>>
>>> Hi Vamsi,
>>>
>>> I can't answer your question about the Phoenix-Spark plugin (although
>>> I'm sure that someone else here can).
>>>
>>> However, I can tell you that the CsvBulkLoadTool does not write to the
>>> WAL or to the Memstore. It simply writes HFiles and then hands those
>>> HFiles over to HBase, so the memstore and WAL are never
>>> touched/affected by this.
>>>
>>> - Gabriel
>>>
>>>
>>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attluri@gmail.com>
>>> wrote:
>>> > Team,
>>> >
>>> > Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?
>>> >
>>> > Phoenix-Spark plugin:
>>> > Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?
>>> >
>>> > Thanks,
>>> > Vamsi Attluri
>>>
>>



-- 
Cheers,
Pari
