phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: REG: Using Sequences in Phoenix Data Frame
Date Mon, 17 Aug 2015 14:48:42 GMT
Sequences are supported by the MR integration, but I'm not sure whether using
them through the Spark integration would cause any issues.

On Monday, August 17, 2015, Josh Mahonin <jmahonin@interset.com> wrote:

> Hi Satya,
>
> I don't believe sequences are supported by the broader Phoenix map-reduce
> integration, which the phoenix-spark module uses under the hood.
>
> One workaround that would give you sequential IDs is to use the
> 'zipWithIndex' method on the underlying Spark RDD, with a small 'map()'
> operation to unpack / reorganize the tuple, before saving it to Phoenix.
>
> Good luck!
>
> Josh
>
> On Sat, Aug 15, 2015 at 10:02 AM, Ns G <nsgnsg84@gmail.com> wrote:
>
>> Hi All,
>>
>> I hope that someone will reply to this email as all my previous emails
>> have been unanswered.
>>
>> I have 10-20 million records in a file that I want to insert through
>> Phoenix-Spark.
>> The table's primary key is generated by a sequence, so every time an
>> upsert is done, a new sequence ID is generated.
>>
>> Now I want to implement this in Spark, more precisely using data
>> frames. Since RDDs are immutable, how can I add a sequence to the rows
>> in a dataframe?
>>
>> Thanks for any help, direction, or suggestions.
>>
>> Satya
>>
>
>
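Josh's zipWithIndex workaround can be sketched without a Spark cluster. The
snippet below mimics it on a plain Python list standing in for the RDD (note
that Spark's rdd.zipWithIndex() yields (element, index) pairs, while Python's
enumerate() yields (index, element), so the unpacking order differs slightly);
the sample rows and column layout are made up for illustration.

```python
# Sketch of the zipWithIndex workaround, using a plain list in place of an RDD.
rows = [("alice", 30), ("bob", 25), ("carol", 41)]  # hypothetical sample rows

# Equivalent of rdd.zipWithIndex(): pair each row with a sequential index.
zipped = list(enumerate(rows))

# The small 'map()' step: unpack (index, (name, age)) -> (index, name, age),
# so the index becomes the leading ID column before saving to Phoenix.
with_ids = [(idx,) + row for idx, row in zipped]

print(with_ids)  # [(0, 'alice', 30), (1, 'bob', 25), (2, 'carol', 41)]
```

In the actual Spark job, the transformed RDD would then be written out through
the phoenix-spark integration, with the generated index serving as the primary
key column in place of a server-side sequence.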
