phoenix-user mailing list archives

From Josh Mahonin <jmaho...@interset.com>
Subject Re: REG: Using Sequences in Phoenix Data Frame
Date Mon, 17 Aug 2015 14:55:42 GMT
Oh, neat! I was looking for some references to it in code, unit tests and
docs and didn't see anything relevant.

Sequences might "just work" with the Spark integration then, although it's
definitely an untested scenario.

On Mon, Aug 17, 2015 at 10:48 AM, James Taylor <jamestaylor@apache.org>
wrote:

> Sequences are supported by MR integration, but I'm not sure if their
> usage by the Spark integration would cause any issues.
>
>
> On Monday, August 17, 2015, Josh Mahonin <jmahonin@interset.com> wrote:
>
>> Hi Satya,
>>
>> I don't believe sequences are supported by the broader Phoenix map-reduce
>> integration, which the phoenix-spark module uses under the hood.
>>
>> One workaround that would give you sequential IDs is to use the
>> 'zipWithIndex' method on the underlying Spark RDD, with a small 'map()'
>> operation to unpack / reorganize the tuple before saving it to Phoenix.
>>
>> Good luck!
>>
>> Josh
>>
>> On Sat, Aug 15, 2015 at 10:02 AM, Ns G <nsgnsg84@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I hope that someone will reply to this email as all my previous emails
>>> have been unanswered.
>>>
>>> I have 10-20 million records in a file and I want to insert them through
>>> Phoenix-Spark.
>>> The table's primary key is generated by a sequence, so every time an
>>> upsert is done, a new sequence ID is generated.
>>>
>>> Now I want to implement this in Spark, and more precisely using data
>>> frames. Since RDDs are immutable, how can I add a sequence value to the
>>> rows in a dataframe?
>>>
>>> Thanks for any help or direction or suggestion.
>>>
>>> Satya
>>>
>>
>>
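
For readers of the archive, here is a minimal sketch of the 'zipWithIndex' workaround described in the quoted reply, written for PySpark. The helper name add_sequential_ids and its start parameter are hypothetical and do not come from the thread or from phoenix-spark. The sketch assumes each row is a tuple (or a Spark Row) and that the generated ID should become the first column.

```python
def add_sequential_ids(rdd, start=1):
    """Return a new RDD whose rows are prefixed with a sequential ID.

    Hypothetical helper: `rdd` is assumed to be a Spark RDD of tuples or Rows.
    The input RDD is not modified; zipWithIndex() and map() each build a new RDD,
    which is how "adding a column" works with immutable RDDs.
    """
    def attach(pair):
        # zipWithIndex() yields (row, index) pairs with a 0-based index.
        row, index = pair
        # Put the generated ID first, then the original columns.
        return (start + index,) + tuple(row)

    return rdd.zipWithIndex().map(attach)
```

The indices follow partition order and then the order of items inside each partition, and Spark runs an extra job to compute them when the RDD has more than one partition. The IDs are only unique within a single run and are not coordinated with a Phoenix sequence, so a later load would need a start value above the highest ID already in the table.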
