phoenix-user mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: REG: Using Sequences in Phoenix Data Frame
Date Tue, 18 Aug 2015 15:58:59 GMT
See PhoenixHBaseLoaderIT.testDataForSQLQueryWithSequences()

On Mon, Aug 17, 2015 at 9:10 AM, Ns G <nsgnsg84@gmail.com> wrote:

> It would be really helpful if links to resources were provided where
> sequences are used in MapReduce, which I will try to replicate in Spark.
>
> Thank you James and Josh for your answers.
> On 17-Aug-2015 8:25 pm, "Josh Mahonin" <jmahonin@interset.com> wrote:
>
>> Oh, neat! I was looking for some references to it in code, unit tests and
>> docs and didn't see anything relevant.
>>
>> It's possible they might "just work" then, although it's definitely an
>> untested scenario.
>>
>> On Mon, Aug 17, 2015 at 10:48 AM, James Taylor <jamestaylor@apache.org>
>> wrote:
>>
>>> Sequences are supported by MR integration, but I'm not sure if their
>>> usage by the Spark integration would cause any issues.
>>>
>>>
>>> On Monday, August 17, 2015, Josh Mahonin <jmahonin@interset.com> wrote:
>>>
>>>> Hi Satya,
>>>>
>>>> I don't believe sequences are supported by the broader Phoenix
>>>> map-reduce integration, which the phoenix-spark module uses under the hood.
>>>>
>>>> One workaround that would give you sequential IDs is to use the
>>>> 'zipWithIndex' method on the underlying Spark RDD, with a small 'map()'
>>>> operation to unpack / reorganize the tuple, before saving it to Phoenix.
>>>>
>>>> Good luck!
>>>>
>>>> Josh
>>>>
>>>> On Sat, Aug 15, 2015 at 10:02 AM, Ns G <nsgnsg84@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I hope that someone will reply to this email as all my previous emails
>>>>> have been unanswered.
>>>>>
>>>>> I have 10-20 million records in a file and I want to insert them through
>>>>> Phoenix-Spark.
>>>>> The table's primary ID is generated by a sequence, so every time an
>>>>> upsert is done, a sequence ID gets generated.
>>>>>
>>>>> Now I want to implement this in Spark, and more precisely using data
>>>>> frames. Since RDDs are immutable, how can I add a sequence to the rows
>>>>> in a dataframe?
>>>>>
>>>>> Thanks for any help or direction or suggestion.
>>>>>
>>>>> Satya
>>>>>
>>>>
>>>>
>>
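
The workaround in Josh's message above (zipWithIndex followed by a small map()) can be sketched without a cluster. The block below is only a plain-Python model of that idea, not Spark or Phoenix code: the partition layout, the (name, city) records, the helper names, and the start_id parameter are all assumptions made for illustration. In Spark itself the equivalent would be roughly rdd.zipWithIndex() followed by a map() that moves the index to the front of each row.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Model of Spark's RDD.zipWithIndex: pair each element with a 0-based
    index, ordered by partition first, then by position within a partition."""
    sizes = [len(p) for p in partitions]
    # Each partition starts at the total size of all partitions before it.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        [(row, offset + i) for i, row in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


def to_phoenix_rows(indexed_partitions, start_id=1):
    """The 'small map()' step: turn (row, index) into (id, *row), ID first.
    start_id is a hypothetical parameter, not something Phoenix provides."""
    return [
        [(start_id + idx, *row) for row, idx in part]
        for part in indexed_partitions
    ]


# Hypothetical input: two partitions of (name, city) records.
partitions = [[("alice", "NYC"), ("bob", "LA")], [("carol", "SF")]]
rows = to_phoenix_rows(zip_with_index(partitions), start_id=1)
```

Note that IDs produced this way are unique only within a single load. They are not coordinated with a Phoenix sequence, so appending to a table that already has rows would need a starting value chosen to avoid collisions.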
