phoenix-user mailing list archives

From "Thomas D'Silva" <tdsi...@salesforce.com>
Subject Re: Missing content in phoenix after writing from Spark
Date Wed, 12 Sep 2018 22:03:42 GMT
Is there a reason you didn't use the spark-connector to serialize your data?
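
For reference, a write through the phoenix-spark connector might look roughly like the sketch below. The table name, column names, sample row, and ZooKeeper URL are placeholders and do not come from this thread. The target table is assumed to already exist in Phoenix with matching columns.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PhoenixConnectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-write-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical all-string rows; column names must match the Phoenix table.
    val df = Seq(("key1", "a", "b")).toDF("ID", "COL1", "COL2")

    // The connector serializes the values itself, so the row key layout
    // (including any salt byte) is produced by Phoenix rather than by hand.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "OUTPUT_TABLE")    // placeholder table name
      .option("zkUrl", "localhost:2181")  // placeholder ZooKeeper quorum
      .save()

    spark.stop()
  }
}
```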

On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1988@gmail.com> wrote:

> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
> table, and the key column now shows correctly.
>
> However, the problem persists for the rest of the columns: they show as
> completely empty in Phoenix, although they appear correctly in HBase.
> We'll be looking into this, but any further advice is appreciated.
>
> Saif
>
> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <elserj@apache.org> wrote:
>
>> Reminder: Using Phoenix internals forces you to understand exactly how
>> the version of Phoenix that you're using serializes data. Is there a
>> reason you're not using SQL to interact with Phoenix?
>>
>> Sounds to me that Phoenix is expecting more data at the head of your
>> rowkey. Maybe a salt bucket that you've defined on the table but not
>> created?
>>
>> On 9/12/18 4:32 PM, Saif Addin wrote:
>> > Hi all,
>> >
>> > We're trying to write tables with all string columns from Spark.
>> > We are not using the Spark connector; instead, we write byte arrays
>> > directly from RDDs.
>> >
>> > The process works fine: HBase receives the data correctly, and the
>> > content is consistent.
>> >
>> > However, when reading the table from Phoenix, we notice that the first
>> > character of each string is missing. This sounds like a byte encoding
>> > issue, but we're at a loss. We're using PVarchar to generate the bytes.
>> >
>> > Here's the snippet of code creating the RDD:
>> >
>> > val tdd = pdd.flatMap(x => {
>> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>> >    for(i <- 0 until cols.length) yield {
>> >      other stuff for other columns ...
>> >      ...
>> >      (rowKey, (column1, column2, column3))
>> >    }
>> > })
>> >
>> > ...
>> >
>> > We then create the following output to be written to HBase:
>> >
>> > val output = tdd.map(x => {
>> >      val rowKeyByte: Array[Byte] = x._1
>> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>> >
>> >      val kv = new KeyValue(rowKeyByte,
>> >          PVarchar.INSTANCE.toBytes(column1),
>> >          PVarchar.INSTANCE.toBytes(column2),
>> >          PVarchar.INSTANCE.toBytes(column3)
>> >      )
>> >      (immutableRowKey, kv)
>> > })
>> >
>> > By the way, we are using *KryoSerializer* in order to be able to
>> > serialize all the classes necessary for HBase (KeyValue, BytesWritable,
>> > etc.).
>> >
>> > The key of this table is the one missing data when queried from
>> > Phoenix, so we guess something is wrong with the byte serialization.
>> >
>> > Any ideas? Appreciated!
>> > Saif
>>
>

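The missing first character reported in the thread is consistent with the salt-bucket diagnosis: on a salted table, the reader treats the first byte of the row key as the salt, so a key written without one loses its first character. Below is a minimal sketch of that effect, assuming a one-byte salt prefix. The hash is a stand-in for illustration only and is not claimed to match Phoenix's own, and the object and method names are hypothetical.

```scala
object SaltSketch {
  // Stand-in hash for illustration only; not claimed to match Phoenix's.
  def saltByte(key: Array[Byte], buckets: Int): Byte =
    math.abs(java.util.Arrays.hashCode(key) % buckets).toByte

  // Writer side: prepend the one-byte salt to the row key.
  def salted(key: Array[Byte], buckets: Int): Array[Byte] =
    saltByte(key, buckets) +: key

  // Reader side on a salted table: skip the leading byte, decode the rest.
  def readKey(stored: Array[Byte]): String =
    new String(stored.drop(1), "UTF-8")

  def main(args: Array[String]): Unit = {
    val key = "hello".getBytes("UTF-8")
    println(readKey(key))            // unsalted write: prints "ello"
    println(readKey(salted(key, 8))) // salted write: prints "hello"
  }
}
```

In practice the salt byte should come from Phoenix itself, or the writes should go through SQL or the connector, rather than from a hand-rolled hash like the one above.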