phoenix-user mailing list archives

From Jaanai Zhang <cloud.pos...@gmail.com>
Subject Re: Missing content in phoenix after writing from Spark
Date Thu, 13 Sep 2018 02:03:10 GMT
It seems the column data is missing the schema's mapping information. If you
want to write to the HBase table this way, you can create the HBase table
first and then map it from Phoenix.
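
A minimal sketch of that mapping, assuming an existing HBase table "t" with a
column family "cf" (the table, family, column names, and types here are
hypothetical; the declared types must match how the bytes were actually
written, and a view over an existing HBase table is read-only):

```sql
-- Hypothetical HBase table "t" with column family "cf".
-- Maps the existing rows into Phoenix without taking ownership of them.
CREATE VIEW "t" (
  pk VARCHAR PRIMARY KEY,
  "cf"."col1" VARCHAR,
  "cf"."col2" VARCHAR
);
```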

----------------------------------------
   Jaanai Zhang
   Best regards!



Thomas D'Silva <tdsilva@salesforce.com> wrote on Thu, Sep 13, 2018 at 6:03 AM:

> Is there a reason you didn't use the spark-connector to serialize your
> data?
>
> On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1988@gmail.com> wrote:
>
>> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
>> table, and the key column now shows correctly.
>>
>> However, the problem persists: the rest of the columns show up as
>> completely empty in Phoenix (they appear correctly in HBase). We'll keep
>> looking into this, but any further advice would be appreciated.
>>
>> Saif
>>
>> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <elserj@apache.org> wrote:
>>
>>> Reminder: Using Phoenix internals forces you to understand exactly how
>>> the version of Phoenix that you're using serializes data. Is there a
>>> reason you're not using SQL to interact with Phoenix?
>>>
>>> Sounds to me like Phoenix is expecting more data at the head of your
>>> rowkey. Maybe a salt bucket that you've defined on the table but didn't
>>> account for when writing?
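
The salt-bucket point above can be sketched in a few lines. This is a hedged
model, not Phoenix's actual code: the hash below is an assumption loosely
modeled on Phoenix's SaltingUtil (a 31-based byte hash modulo SALT_BUCKETS),
so check the SaltingUtil class in your Phoenix version for the exact function.

```scala
// Sketch: on a table created WITH SALT_BUCKETS = n, Phoenix prepends one
// salt byte to every rowkey. The hash here is an assumption, not the real
// Phoenix implementation.
def saltByte(key: Array[Byte], buckets: Int): Byte = {
  var hash = 0
  for (b <- key) hash = hash * 31 + b
  ((hash & Int.MaxValue) % buckets).toByte
}

// The rowkey Phoenix expects on a salted table:
def saltedKey(key: Array[Byte], buckets: Int): Array[Byte] =
  saltByte(key, buckets) +: key

// If unsalted keys are written directly, Phoenix treats the first real byte
// as the salt and the decoded string loses its first character:
def readAsIfSalted(rowkey: Array[Byte]): String =
  new String(rowkey.drop(1), "UTF-8")
```

For example, `readAsIfSalted("HELLO".getBytes("UTF-8"))` returns "ELLO",
which matches the missing-first-character symptom described below.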
>>>
>>> On 9/12/18 4:32 PM, Saif Addin wrote:
>>> > Hi all,
>>> >
>>> > We're trying to write tables with all string columns from Spark.
>>> > We are not using the Spark connector; instead we are directly writing
>>> > byte arrays from RDDs.
>>> >
>>> > The process works fine: HBase receives the data correctly and the
>>> > content is consistent.
>>> >
>>> > However, reading the table from Phoenix, we notice that the first
>>> > character of each string is missing. This sounds like a byte-encoding
>>> > issue, but we're at a loss. We're using PVarchar to generate the bytes.
>>> >
>>> > Here's the snippet of code creating the RDD:
>>> >
>>> > val tdd = pdd.flatMap(x => {
>>> >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>>> >    for(i <- 0 until cols.length) yield {
>>> >      other stuff for other columns ...
>>> >      ...
>>> >      (rowKey, (column1, column2, column3))
>>> >    }
>>> > })
>>> >
>>> > ...
>>> >
>>> > We then create the following output to be written to HBase:
>>> >
>>> > val output = tdd.map(x => {
>>> >      val rowKeyByte: Array[Byte] = x._1
>>> >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>>> >
>>> >      val kv = new KeyValue(rowKeyByte,
>>> >        PVarchar.INSTANCE.toBytes(column1),
>>> >        PVarchar.INSTANCE.toBytes(column2),
>>> >        PVarchar.INSTANCE.toBytes(column3)
>>> >      )
>>> >      (immutableRowKey, kv)
>>> > })
>>> >
>>> > By the way, we are using *KryoSerializer* in order to be able to
>>> > serialize all classes necessary for HBase (KeyValue, BytesWritable,
>>> > etc.).
>>> >
>>> > The key of this table is the one missing data when queried from
>>> > Phoenix, so we guess something is wrong with the byte serialization.
>>> >
>>> > Any ideas? Appreciated!
>>> > Saif
>>>
>>
>
