phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Missing content in phoenix after writing from Spark
Date Wed, 12 Sep 2018 20:50:38 GMT
Reminder: Using Phoenix internals forces you to understand exactly how 
the version of Phoenix that you're using serializes data. Is there a 
reason you're not using SQL to interact with Phoenix?
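
For comparison, here is a minimal sketch of writing the same rows through SQL, so that Phoenix does the byte encoding itself. The object name, table name, and column names are hypothetical, and it assumes the Phoenix JDBC driver is on the classpath and that every column is a VARCHAR.

```scala
import java.sql.Connection

object PhoenixUpsertSketch {
  // Builds a parameterized Phoenix UPSERT statement for the given table and columns.
  def upsertSql(table: String, columns: Seq[String]): String = {
    val placeholders = columns.map(_ => "?").mkString(", ")
    s"UPSERT INTO $table (${columns.mkString(", ")}) VALUES ($placeholders)"
  }

  // Writes all-string rows through Phoenix instead of building KeyValues by hand.
  def writeRows(conn: Connection, table: String, columns: Seq[String],
                rows: Iterator[Seq[String]]): Unit = {
    val stmt = conn.prepareStatement(upsertSql(table, columns))
    try {
      rows.foreach { row =>
        // JDBC parameters are 1-based.
        row.zipWithIndex.foreach { case (value, i) => stmt.setString(i + 1, value) }
        stmt.executeUpdate()
      }
      conn.commit() // Phoenix connections do not auto-commit by default
    } finally stmt.close()
  }
}
```

From Spark this would typically be called inside foreachPartition, with one connection per partition opened from a `jdbc:phoenix:<zookeeper quorum>` URL.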

It sounds to me like Phoenix is expecting more data at the head of your 
rowkey. Maybe the table is defined with salt buckets, but the rowkeys 
you write don't include the leading salt byte?
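
To illustrate that guess, here is a simplified sketch in plain Scala with no Phoenix classes involved. The object and the `readSaltedKey` helper are hypothetical stand-ins for a reader that treats the first rowkey byte as a salt bucket; the placeholder salt value 0 is also an assumption, since Phoenix computes the real salt byte itself.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object SaltByteSketch {
  // Simplified stand-in for reading a salted table: assumes byte 0 is the
  // salt bucket and decodes only the remaining bytes as the VARCHAR key.
  // Expects a non-empty rowkey.
  def readSaltedKey(rowKey: Array[Byte]): String =
    new String(rowKey, 1, rowKey.length - 1, UTF_8)

  def main(args: Array[String]): Unit = {
    // Key written directly, without a salt byte.
    val raw = "customer42".getBytes(UTF_8)
    println(readSaltedKey(raw)) // prints "ustomer42": the first character is consumed as the salt

    // Same key with a placeholder salt byte prepended.
    val salted = 0.toByte +: raw
    println(readSaltedKey(salted)) // prints "customer42"
  }
}
```

If the table was created with SALT_BUCKETS, this is the symptom you would see: the data in HBase looks complete, but Phoenix drops the first byte of every key.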

On 9/12/18 4:32 PM, Saif Addin wrote:
> Hi all,
> 
> We're trying to write tables with all string columns from Spark.
> We are not using the Spark Connector; instead, we are writing byte 
> arrays directly from RDDs.
> 
> The process works fine: HBase receives the data correctly, and the 
> content is consistent.
> 
> However, when reading the table from Phoenix, we notice that the first 
> character of each string is missing. This sounds like a byte encoding 
> issue, but we're at a loss. We're using PVarchar to generate the bytes.
> 
> Here's the snippet of code creating the RDD:
> 
> val tdd = pdd.flatMap(x => {
>    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>    for(i <- 0 until cols.length) yield {
>      // other stuff for other columns ...
>      // ...
>      (rowKey, (column1, column2, column3))
>    }
> })
> 
> ...
> 
> We then create the following output to be written to HBase:
> 
> val output = tdd.map(x => {
>      val rowKeyByte: Array[Byte] = x._1
>      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> 
>      val kv = new KeyValue(rowKeyByte,
>          PVarchar.INSTANCE.toBytes(column1),
>          PVarchar.INSTANCE.toBytes(column2),
>          PVarchar.INSTANCE.toBytes(column3)
>      )
>      (immutableRowKey, kv)
> })
> 
> By the way, we are using *KryoSerializer* in order to be able to 
> serialize all classes necessary for HBase (KeyValue, BytesWritable, etc).
> 
> The key of this table is the column that is missing data when queried 
> from Phoenix, so we guess something is wrong with the byte serialization.
> 
> Any ideas? Appreciated!
> Saif
