phoenix-user mailing list archives

From Saif Addin <saif1...@gmail.com>
Subject Missing content in phoenix after writing from Spark
Date Wed, 12 Sep 2018 20:32:38 GMT
Hi all,

We're trying to write tables with all-string columns from Spark. We are not
using the Spark connector; instead, we write byte arrays directly from RDDs.

The process works fine: HBase receives the data correctly, and the content
is consistent.

However, when reading the table from Phoenix, we notice that the first
character of each string is missing. This looks like a byte-encoding issue,
but we're at a loss. We're using PVarchar to generate the bytes.
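
For reference, here is a minimal round-trip of the encoding in isolation
(nothing Spark-specific; "hello" is just an illustrative value):

import org.apache.phoenix.schema.types.PVarchar

// Encode and decode a sample value with the same Phoenix type we use
// for the row key
val bytes = PVarchar.INSTANCE.toBytes("hello")
val decoded = PVarchar.INSTANCE.toObject(bytes).asInstanceOf[String]
println(decoded) // prints "hello" -- the encoding itself round-trips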

Here's the snippet of code creating the RDD:

val tdd = pdd.flatMap(x => {
  // Encode the row key with the same Phoenix type as the table's primary key
  val rowKey = PVarchar.INSTANCE.toBytes(x._1)
  for (i <- cols.indices) yield {
    // ... other stuff for the other columns ...
    (rowKey, (column1, column2, column3))
  }
})

...

We then create the following output to be written to HBase:

val output = tdd.map(x => {
  val rowKeyByte: Array[Byte] = x._1
  val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)

  // KeyValue(row, family, qualifier, value)
  val kv = new KeyValue(rowKeyByte,
    PVarchar.INSTANCE.toBytes(column1),
    PVarchar.INSTANCE.toBytes(column2),
    PVarchar.INSTANCE.toBytes(column3)
  )

  (immutableRowKey, kv)
})
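
We then write the (ImmutableBytesWritable, KeyValue) pairs out roughly along
these lines (a simplified sketch; the staging path is a placeholder and the
bulk-load step is elided):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2

// Stage HFiles to a temp dir; "/tmp/hfile-staging" is a placeholder
output.saveAsNewAPIHadoopFile(
  "/tmp/hfile-staging",
  classOf[ImmutableBytesWritable],
  classOf[KeyValue],
  classOf[HFileOutputFormat2],
  HBaseConfiguration.create()
)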

By the way, we are using *KryoSerializer* in order to be able to serialize
all the classes necessary for HBase (KeyValue, BytesWritable, etc.).
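
Concretely, the Kryo setup looks roughly like this:

import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Register the HBase classes that travel through Spark
  .registerKryoClasses(Array(
    classOf[ImmutableBytesWritable],
    classOf[KeyValue]
  ))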

It is the row key of this table that is missing data when queried from
Phoenix, so we guess something is wrong with the byte serialization.
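
One way we thought of narrowing it down is to dump a few raw row keys
straight from the HBase client and compare them byte-for-byte with what
Phoenix returns (a rough sketch; "MY_TABLE" is a placeholder name):

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes

// Print the first few raw row keys exactly as HBase stores them
val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = conn.getTable(TableName.valueOf("MY_TABLE"))
val scanner = table.getScanner(new Scan())
scanner.iterator().asScala.take(5).foreach { r =>
  println(Bytes.toStringBinary(r.getRow))
}
scanner.close()
conn.close()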

Any ideas? Appreciated!
Saif
