phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Missing content in phoenix after writing from Spark
Date Thu, 13 Sep 2018 14:38:25 GMT
Pretty sure we ran tests with Spark 2.3 against Phoenix 5.0. Not sure if 
Spark has already moved beyond that.

On 9/12/18 11:00 PM, Saif Addin wrote:
> Thanks, we'll try the Spark Connector then. We thought it didn't support 
> the newest Spark versions.
> 
> On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.poster@gmail.com 
> <mailto:cloud.poster@gmail.com>> wrote:
> 
>     It seems the column data is missing the schema's mapping information. If
>     you want to write the HBase table this way, you can create the
>     HBase table first and use Phoenix to map it.
> 
>     ----------------------------------------
>         Jaanai Zhang
>         Best regards!
> 
> 
> 
>     Thomas D'Silva <tdsilva@salesforce.com
>     <mailto:tdsilva@salesforce.com>> wrote on Thu, Sep 13, 2018 at 6:03 AM:
> 
>         Is there a reason you didn't use the spark-connector to
>         serialize your data?
> 
>         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1988@gmail.com
>         <mailto:saif1988@gmail.com>> wrote:
> 
>             Thank you Josh! That was helpful. Indeed, there was a salt
>             bucket on the table, and the key column now shows correctly.
> 
>             However, the problem persists: the rest of the
>             columns show as completely empty in Phoenix (they appear
>             correctly in HBase). We'll keep looking into this, but any
>             further advice is appreciated.
> 
>             Saif
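One plausible cause of the empty columns (an assumption; the thread does not confirm it): since Phoenix 4.10, tables are created with column encoding enabled by default (COLUMN_ENCODED_BYTES), so Phoenix reads cells under small encoded numeric qualifiers rather than under the literal column names a raw HBase writer uses. A toy sketch of that lookup miss, with a made-up encoded qualifier:

```python
# Toy model of one HBase row: the raw Spark writer stored the cell under
# the literal column-name qualifier.
hbase_row = {b"column1": b"value1"}

def phoenix_read_cell(row: dict, qualifier: bytes):
    # Phoenix (with column encoding on) asks for its encoded qualifier,
    # which the raw write never used, so the cell looks empty.
    return row.get(qualifier)

# b"\x00\x0b" is a hypothetical encoded qualifier, for illustration only.
print(phoenix_read_cell(hbase_row, b"\x00\x0b"))  # -> None: column appears empty
print(phoenix_read_cell(hbase_row, b"column1"))   # -> b'value1': the raw cell exists
```

If this is the cause, creating the table with COLUMN_ENCODED_BYTES=0 (or writing through Phoenix APIs) would make the raw qualifiers and Phoenix's expectations line up.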
> 
>             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
>             <elserj@apache.org <mailto:elserj@apache.org>> wrote:
> 
>                 Reminder: Using Phoenix internals forces you to
>                 understand exactly how
>                 the version of Phoenix that you're using serializes
>                 data. Is there a
>                 reason you're not using SQL to interact with Phoenix?
> 
>                 Sounds to me like Phoenix is expecting more data at the
>                 head of your
>                 rowkey. Maybe a salt bucket that you've defined on the
>                 table but not
>                 created?
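The mismatch Josh describes can be sketched in a few lines (illustrative Python, not Phoenix code; it only assumes Phoenix's documented salted layout, in which a salted table's rowkey is one salt byte followed by the key):

```python
# Illustrative sketch only: Phoenix stores rowkeys in a salted table as
# [salt_byte][key] and strips the salt byte before decoding. If the writer
# never prepended a salt byte, Phoenix strips a real character instead.

def phoenix_read_rowkey(stored_key: bytes, salted: bool) -> str:
    # Phoenix drops the leading salt byte on salted tables.
    payload = stored_key[1:] if salted else stored_key
    return payload.decode("utf-8")

# The raw writer serialized the key without a salt byte:
raw_key = "hello".encode("utf-8")

print(phoenix_read_rowkey(raw_key, salted=True))   # -> "ello": first char lost
print(phoenix_read_rowkey(raw_key, salted=False))  # -> "hello"
```

This matches the symptom reported below: every key read through Phoenix is missing exactly its first character.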
> 
>                 On 9/12/18 4:32 PM, Saif Addin wrote:
>                  > Hi all,
>                  >
>                  > We're trying to write tables with all string columns
>                 from Spark.
>                  > We are not using the Spark Connector, instead we are
>                 directly writing
>                  > byte arrays from RDDs.
>                  >
>                  > The process works fine, HBase receives the data
>                 correctly, and
>                  > the content is consistent.
>                  >
>                  > However, reading the table from Phoenix, we notice the
>                 first character of
>                  > strings is missing. This sounds like a byte
>                 encoding issue, but
>                  > we're at a loss. We're using PVarchar to generate bytes.
>                  >
>                  > Here's the snippet of code creating the RDD:
>                  >
>                  > val tdd = pdd.flatMap(x => {
>                  >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>                  >    for(i <- 0 until cols.length) yield {
>                  >      other stuff for other columns ...
>                  >      ...
>                  >      (rowKey, (column1, column2, column3))
>                  >    }
>                  > })
>                  >
>                  > ...
>                  >
>                  > We then create the following output to be written
>                 down in HBase:
>                  >
>                  > val output = tdd.map(x => {
>                  >      val rowKeyByte: Array[Byte] = x._1
>                  >      val immutableRowKey = new
>                 ImmutableBytesWritable(rowKeyByte)
>                  >
>                  >      val kv = new KeyValue(rowKeyByte,
>                  >          PVarchar.INSTANCE.toBytes(column1),
>                  >          PVarchar.INSTANCE.toBytes(column2),
>                  >          PVarchar.INSTANCE.toBytes(column3)
>                  >      )
>                  >      (immutableRowKey, kv)
>                  > })
>                  >
>                  > By the way, we are using *KryoSerializer* in order to
>                 be able to
>                  > serialize all classes necessary for HBase (KeyValue,
>                 BytesWritable, etc).
>                  >
>                  > The key of this table is the one missing data when
>                 queried from Phoenix.
>                  > So we guess something is wrong with the byte serialization.
>                  >
>                  > Any ideas? Appreciated!
>                  > Saif
> 
> 

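For what it's worth, the value encoding itself is unlikely to be the culprit here: for VARCHAR, PVarchar.INSTANCE.toBytes produces plain UTF-8, so a straight value roundtrip is lossless. That points toward the rowkey layout (the missing salt byte) rather than PVarchar:

```python
# PVarchar serializes a VARCHAR as plain UTF-8, so encoding and decoding a
# value in isolation loses nothing (the Python equivalents are shown here).
value = "hello"
encoded = value.encode("utf-8")    # equivalent of PVarchar.INSTANCE.toBytes
decoded = encoded.decode("utf-8")  # what a VARCHAR read decodes
print(decoded == value)  # -> True
```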