phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Missing content in phoenix after writing from Spark
Date Mon, 17 Sep 2018 19:22:49 GMT
Please retain the mailing list in your replies.

On 9/17/18 2:32 PM, Saif Addin wrote:
> Thanks for your patience, sorry I sent incomplete information. We are
> loading the following jars and still getting:
> 
> executor 1): java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.phoenix.query.QueryServicesOptions
> 
> http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
> http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
> 
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
> http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> 
> http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
> http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar
> 
> Not sure which one I could be missing
> 
> On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <elserj@apache.org> wrote:
> 
>     Uh, you're definitely not using the right JARs :)
> 
>     You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
>     phoenix-spark.jar for the Phoenix RDD.
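For illustration only (not from the original messages; the local paths are
assumptions), a minimal Scala sketch of handing those two jars to a Spark job
through the standard "spark.jars" setting:

    import org.apache.spark.sql.SparkSession

    // Ship the Phoenix client (JDBC driver) and the phoenix-spark integration
    // to the driver and executors. Substitute the paths where your
    // Phoenix 5.0.0-HBase-2.0 distribution unpacks these jars.
    val spark = SparkSession.builder()
      .appName("phoenix-spark-example")
      .config("spark.jars",
        "/opt/phoenix/phoenix-5.0.0-HBase-2.0-client.jar," +
        "/opt/phoenix/phoenix-spark-5.0.0-HBase-2.0.jar")
      .getOrCreate()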
> 
>     On 9/14/18 1:08 PM, Saif Addin wrote:
>      > Hi, I am attempting to make a connection with Spark but no success
>      > so far.
>      >
>      > For writing into Phoenix, I am trying this:
>      >
>      > tdd.toDF("ID", "COL1", "COL2",
>      > "COL3").write.format("org.apache.phoenix.spark").option("zkUrl",
>      > "zookeper-host-url:2181").option("table",
>      > htablename).mode("overwrite").save()
>      >
>      > But getting:
>      > java.sql.SQLException: ERROR 103 (08004): Unable to establish
>      > connection.
>      >
>      > For reading, on the other hand, attempting this:
>      >
>      > val hbConf = HBaseConfiguration.create()
>      > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
>      > hbConf.addResource(new Path(hbaseSitePath))
>      >
>      > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"),
>      >   conf = hbConf)
>      >
>      > Gets me
>      > java.lang.NoClassDefFoundError: Could not initialize class
>      > org.apache.phoenix.query.QueryServicesOptions
>      >
>      > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
>      > phoenix-queryserver-client-5.0.0-HBase-2.0.jar.
>      > Any thoughts? I have an hbase-site.xml file with more configuration,
>      > but I'm not sure how to get it to be read in the saving instance.
>      > Thanks
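As an aside (not from the original messages), one common way to make that
hbase-site.xml visible to the write path is to merge it into the Hadoop
configuration the job uses, or to put the HBase conf directory on the
classpath at submit time. A minimal sketch, reusing the path from the snippet
above:

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("phoenix-write").getOrCreate()

    // Merge hbase-site.xml into the Hadoop configuration handed to
    // HBase/Phoenix code running in this job.
    spark.sparkContext.hadoopConfiguration
      .addResource(new Path("/etc/hbase/conf/hbase-site.xml"))

    // Alternatively, expose the HBase conf dir on the classpath when
    // submitting, so executors pick it up as well, e.g.:
    //   spark-submit --conf spark.driver.extraClassPath=/etc/hbase/conf \
    //                --conf spark.executor.extraClassPath=/etc/hbase/conf ...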
>      >
>      > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <elserj@apache.org> wrote:
>      >
>      >     Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not
>      >     sure if Spark has already moved beyond that.
>      >
>      >     On 9/12/18 11:00 PM, Saif Addin wrote:
>      >      > Thanks, we'll try the Spark Connector then. We thought it
>      >      > didn't support the newest Spark versions.
>      >      >
>      >      > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
>      >      > <cloud.poster@gmail.com> wrote:
>      >      >
>      >      >     It seems the column data is missing mapping information
>      >      >     from the schema. If you want to write the HBase table this
>      >      >     way, you can create an HBase table and use Phoenix to map
>      >      >     it.
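As an illustration of that mapping (not from the original messages; the table,
column family, and column names are placeholders), a Phoenix DDL statement
issued over JDBC can expose an existing HBase table to SQL:

    import java.sql.DriverManager

    // The quoted identifiers must match the HBase table name, column family,
    // and qualifiers exactly; COLUMN_ENCODED_BYTES = 0 keeps qualifiers stored
    // as plain strings rather than Phoenix's encoded column names.
    val conn = DriverManager.getConnection("jdbc:phoenix:zookeper-host-url:2181")
    conn.createStatement().execute(
      """CREATE TABLE IF NOT EXISTS "my_hbase_table" (
        |  "ID" VARCHAR PRIMARY KEY,
        |  "0"."COL1" VARCHAR,
        |  "0"."COL2" VARCHAR,
        |  "0"."COL3" VARCHAR
        |) COLUMN_ENCODED_BYTES = 0""".stripMargin)
    conn.close()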
>      >      >
>      >      >     ----------------------------------------
>      >      >         Jaanai Zhang
>      >      >         Best regards!
>      >      >
>      >      >
>      >      >
>      >      >     Thomas D'Silva <tdsilva@salesforce.com> wrote on Thu,
>      >      >     Sep 13, 2018 at 6:03 AM:
>      >      >
>      >      >         Is there a reason you didn't use the spark-connector
>      >      >         to serialize your data?
>      >      >
>      >      >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin
>      >      >         <saif1988@gmail.com> wrote:
>      >      >
>      >      >             Thank you Josh! That was helpful. Indeed, there
>      >      >             was a salt bucket on the table, and the key-column
>      >      >             now shows correctly.
>      >      >
>      >      >             However, the problem still persists in that the
>      >      >             rest of the columns show as completely empty in
>      >      >             Phoenix (they appear correctly in HBase). We'll be
>      >      >             looking into this, but if you have any further
>      >      >             advice, it is appreciated.
>      >      >
>      >      >             Saif
>      >      >
>      >      >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser
>      >      >             <elserj@apache.org> wrote:
>      >      >
>      >      >                 Reminder: Using Phoenix internals forces you
>      >      >                 to understand exactly how the version of
>      >      >                 Phoenix that you're using serializes data. Is
>      >      >                 there a reason you're not using SQL to
>      >      >                 interact with Phoenix?
>      >      >
>      >      >                 Sounds to me like Phoenix is expecting more
>      >      >                 data at the head of your rowkey. Maybe a salt
>      >      >                 bucket that you've defined on the table but
>      >      >                 not created?
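To illustrate the SQL route (not from the original messages; table and column
names are placeholders), a sketch of writing the same rows through the Phoenix
JDBC driver, which lets Phoenix handle serialization, including the leading
salt byte on salted tables:

    import java.sql.DriverManager

    val rows = Seq(("key1", "a", "b", "c"))   // placeholder data

    val conn = DriverManager.getConnection("jdbc:phoenix:zookeper-host-url:2181")
    val ps = conn.prepareStatement(
      "UPSERT INTO MY_TABLE (ID, COL1, COL2, COL3) VALUES (?, ?, ?, ?)")
    rows.foreach { case (id, c1, c2, c3) =>
      ps.setString(1, id)
      ps.setString(2, c1)
      ps.setString(3, c2)
      ps.setString(4, c3)
      ps.executeUpdate()
    }
    conn.commit()   // Phoenix buffers UPSERTs until the connection commits
    conn.close()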
>      >      >
>      >      >                 On 9/12/18 4:32 PM, Saif Addin wrote:
>      >      >                  > Hi all,
>      >      >                  >
>      >      >                  > We're trying to write tables with all
>      >      >                  > string columns from Spark. We are not using
>      >      >                  > the Spark Connector; instead we are directly
>      >      >                  > writing byte arrays from RDDs.
>      >      >                  >
>      >      >                  > The process works fine, HBase receives the
>      >      >                  > data correctly, and the content is
>      >      >                  > consistent.
>      >      >                  >
>      >      >                  > However, reading the table from Phoenix, we
>      >      >                  > notice the first character of the strings
>      >      >                  > is missing. This sounds like a byte
>      >      >                  > encoding issue, but we're at a loss. We're
>      >      >                  > using PVarchar to generate the bytes.
>      >      >                  >
>      >      >                  > Here's the snippet of code creating the RDD:
>      >      >                  >
>      >      >                  > val tdd = pdd.flatMap(x => {
>      >      >                  >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>      >      >                  >    for(i <- 0 until cols.length) yield {
>      >      >                  >      other stuff for other columns ...
>      >      >                  >      ...
>      >      >                  >      (rowKey, (column1, column2, column3))
>      >      >                  >    }
>      >      >                  > })
>      >      >                  >
>      >      >                  > ...
>      >      >                  >
>      >      >                  > We then create the following output to be
>      >      >                  > written down in HBase:
>      >      >                  >
>      >      >                  > val output = tdd.map(x => {
>      >      >                  >      val rowKeyByte: Array[Byte] = x._1
>      >      >                  >      val immutableRowKey = new
>      >      >                  >        ImmutableBytesWritable(rowKeyByte)
>      >      >                  >
>      >      >                  >      val kv = new KeyValue(rowKeyByte,
>      >      >                  >        PVarchar.INSTANCE.toBytes(column1),
>      >      >                  >        PVarchar.INSTANCE.toBytes(column2),
>      >      >                  >        PVarchar.INSTANCE.toBytes(column3)
>      >      >                  >      )
>      >      >                  >      (immutableRowKey, kv)
>      >      >                  > })
>      >      >                  >
>      >      >                  > By the way, we are using *KryoSerializer*
>      >      >                  > in order to be able to serialize all the
>      >      >                  > classes necessary for HBase (KeyValue,
>      >      >                  > BytesWritable, etc.).
>      >      >                  >
>      >      >                  > The key of this table is the one missing
>      >      >                  > data when queried from Phoenix. So we guess
>      >      >                  > something is wrong with the byte
>      >      >                  > serialization.
>      >      >                  >
>      >      >                  > Any ideas? Appreciated!
>      >      >                  > Saif
>      >      >
>      >      >
>      >
> 
