phoenix-user mailing list archives

From lkyaes <lky...@gmail.com>
Subject Read-Write data to/from Phoenix 4.13 or 4.14 with Spark SQL Dataframe 2.1.0
Date Mon, 10 Sep 2018 10:20:37 GMT
Hello!

I wonder whether there is any way to get Phoenix 4.13 or 4.14 working with
Spark 2.1.0.

In production we used Spark SQL DataFrames to load data from and write data to
HBase through Apache Phoenix (Spark 1.6 and Phoenix 4.7), and it worked well.

After the upgrade we ran into issues with both loading and writing; neither
works anymore.

Our environment:

· Cloudera 5.11.2
· HBase 1.2
· Spark 2.1.0 (parcel, compatible with Cloudera 5.11.2)
· APACHE_PHOENIX 4.14.0-cdh5.11.2.p0.3 (we tested 4.13 as well)



We read and write data from Python (the PySpark library), but the same errors
also occur when the code is written in Scala.
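
For context, a read and a write through the phoenix-spark data source from
PySpark typically look like the sketch below. It is an illustration only: the
helper names are made up for the example, and the table name and ZooKeeper URL
passed to them are hypothetical placeholders. It assumes the phoenix-spark
client JAR is on the Spark driver and executor classpath.

```python
# Minimal sketch of reading/writing a Phoenix table from PySpark.
# Helper names, table names and ZooKeeper URLs are hypothetical.

PHOENIX_FORMAT = "org.apache.phoenix.spark"  # phoenix-spark data source name


def phoenix_options(table, zk_url):
    """Build the options the phoenix-spark data source expects."""
    return {"table": table, "zkUrl": zk_url}


def read_table(spark, table, zk_url):
    """Load a Phoenix table as a DataFrame.

    `spark` is a SparkSession (Spark 2.x) or a SQLContext (Spark 1.6).
    """
    return (spark.read
            .format(PHOENIX_FORMAT)
            .options(**phoenix_options(table, zk_url))
            .load())


def write_table(df, table, zk_url):
    """Save a DataFrame to an existing Phoenix table.

    phoenix-spark requires the "overwrite" save mode; rows are upserted.
    """
    (df.write
       .format(PHOENIX_FORMAT)
       .mode("overwrite")
       .options(**phoenix_options(table, zk_url))
       .save())
```

With a SparkSession named `spark`, `read_table(spark, "TABLE1", "zk-host:2181")`
would correspond to the kind of `load` call that fails with the read errors
below, and `write_table(df, "TABLE1", "zk-host:2181")` to the kind of `save`
call that fails with the insert error.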

*Error when reading data from Phoenix 4.13 with Spark 2.1.0:*

Py4JJavaError: An error occurred while calling o213.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame

*Error when reading data from Phoenix 4.14 with Spark 2.1.0:*

Py4JJavaError: An error occurred while calling o89.load. :
com.google.common.util.concurrent.ExecutionError:
java.lang.NoSuchMethodError:
com.lmax.disruptor.dsl.Disruptor.<init>(Lcom/lmax/disruptor/EventFactory;ILjava/util/concurrent/ThreadFactory;Lcom/lmax/disruptor/dsl/ProducerType;Lcom/lmax/disruptor/WaitStrategy;)V

(Changing the Disruptor .jar version did not solve the issue.)

*Error when inserting data into Phoenix 4.14 with Spark 2.1.0:*

Py4JJavaError: An error occurred while calling o186.save. :
java.lang.AbstractMethodError:
org.apache.phoenix.spark.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;



We are aware that Spark 2 fails to read from and write to Phoenix because
Spark changed the DataFrame API and the Scala version changed as well; the
resulting JAR isn't binary compatible with Spark versions < 2.0.

*The DataFrame class is missing from Spark 2*, and this issue was fixed once by
a patch for Phoenix version 4.10:
https://issues.apache.org/jira/browse/PHOENIX-3333

Unfortunately this patch is not suitable for our environment. Could you please
comment on whether other versions of Phoenix have such a fix?

How can we read and write data with Phoenix 4.13 or 4.14 using Spark 2?

Regards, and hoping for your help,
Liubov Kyaes
Data Engineer
ir.ee
