phoenix-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Read-Write data to/from Phoenix 4.13 or 4.14 with Spark SQL Dataframe 2.1.0
Date Tue, 11 Sep 2018 01:06:26 GMT
Lots of details missing here about how you're trying to submit these 
Spark jobs, but let me try to explain how things work now:

Phoenix provides spark (for Spark 1.x) and spark2 jars. These JARs 
provide the Spark integration *on top of* what the phoenix-client.jar 
provides. You want to include both the phoenix-client jar and the 
relevant phoenix-spark jar when you submit your application.
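
For example, here is a minimal sketch of a submit command. The parcel 
path, jar names, and job script are illustrative; check your 
installation for the exact file names:

  # illustrative parcel path; adjust for your installation
  PHOENIX_LIB=/opt/cloudera/parcels/APACHE_PHOENIX/lib/phoenix

  # put both the client jar and the spark2 jar on the classpath
  spark2-submit \
    --jars $PHOENIX_LIB/phoenix-4.14.0-cdh5.11.2-client.jar,$PHOENIX_LIB/phoenix-spark2-4.14.0-cdh5.11.2.jar \
    your_job.py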

This is how things are meant to work with Phoenix 4.13 and 4.14. If 
this doesn't help, please give us some more specifics about the 
commands you run and the output you get. Thanks!
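
Once both jars are on the classpath, reading and writing through the 
phoenix-spark DataSource looks roughly like this in PySpark (the table 
name and ZooKeeper quorum are placeholders):

  # load a Phoenix table into a DataFrame
  df = spark.read \
      .format("org.apache.phoenix.spark") \
      .option("table", "TABLE1") \
      .option("zkUrl", "zkhost:2181") \
      .load()

  # write it back; the Phoenix connector requires SaveMode.Overwrite
  df.write \
      .format("org.apache.phoenix.spark") \
      .mode("overwrite") \
      .option("table", "TABLE1") \
      .option("zkUrl", "zkhost:2181") \
      .save()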

On 9/10/18 6:20 AM, lkyaes wrote:
> Hello!
> 
> I wonder if there is any way to get Phoenix 4.13 or 4.14 working with 
> Spark 2.1.0.
> 
> In production we used Spark SQL DataFrames to load data from and write 
> data to HBase with Apache Phoenix (Spark 1.6 and Phoenix 4.7), and it 
> worked well.
> 
> After the upgrade, we ran into issues with loading and writing; neither 
> works anymore.
> 
> Our environment:
> 
> - Cloudera 5.11.2
> - HBase 1.2
> - Spark 2.1.0 (parcel, compatible with Cloudera 5.11.2)
> - APACHE_PHOENIX 4.14.0-cdh5.11.2.p0.3 (we tested 4.13 as well)
> 
> We read and write data from Python (the PySpark library), but the same 
> errors also occur when writing from Scala.
> 
> *Read data from Phoenix 4.13 with Spark 2.1.0 error:*
> 
> Py4JJavaError: An error occurred while calling o213.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> 
> *Read data from Phoenix 4.14 with Spark 2.1.0 error:*
> 
> Py4JJavaError: An error occurred while calling o89.load. : 
> com.google.common.util.concurrent.ExecutionError: 
> java.lang.NoSuchMethodError: 
> com.lmax.disruptor.dsl.Disruptor.<init>(Lcom/lmax/disruptor/EventFactory;ILjava/util/concurrent/ThreadFactory;Lcom/lmax/disruptor/dsl/ProducerType;Lcom/lmax/disruptor/WaitStrategy;)V
> 
> (Changing the Disruptor .jar version did not solve the issue.)
> 
> *Insert data to Phoenix 4.14 with Spark 2.1.0 error:*
> 
> Py4JJavaError: An error occurred while calling o186.save. 
> : java.lang.AbstractMethodError: 
> org.apache.phoenix.spark.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
> 
> 
> Actually, we are aware that Spark 2 fails to read and write Phoenix 
> because Spark changed the DataFrame API (in Spark 2, DataFrame is only 
> a type alias for Dataset[Row], so the org/apache/spark/sql/DataFrame 
> class no longer exists); together with a Scala version change, the 
> resulting JAR isn't binary compatible with Spark versions < 2.0.
> 
> The DataFrame class is missing from Spark 2, and this issue was fixed 
> once by a patch for Phoenix version 4.10: 
> https://issues.apache.org/jira/browse/PHOENIX-3333
> 
> Unfortunately, this patch is not suitable for our environment. Could 
> you please comment on whether other versions of Phoenix have such a fix?
> 
> How do we read/write data from Phoenix 4.13 or 4.14 using Spark 2?
> 
> Regards, and hoping for your help,
> Liubov Kyaes
> Data Engineer
> ir.ee <http://ir.ee>
> 
