phoenix-user mailing list archives

From Craig Roberts <craig.robe...@frogasia.com>
Subject PySpark and Phoenix Dynamic Columns
Date Fri, 24 Feb 2017 04:56:21 GMT
Hi all,

I've got a (very) basic Spark application in Python that selects some basic
information from my Phoenix table. However, I can't quite figure out how (or
even whether) I can select dynamic columns through it.

Here's what I have:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

# Local Spark context for testing
conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load MYTABLE through the phoenix-spark connector, trying to declare
# "dynamic_column" as a dynamic VARCHAR column inside the table option
df = sqlContext.read.format("org.apache.phoenix.spark") \
       .option("table", """MYTABLE("dynamic_column" VARCHAR)""") \
       .option("zkUrl", "127.0.0.1:2181:/hbase-unsecure") \
       .load()

df.show()
df.printSchema()


I get an "org.apache.phoenix.schema.TableNotFoundException" error for the
above.
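
For reference, Phoenix's own SQL declares dynamic columns inline after the
table name in the FROM clause, e.g. MYTABLE("dynamic_column" VARCHAR). Below
is a minimal sketch of building that expression in Python with each column
double-quoted so its lower-case name is preserved; the helper names
quote_ident and dynamic_table_expr are my own (hypothetical, not part of
Phoenix or Spark). My unconfirmed guess is that the connector treats the
whole "table" option as a literal table name, which would explain the
TableNotFoundException.

```python
def quote_ident(name: str) -> str:
    """Double-quote a Phoenix identifier so its lower-case spelling is kept.

    Phoenix upper-cases unquoted identifiers, so lower-case column names
    must be wrapped in double quotes to match.
    """
    if '"' in name:
        # Escaping embedded quotes is deliberately not handled in this sketch.
        raise ValueError(f"identifier contains a double quote: {name!r}")
    return f'"{name}"'


def dynamic_table_expr(table: str, columns: dict) -> str:
    """Build a FROM-clause expression declaring Phoenix dynamic columns.

    `columns` maps each dynamic column name to its Phoenix SQL type.
    """
    decls = ", ".join(
        f"{quote_ident(col)} {sql_type}" for col, sql_type in columns.items()
    )
    return f"{table}({decls})"


print(dynamic_table_expr("MYTABLE", {"dynamic_column": "VARCHAR"}))
# MYTABLE("dynamic_column" VARCHAR)
```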

If I instead register the DataFrame as a table and query it with SQL:

# Register the DataFrame as a temporary table and query it through Spark SQL
sqlContext.registerDataFrameAsTable(df, "test")
sqlContext.sql("""SELECT * FROM test("dynamic_column" VARCHAR)""")


I get a bit of a strange exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o37.sql.
: java.lang.RuntimeException: [1.19] failure: ``union'' expected but `(' found

SELECT * FROM test("dynamic_column" VARCHAR)
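
The "[1.19]" looks like a line.column position (assuming the usual Scala
parser-combinator error format), and column 19 of that statement is exactly
the opening parenthesis of the dynamic column declaration. So it seems Spark
SQL's own parser rejects the Phoenix-specific syntax before anything reaches
Phoenix. A quick check of the position:

```python
query = 'SELECT * FROM test("dynamic_column" VARCHAR)'
line, column = 1, 19  # taken from "[1.19]" in the error message

# Positions in the error are 1-based; Python string indices are 0-based.
offending = query.splitlines()[line - 1][column - 1]
print(repr(offending))  # '('
```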



Does anybody have a pointer on whether this is supported and, if so, how I
might query a dynamic column? I haven't found much information elsewhere on
the Internet about Spark + Phoenix integration for this kind of thing.
Simple selects are working. Final note: I have (rather stupidly)
lower-cased my column names in Phoenix, so I need to quote them when I
execute a query (I'll be changing this as soon as possible).
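
One workaround I haven't tested: bypass the phoenix-spark connector and use
Spark's generic "jdbc" source with the Phoenix JDBC driver, passing the
dynamic-column expression as "dbtable". Spark places "dbtable" in the FROM
clause of the SQL it generates, so Phoenix rather than Spark would parse the
dynamic column declaration. This sketch assumes the Phoenix client JAR is on
Spark's classpath and that Phoenix accepts the SELECT statements Spark builds
around "dbtable"; the helper names are hypothetical.

```python
def phoenix_jdbc_options(zk_quorum: str, table_expr: str) -> dict:
    """Options for Spark's built-in "jdbc" source pointed at Phoenix.

    Phoenix JDBC URLs have the form jdbc:phoenix:<zk quorum>[:port][:znode].
    `table_expr` may contain Phoenix-only syntax such as dynamic column
    declarations (assumption: Spark passes it through untouched).
    """
    return {
        "url": f"jdbc:phoenix:{zk_quorum}",
        "driver": "org.apache.phoenix.jdbc.PhoenixDriver",
        "dbtable": table_expr,
    }


def load_with_dynamic_columns(sql_context, zk_quorum: str, table_expr: str):
    # Hypothetical helper: needs a live Phoenix/HBase cluster and the Phoenix
    # client JAR on the driver and executor classpaths. Not called below.
    opts = phoenix_jdbc_options(zk_quorum, table_expr)
    return sql_context.read.format("jdbc").options(**opts).load()


opts = phoenix_jdbc_options(
    "127.0.0.1:2181:/hbase-unsecure", 'MYTABLE("dynamic_column" VARCHAR)'
)
print(opts["url"])  # jdbc:phoenix:127.0.0.1:2181:/hbase-unsecure
```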

Any assistance would be appreciated :)
-- Craig
