phoenix-user mailing list archives

From Josh Mahonin <jmaho...@gmail.com>
Subject Re: phoenix spark options not supporting query in dbtable
Date Thu, 09 Jun 2016 18:01:19 GMT
They're effectively the same code paths. However, I'd recommend using the
Data Frame API unless you have a specific need to pass in a custom
Configuration object.

The Data Frame API has bindings in Scala, Java and Python, so that's
another advantage. The phoenix-spark docs have a PySpark example, but it's
applicable to Java (and Scala) as well.
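
For reference, saving goes through that same data source as well, so you
don't need the Scala-side saveToPhoenix helpers. A minimal sketch
(INPUT_TABLE, OUTPUT_TABLE and the zkUrl value are placeholders, and the
target table must already exist in Phoenix):

import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "INPUT_TABLE", "zkUrl" -> "phoenix-server:2181")
)

// The Phoenix data source requires SaveMode.Overwrite; rows are
// upserted by primary key rather than the table being truncated.
df.save("org.apache.phoenix.spark", SaveMode.Overwrite,
  Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "phoenix-server:2181"))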

Josh

On Thu, Jun 9, 2016 at 1:02 PM, Long, Xindian <Xindian.Long@sensus.com>
wrote:

> Hi, Josh:
>
> Thanks for the answer. Do you know the underlying difference between the
> following two ways of loading a DataFrame? (using the Data Source API, or
> loading it directly with a Configuration object)
>
> Is there a Java interface to the functionality of phoenixTableAsDataFrame
> and saveToPhoenix?
>
> Thanks
>
> Xindian
>
> Load as a DataFrame using the Data Source API:
>
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
>
> val df = sqlContext.load(
>   "org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181")
> )
>
> df
>   .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
>   .select(df("ID"))
>   .show
>
> Or load as a DataFrame directly using a Configuration object:
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val configuration = new Configuration()
> // Can set Phoenix-specific settings, requires 'hbase.zookeeper.quorum'
>
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
>
> // Load the columns 'ID' and 'COL1' from TABLE1 as a DataFrame
> val df = sqlContext.phoenixTableAsDataFrame(
>   "TABLE1", Array("ID", "COL1"), conf = configuration
> )
>
> df.show
>
> From: Josh Mahonin [mailto:jmahonin@gmail.com]
> Sent: June 9, 2016 9:44
> To: user@phoenix.apache.org
> Subject: Re: phoenix spark options not supporting query in dbtable
>
>
>
> Hi Xindian,
>
>
>
> The phoenix-spark integration is based on the Phoenix MapReduce layer,
> which doesn't support aggregate functions. However, as you mentioned,
> both filter predicates and column pruning are pushed down to Phoenix.
> With an RDD or DataFrame loaded, all of Spark's various aggregation
> methods are available to you.
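>
> For example (a minimal sketch; the table, column and zkUrl values are
> placeholders):
>
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
>
> val df = sqlContext.load(
>   "org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181")
> )
>
> // The filter is pushed down to Phoenix as a WHERE clause; the
> // groupBy/count aggregation itself runs in Spark.
> df.filter(df("COL1") === "test_row_1")
>   .groupBy(df("COL1"))
>   .count()
>   .show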
>
>
>
> Although the Spark JDBC data source supports the full complement of
> Phoenix's supported queries, the way it achieves parallelism is to split
> the query across a number of workers and connections based on a
> 'partitionColumn' with a 'lowerBound' and 'upperBound', which must be
> numeric. If your use case has numeric primary keys, then that is
> potentially a good solution for you. [1]
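>
> A sketch of what that can look like against Phoenix's JDBC driver (the
> table name, partition column and bounds are placeholders, and it
> assumes a SQLContext is already set up as above):
>
> val jdbcDf = sqlContext.read.format("jdbc").options(Map(
>   "url" -> "jdbc:phoenix:phoenix-server:2181",
>   "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
>   "dbtable" -> "TABLE1",
>   "partitionColumn" -> "ID", // must be numeric
>   "lowerBound" -> "1",
>   "upperBound" -> "1000000",
>   "numPartitions" -> "10"
> )).load()
>
> // Spark issues one query per partition, each covering a slice of ID.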
>
>
>
> The phoenix-spark parallelism is based on the splits provided by the
> Phoenix query planner, and has no requirements on specifying partition
> columns or upper/lower bounds. It's up to you to evaluate which technique
> is the right method for your use case. [2]
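>
> For instance, you can inspect how many splits the Phoenix query planner
> handed to Spark (again, a sketch with placeholder names):
>
> import org.apache.phoenix.spark._
>
> val rdd = sc.phoenixTableAsRDD(
>   "TABLE1", Seq("ID", "COL1"), zkUrl = Some("phoenix-server:2181")
> )
> // One Spark partition per split from the Phoenix query planner
> println(rdd.partitions.size)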
>
>
>
> Good luck,
>
>
>
> Josh
>
> [1]
> http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
>
> [2] https://phoenix.apache.org/phoenix_spark.html
>
> On Wed, Jun 8, 2016 at 6:01 PM, Long, Xindian <Xindian.Long@sensus.com>
> wrote:
>
> The Spark JDBC data source supports specifying a query as the “dbtable”
> option. I assume such a query is pushed down to the database instead of
> executed in Spark.
>
>
>
> The phoenix-spark plugin does not seem to support that. Why is that? Is
> there any plan to support it in the future?
>
>
>
> I know phoenix-spark does support an optional select clause and
> predicate pushdown in some cases, but it is limited.
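>
> For example, the kind of pushdown I mean (a sketch with placeholder
> names):
>
> import org.apache.phoenix.spark._
>
> // The column list and the predicate string are both handed to Phoenix
> val rdd = sc.phoenixTableAsRDD(
>   "TABLE1", Seq("ID", "COL1"),
>   predicate = Some("\"ID\" > 100 AND \"COL1\" = 'test_row_1'"),
>   zkUrl = Some("phoenix-server:2181")
> )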
>
>
>
> Thanks
>
>
>
> Xindian
>
>
>
>
>
> -------------------------------------------
>
> Xindian “Shindian” Long
>
> Mobile:  919-9168651
>
> Email: Xindian.Long@gmail.com
>
