Hi,

Phoenix is able to parallelize queries based on the underlying HBase region splits, as well as on its own internal guideposts gathered from statistics collection.

The phoenix-spark connector exposes those splits to Spark for the RDD / DataFrame parallelism. To test this out, you can run an EXPLAIN SELECT ... query that mimics the DataFrame load to see how many parallel scans will be run, and then compare that count to the RDD / DataFrame partition count (some_rdd.partitions.size). In Phoenix 4.10 and above, they will be the same. In versions below that, the partition count will equal the number of regions for that table.
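As a minimal sketch of that comparison, something like the following should work; note that the table name TABLE1, the columns ID and COL1, and the localhost ZooKeeper quorum are placeholders to substitute with your own:

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._  // adds phoenixTableAsDataFrame to SQLContext

val sc = new SparkContext("local", "phoenix-parallelism-check")
val sqlContext = new SQLContext(sc)

// Phoenix needs the ZooKeeper quorum; adjust for your cluster.
val conf = new Configuration()
conf.set("hbase.zookeeper.quorum", "localhost:2181")

// Load two columns of TABLE1 as a DataFrame
// (table and column names here are placeholders).
val df = sqlContext.phoenixTableAsDataFrame(
  "TABLE1", Seq("ID", "COL1"), conf = conf)

// Partition count as seen by Spark -- compare this against the number of
// parallel scans reported by running, e.g. in sqlline.py:
//   EXPLAIN SELECT ID, COL1 FROM TABLE1;
println(s"DataFrame partitions: ${df.rdd.partitions.size}")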
On Thu, Aug 17, 2017 at 3:07 AM, Kanagha <email@example.com> wrote:

Also, I'm using the phoenixTableAsDataFrame API to read from a pre-split Phoenix table. How can we ensure the read is parallelized across all executors? Would salting or pre-splitting tables help in providing parallelism? Appreciate any inputs.

Thanks
Kanagha

On Wed, Aug 16, 2017 at 10:16 PM, kanagha <firstname.lastname@example.org> wrote:

Hi Josh,
In your previous post, you mentioned: "The phoenix-spark parallelism is based on the splits provided by the Phoenix query planner, and has no requirements on specifying partition columns or upper/lower bounds."
Does the parallelism depend on the region splits of the input table? Could you please provide more details?
View this message in context: http://apache-phoenix-user-list.1124778.n5.nabble.com/phoenix-spark-options-not-supporint-query-in-dbtable-tp1915p3810.html
Sent from the Apache Phoenix User List mailing list archive at Nabble.com.