phoenix-user mailing list archives

From Josh Mahonin <>
Subject Re: Custom Connector for Prestodb
Date Thu, 17 Aug 2017 14:22:18 GMT
Hi Luqman,

I just responded to another query on the list about phoenix-spark that may
help shed some light. In addition, the preferred locations that the
phoenix-spark connector exposes are determined in the general
PhoenixInputFormat MapReduce code [1].

I'm not very familiar with PrestoDB, but if it's able to load data using a
general Hadoop InputFormat, the PhoenixInputFormat would be a good place to
start looking.
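To make the idea concrete, here is a rough sketch of what a split manager
for Phoenix might do conceptually: produce one split per HBase region, each
carrying its region server as the preferred host, so Presto can schedule
work close to the data. The class and method names below are illustrative
only (they are not the real Presto SPI or Phoenix classes); in a real
connector you would implement ConnectorSplitManager and delegate to
PhoenixInputFormat.getSplits().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual Presto SPI or Phoenix code.
public class PhoenixSplitSketch {

    // Stand-in for a ConnectorSplit: a key range plus the region server
    // hosts where that range lives (the "preferred locations").
    static class Split {
        final int startKey;
        final int endKey;            // exclusive
        final List<String> hosts;

        Split(int startKey, int endKey, List<String> hosts) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.hosts = hosts;
        }
    }

    // Mimics what PhoenixInputFormat.getSplits does conceptually: one
    // split per HBase region boundary pair, with that region's server
    // recorded as the preferred host for locality-aware scheduling.
    static List<Split> splitsForRegions(int[] regionBoundaries,
                                        String[] regionHosts) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionBoundaries.length - 1; i++) {
            splits.add(new Split(regionBoundaries[i],
                                 regionBoundaries[i + 1],
                                 List.of(regionHosts[i])));
        }
        return splits;
    }
}
```

A table with region boundaries [0, 100, 200) hosted on two region servers
would yield two splits, each pinned to its server, rather than the single
whole-table split the Example HTTP connector strategy would produce.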



On Thu, Aug 17, 2017 at 5:46 AM, Luqman Ghani <> wrote:

> Hi,
> We are evaluating the possibility of writing a custom connector for
> Phoenix to access tables stored in HBase. However, we need some help.
> The connector for Presto should be able to read from HBase cluster using
> parallel collections. For that the connector has a "ConnectorSplitManager"
> which needs to be implemented. To quote from here
> <>:
> "
> The split manager partitions the data for a table into the individual
> chunks that Presto will distribute to workers for processing. For example,
> the Hive connector lists the files for each Hive partition and creates one
> or more splits per file. For data sources that don’t have partitioned data,
> a good strategy here is to simply return a single split for the entire
> table. This is the strategy employed by the Example HTTP connector.
> "
> I want to know if there's a way to implement Split Manager so that the
> data in HBase can be accessed by parallel connections. I was trying to
> follow the code for Phoenix-Spark connector
> <>
> to see how it decides getPreferredLocations to create splits, but couldn't
> understand it.
> Any hints or code directions will be helpful.
> Regards,
> Luqman
