phoenix-user mailing list archives

From Josh Mahonin <jmaho...@gmail.com>
Subject Re: Custom Connector for Prestodb
Date Thu, 17 Aug 2017 14:22:18 GMT
Hi Luqman,

I just responded to another query on the list about phoenix-spark that may
help shed some light. In addition, the preferred locations the
phoenix-spark connector exposes are determined in the general
PhoenixInputFormat MapReduce code [1]

I'm not very familiar with PrestoDB, but if it's able to load data using a
general Hadoop InputFormat, the PhoenixInputFormat would be a good place to
start looking.
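To illustrate the split idea in isolation: an HBase-backed connector typically emits one split per region (or per Phoenix guidepost), each carrying the region's key range and a preferred host so the scheduler can place work near the data. The sketch below is library-free and purely illustrative; the `Split` class, boundary keys, and host names are made up for the example and are not Presto or Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Stand-in for a connector split: a key range plus preferred hosts.
    static final class Split {
        final byte[] startKey;
        final byte[] endKey;
        final List<String> preferredHosts;
        Split(byte[] startKey, byte[] endKey, List<String> preferredHosts) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.preferredHosts = preferredHosts;
        }
    }

    // Build one split per adjacent pair of region boundary keys,
    // round-robining hosts to stand in for real region locations.
    static List<Split> splitsFromRegionBoundaries(List<byte[]> boundaries,
                                                  List<String> hosts) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < boundaries.size() - 1; i++) {
            splits.add(new Split(boundaries.get(i), boundaries.get(i + 1),
                    List.of(hosts.get(i % hosts.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> boundaries = List.of(
                new byte[]{0x00}, new byte[]{0x40},
                new byte[]{(byte) 0x80}, new byte[]{(byte) 0xFF});
        List<Split> splits = splitsFromRegionBoundaries(boundaries,
                List.of("rs1.example.com", "rs2.example.com"));
        // 4 boundary keys -> 3 key ranges -> 3 splits
        System.out.println(splits.size() + " splits");
    }
}
```

In the real PhoenixInputFormat, the analogous step derives the key ranges from the compiled query plan's scans, which is why it is a useful reference even outside MapReduce.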

Josh

[1]
https://github.com/apache/phoenix/blob/5b099014446865c12779f3882fd8b407496717ea/phoenix-hive/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputFormat.java#L177-L178



On Thu, Aug 17, 2017 at 5:46 AM, Luqman Ghani <lgsahaf@gmail.com> wrote:

> Hi,
>
> We are evaluating the possibility of writing a custom connector for
> Phoenix to access tables stored in HBase. However, we need some help.
>
> The connector for Presto should be able to read from HBase cluster using
> parallel collections. For that the connector has a "ConnectorSplitManager"
> which needs to be implemented. To quote from here
> <https://prestodb.io/docs/current/develop/connectors.html>:
> "
> The split manager partitions the data for a table into the individual
> chunks that Presto will distribute to workers for processing. For example,
> the Hive connector lists the files for each Hive partition and creates one
> or more splits per file. For data sources that don’t have partitioned data,
> a good strategy here is to simply return a single split for the entire
> table. This is the strategy employed by the Example HTTP connector.
> "
>
> I want to know if there's a way to implement Split Manager so that the
> data in HBase can be accessed by parallel connections. I was trying to
> follow the code for Phoenix-Spark connector
> <https://github.com/apache/phoenix/blob/master/phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala>
> to see how it decides getPreferredLocations to create splits, but couldn't
> understand.
>
> Any hints or code directions will be helpful.
>
> Regards,
> Luqman
>
