phoenix-user mailing list archives

From Josh Mahonin <jmaho...@gmail.com>
Subject Re: PHOENIX SPARK - Load Table as DataFrame
Date Wed, 18 May 2016 13:28:09 GMT
Hi Mohan,

Generally speaking, you can treat a Phoenix RDD or DataFrame the same as
any other, regardless of the source. The Spark programming guide [1] has
great documentation on how and when data is loaded into memory.
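
For example, here is a minimal sketch (the table name, zkUrl, and ID
column are hypothetical) showing that Spark transformations are lazy:
building the DataFrame and filtering it only record a logical plan, and
no rows are read from Phoenix until an action such as count() runs. A
filter applied before the action can also be pushed down so Phoenix
scans fewer rows.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixLazyLoad {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("phoenix-lazy-load")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        Map<String, String> options = new HashMap<String, String>();
        options.put("table", "MY_TABLE");        // hypothetical table
        options.put("zkUrl", "localhost:2181");  // hypothetical quorum

        // Lazy: this only builds a logical plan; nothing is read yet.
        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .options(options)
                .load();

        // Still lazy: the filter is recorded in the plan, and the
        // connector can push it down so Phoenix scans fewer rows.
        DataFrame matched = df.filter(df.col("ID").gt(100));

        // The action is what actually triggers reading from Phoenix.
        long n = matched.count();
        System.out.println("matching rows: " + n);

        sc.stop();
    }
}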

Josh

[1]
http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations

On Tue, May 17, 2016 at 6:21 AM, Mohanraj Ragupathiraj <
mohanaugust@gmail.com> wrote:

> I have created a DataFrame from an HBase table (Phoenix) which has 500
> million rows. From the DataFrame I created a JavaPairRDD and use it to
> join against data from a file.
>
> Map<String, String> phoenixInfoMap = new HashMap<String, String>();
> phoenixInfoMap.put("table", tableName);
> phoenixInfoMap.put("zkUrl", zkURL);
> DataFrame df = sqlContext.read()
>     .format("org.apache.phoenix.spark")
>     .options(phoenixInfoMap)
>     .load();
> JavaRDD<Row> tableRows = df.toJavaRDD();
> JavaPairRDD<String, String> dbData = tableRows.mapToPair(
>     new PairFunction<Row, String, String>()
>     {
>         @Override
>         public Tuple2<String, String> call(Row row) throws Exception
>         {
>             return new Tuple2<String, String>(row.getAs("ID"), row.getAs("NAME"));
>         }
>     });
>
> Now my question: let's say the file has 2 million unique entries matching
> the table. Will the entire table be loaded into memory as an RDD, or will
> only the matching 2 million records from the table be loaded?
>
> --
> Thanks and Regards
> Mohan
>
