phoenix-user mailing list archives

From Paul Jones
Subject Re: Phoenix spark and dynamic columns
Date Wed, 27 Jul 2016 16:32:11 GMT

Thank you for your reply. I will take a look at your suggestions.


Hi Paul,

Unfortunately, out of the box the Spark integration doesn't support saving to dynamic columns.
It's worth filing a JIRA enhancement request for this, and if you're interested in contributing
a patch, here are the spots I think would need enhancing:

The saving code derives the column names to use with Phoenix from the DataFrame itself here
[1] as `fieldArray`. We would likely need a new DataFrame parameter to pass in the column
list (with dynamic columns included) here [2]

The output configuration, which takes care of getting the MapReduce bits ready for saving,
would also need to be updated to support the dynamic column definitions here [3], and then
the 'UPSERT' statement construction would need to be adjusted to support those as well here
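
To make the last point concrete, here is a minimal sketch of what statement construction with dynamic columns could look like. It is plain Scala with no Phoenix or Spark dependency. `DynamicUpsert` and `build` are hypothetical names, not existing phoenix-spark code. The sketch relies on Phoenix's dynamic-column syntax, where a dynamic column is declared inline with its type in the UPSERT column list.

```scala
// Hypothetical helper for illustration; not part of phoenix-spark.
object DynamicUpsert {
  // Wrap an identifier in double quotes so Phoenix preserves its case.
  private def quote(name: String): String = {
    require(!name.contains("\""), "unsupported identifier: " + name)
    "\"" + name + "\""
  }

  // staticCols: columns already defined on the table.
  // dynamicCols: (name, SQL type) pairs declared inline in the statement.
  def build(table: String,
            staticCols: Seq[String],
            dynamicCols: Seq[(String, String)]): String = {
    val columns = staticCols.map(quote) ++
      dynamicCols.map { case (name, sqlType) => quote(name) + " " + sqlType }
    val columnList = columns.mkString(", ")
    val placeholders = Seq.fill(columns.size)("?").mkString(", ")
    "UPSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")"
  }
}
```

For the table in the original question, `DynamicUpsert.build("mytable", Seq("KEY"), Seq("cat1" -> "VARCHAR", "cat2" -> "VARCHAR"))` yields `UPSERT INTO mytable ("KEY", "cat1" VARCHAR, "cat2" VARCHAR) VALUES (?, ?, ?)`.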




On Mon, Jul 25, 2016 at 5:49 PM, Paul Jones wrote:
Is it possible to save a dataframe into a table where the columns are dynamic?

For instance, I have loaded a CSV file with the header (key, cat1, cat2) into a dataframe. All
values are strings. I created a table like this: create table mytable ("KEY" varchar not null
primary key); The code is as follows:

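    // The archive dropped parts of this snippet. The reader/writer formats,
    // csvPath, mode, and the load/save calls are assumed, not original.
    val df = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .option("delimiter", "\t")
        .load(csvPath) // input path not preserved in the archive

    df.write.format("org.apache.phoenix.spark").mode("overwrite")
        .option("table", "mytable")
        .option("zkUrl", "servier:2181/hbase")
        .save()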

The CSV files I process always have a key column, but I don’t know what the other columns
will be until I start processing. The code above fails on my example unless I create static
columns named cat1 and cat2. Can I change the save somehow to run an upsert that specifies
the column names and types, thus saving into dynamic columns?
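
One possible workaround, sketched under assumptions, is to bypass the phoenix-spark save and write through Phoenix's JDBC driver, declaring the non-key CSV columns as dynamic VARCHAR columns in the UPSERT itself. `DynamicColumnWriter`, `upsertFor`, and `writeRows` are hypothetical names used for illustration. The sketch assumes the key is the first header column and that all values are strings, as in the example above.

```scala
import java.sql.Connection

// Hypothetical workaround sketch; these names are illustrative, not a Phoenix API.
object DynamicColumnWriter {
  // Build an UPSERT for a CSV header whose first column is the primary key.
  // Every other header column is declared inline as a dynamic VARCHAR column.
  def upsertFor(table: String, keyColumn: String, header: Seq[String]): String = {
    require(header.nonEmpty && header.head.equalsIgnoreCase(keyColumn),
      "the key must be the first header column")
    val dynamic = header.tail.map(name => "\"" + name + "\" VARCHAR")
    val columns = ("\"" + keyColumn + "\"") +: dynamic
    val marks = Seq.fill(columns.size)("?").mkString(", ")
    "UPSERT INTO " + table + " (" + columns.mkString(", ") + ") VALUES (" + marks + ")"
  }

  // Write rows (values in header order) through an already open JDBC connection.
  def writeRows(conn: Connection, sql: String, rows: Iterator[Seq[String]]): Unit = {
    val stmt = conn.prepareStatement(sql)
    try {
      rows.foreach { row =>
        row.zipWithIndex.foreach { case (value, i) => stmt.setString(i + 1, value) }
        stmt.executeUpdate()
      }
      conn.commit() // Phoenix connections do not auto-commit by default
    } finally {
      stmt.close()
    }
  }
}
```

From Spark, `writeRows` could be called inside `df.rdd.foreachPartition`, with each partition opening its own connection (for example via `DriverManager.getConnection` with a `jdbc:phoenix:` URL), assuming the Phoenix client jar is on the executor classpath. Reading the values back would likewise require declaring the dynamic columns in the SELECT.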

Thanks in advance,
