phoenix-user mailing list archives

From Gabriel Reid <>
Subject Re: bulk loading with dynamic columns
Date Fri, 17 Oct 2014 06:53:46 GMT

Going via Scalding sounds like a fine idea as well -- the advantage of
using Pig is that you wouldn't need to implement anything custom in
terms of JDBC handling (because it already exists), but indeed I would
expect that you'll get comparable performance with Scalding.
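
For reference, Phoenix's dynamic-column support works by declaring the extra columns and their types directly in the UPSERT statement, so loading over JDBC mostly comes down to generating the right statement per record schema. A minimal sketch of a helper that builds such a statement (the table and column names in the usage example are made up for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class DynamicUpsertBuilder {

    /**
     * Builds a Phoenix UPSERT that declares dynamic columns inline, e.g.
     * UPSERT INTO T (ID, EXTRA VARCHAR) VALUES (?, ?).
     *
     * @param table       target table name
     * @param staticCols  columns already present in the table schema
     * @param dynamicCols dynamic column name -> Phoenix type, declared
     *                    in the statement itself
     */
    public static String buildUpsert(String table,
                                     List<String> staticCols,
                                     Map<String, String> dynamicCols) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner params = new StringJoiner(", ");
        for (String c : staticCols) {
            cols.add(c);
            params.add("?");
        }
        for (Map.Entry<String, String> e : dynamicCols.entrySet()) {
            // Dynamic columns are declared as "NAME TYPE" in the column list.
            cols.add(e.getKey() + " " + e.getValue());
            params.add("?");
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }
}
```

The resulting string can then be handed to Connection.prepareStatement() and executed in batches, committing periodically.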

If you want to generate HFiles, I would indeed look at extending (or
reusing parts of) the CsvBulkLoadTool, which currently creates HFiles.
However, I would definitely only go this way as a fallback if JDBC
performance isn't sufficient.

I actually never really considered a CSV file itself being dynamic; I
was more thinking along the lines of loading CSV files with different
schemas into the same table (via dynamic columns). If it's at all an
option, I would suggest splitting out records by schema first in a
pre-processing stage, and then loading the collection of files that
match a single schema together. CSV is a fine format for really simple
schemas, but I don't think it would be at all suited to storing
records with different schemas.
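
The pre-processing stage described above can be sketched as grouping records by the set of columns they actually carry, so that each group can be written out as a homogeneous CSV file and loaded together. The map-based record representation here is illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SchemaSplitter {

    /**
     * Groups records (header -> value maps) by their column set, so each
     * group can be written out as its own homogeneous CSV file and bulk
     * loaded with a single upsert statement per group.
     */
    public static Map<Set<String>, List<Map<String, String>>> splitBySchema(
            List<Map<String, String>> records) {
        return records.stream()
                .collect(Collectors.groupingBy(Map::keySet));
    }
}
```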

- Gabriel

On Fri, Oct 17, 2014 at 8:27 AM, Bob Dole <> wrote:
> Gabriel,
> Thanks for your response. My current plan is to implement the bulk load
> using Scalding via JDBC. I have not played with Pig, but my guess is my
> Scalding solution will achieve comparable performance.
> I haven't done a performance test yet, but if it turns out that loading via
> JDBC is too slow, I would need to generate the HFiles.
> I would be interested in your thoughts on how you'd approach generating
> hfiles. Would you extend the csv bulk loader? How would you represent
> dynamic columns in a csv? A general solution is also further complicated by
> the fact that a dynamic column may have heterogeneous types.
> -Bob
> On Thursday, October 16, 2014 12:24 AM, Gabriel Reid
> <> wrote:
> Hi Bob,
> No, there currently isn't any support for bulk loading dynamic columns.
> I think that this would (in theory) be as simple as supplying a custom
> upsert statement to the bulk loader or PhoenixHBaseStorage (if you're
> using Pig), so it probably wouldn't be too tricky to implement.
> If you're interested in having something like this in Phoenix, could
> you log a ticket for it? If you're interested in taking a crack at
> implementing it as well, feel free (as well as feeling free to ask
> for advice on how to go about it).
> - Gabriel
> On Thu, Oct 16, 2014 at 7:58 AM, Bob Dole <> wrote:
>> Is there any existing support for performing bulk loading with dynamic columns?
>> Thanks!
