phoenix-user mailing list archives

From "Perko, Ralph J" <>
Subject data ingestion
Date Thu, 09 Oct 2014 14:36:12 GMT
Hi,

What is the best way to ingest large amounts of CSV data arriving at regular intervals (a batch roughly every 15 minutes, totaling about 500 GB or 1.5 billion records per day) that requires a few transformations before being inserted?

By transformation I mean the following:
1) one field is converted to a timestamp
2) one field is parsed to create a new field
3) several fields are combined into one
4) a couple of columns are reordered
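
For concreteness, here is roughly the kind of transform I mean, sketched as a plain Java method (the field layout, positions, and formats are made up for illustration):

import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class RowTransform {

    // Hypothetical input layout: epoch seconds, raw URI, first name, last name
    private static final DateTimeFormatter TS_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Transform one parsed CSV record into the column order the target table expects. */
    public static String[] transform(String[] in) {
        // 1) convert a field to a timestamp
        String ts = LocalDateTime.ofEpochSecond(
                Long.parseLong(in[0]), 0, ZoneOffset.UTC).format(TS_FMT);
        // 2) parse a field to derive a new one (e.g. pull the host out of a URI)
        String host = in[1].replaceFirst("^[a-z]+://", "").split("/")[0];
        // 3) combine several fields into one
        String fullName = in[2] + " " + in[3];
        // 4) reorder columns to match the target schema
        return new String[] { ts, host, fullName, in[1] };
    }
}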

Is there any way to make these transformations through the bulk load tool, or is MapReduce the best option? If I use MapReduce, should I go purely through JDBC? Write directly to HBase? Do something similar to the CSV bulk load tool (perhaps even just customize the CsvBulkLoadTool?), or something altogether different?
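
For the pure-JDBC route, I am picturing something like the sketch below (the connection string, table name, and reader helper are placeholders; autocommit is turned off so the upserts are committed in batches):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;

public class PhoenixUpsertSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum; real cluster details would go here.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")) {
            conn.setAutoCommit(false);
            String sql = "UPSERT INTO EVENTS (TS, HOST, FULL_NAME, RAW_URI) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                long count = 0;
                for (String[] row : readAndTransformCsv()) { // transformed records, as sketched above
                    for (int i = 0; i < row.length; i++) {
                        ps.setString(i + 1, row[i]);
                    }
                    ps.executeUpdate();
                    if (++count % 10_000 == 0) {
                        conn.commit(); // flush a batch of mutations to the servers
                    }
                }
                conn.commit();
            }
        }
    }

    // Stub standing in for the CSV reader plus the transform step above.
    private static Iterable<String[]> readAndTransformCsv() {
        return Collections.emptyList();
    }
}

My worry is whether single-client upserts like this can keep up with ~1.5 billion records per day, which is why the MapReduce/bulk-load route also appeals.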


Ralph Perko
Pacific Northwest National Laboratory
