phoenix-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject data ingestion
Date Thu, 09 Oct 2014 14:36:12 GMT
Hi,

What is the best way to ingest large amounts of CSV data that arrives at regular intervals
(about every 15 minutes, totaling roughly 500 GB or 1.5 billion records per day) and requires
a few transformations before being inserted?

By transformation I mean the following (a rough sketch follows the list):
1) 1 field is converted to a timestamp
2) 1 field is parsed to create a new field
3) several fields are combined into 1
4) a couple columns need to be reordered
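
To make that concrete, here is roughly the kind of per-record transformation I mean. The
field names and layout are made up purely for illustration, not our real schema:

import java.sql.Timestamp;

public class RecordTransform {

    // hypothetical input line: epoch_seconds,src_ip,src_port,dst_host,payload
    static String[] transform(String line) {
        String[] f = line.split(",", -1);

        // 1) convert the epoch-seconds field to a SQL timestamp
        Timestamp ts = new Timestamp(Long.parseLong(f[0]) * 1000L);

        // 2) parse one field to create a new one (e.g. top-level domain from the host)
        String tld = f[3].substring(f[3].lastIndexOf('.') + 1);

        // 3) combine several fields into one
        String flowKey = f[1] + ":" + f[2] + "->" + f[3];

        // 4) emit the columns in the order the target table expects
        return new String[] { flowKey, ts.toString(), tld, f[4] };
    }
}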

Is there any way to make these transformations through the bulk load tool, or is MR the best
route?
If I use MR, should I go purely through JDBC? Write directly to HBase? Do something similar
to the CSV bulk load tool (perhaps even just customizing CsvBulkLoadTool?), or something
altogether different?

Thanks!
Ralph

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory

