flume-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject HDFS Sink log rotation on the basis of time of writing
Date Fri, 02 Nov 2012 23:08:36 GMT

Is it possible to organize files written to HDFS into buckets based on the
time of writing rather than the timestamp in the event header? Alternatively,
is it possible to insert the timestamp interceptor just before the HDFS Sink?

My use case is to have files organized both chronologically and
alphabetically by name, with only one file being written to at a time. This
will make it easier to find newly available data so that MapReduce jobs can
process it.
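For anyone reading this in the archives: a minimal config sketch of two possible approaches. The component names (a1, r1, k1) and paths are placeholders, and hdfs.useLocalTimeStamp only exists in later Flume NG releases, so treat this as an assumption-laden sketch rather than a confirmed answer for the poster's version:

```properties
# Option 1: have the HDFS sink bucket by the time of writing.
# hdfs.useLocalTimeStamp makes the sink use its own host clock for
# the %Y/%m/%d escape sequences instead of the header timestamp.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/%H%M
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Option 2: stamp events at the source with the timestamp interceptor,
# so the header timestamp reflects ingest time rather than event time.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
```

Note that Flume interceptors attach to sources, not sinks, which is why there is no direct way to inject a timestamp "just before" the HDFS Sink; option 2 is the closest equivalent when a single source feeds the sink.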

Thanks in advance,
