flume-user mailing list archives

From Justin Workman <justinjwork...@gmail.com>
Subject Re: hdfs.idleTime
Date Thu, 12 Jan 2017 19:23:13 GMT
More details

Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).

On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <justinjworkman@gmail.com> wrote:

> Sorry for cross-posting to user and dev. I have recently set up a Flume
> configuration where we are using the regex_extractor interceptor to parse
> the actual event date from the record flowing through the Flume source,
> then using that date to build the HDFS sink bucket path. However, it
> appears that the hdfs.idleTimeout value is not honored in this
> configuration. It does work when using the timestamp interceptor to build
> the output path.
> I have set the hdfs.idleTimeout value for the HDFS sink, but the files are
> never closed or renamed until I restart or shut down Flume. Our Flume is
> configured to roll based on size or output path, and the files
> rename/close/roll fine based on size; however, the last file in each output
> path is always left with the .tmp extension until we restart Flume. I would
> expect that the file would be renamed and closed if there are no records
> written to this file after the idleTimeout is reached.
> Could I be missing something, or is this a known bug with the
> regex_extractor interceptor?
> Thanks
> Justin
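
For reference, here is a minimal sketch of the kind of configuration described above. It is not the actual config. The agent and component names (a1, r1, c1, k1, i1, s1), the ZooKeeper address, topic, group id, directories, regex, date pattern, bucket path, and the size and timeout values are all assumptions made up for illustration. It also assumes the extracted date is written to the "timestamp" header in milliseconds so that the %Y/%m/%d escapes in hdfs.path resolve from the event date.

```properties
# Hypothetical agent "a1": KafkaSource -> File Channel -> HDFS Sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source (Flume 1.6 style properties; connection values are placeholders)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = zk1.example.com:2181
a1.sources.r1.topic = events
a1.sources.r1.groupId = flume
a1.sources.r1.channels = c1

# regex_extractor interceptor: pull the event date out of the record body.
# Assumes each record starts with a date such as "2017-01-12 19:23".
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = ^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2})
a1.sources.r1.interceptors.i1.serializers = s1
# Convert the captured text to epoch millis and store it in the "timestamp" header
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm

# File channel (directories are placeholders)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

# HDFS sink: bucket path built from the event date in the timestamp header
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /data/events/%Y/%m/%d
# Roll on size only: disable time-based and count-based rolling
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
# Seconds without writes after which an open file should be closed (0 disables)
a1.sinks.k1.hdfs.idleTimeout = 300
```

With a layout like this, the expectation is that the last .tmp file in each bucket path gets closed and renamed once hdfs.idleTimeout seconds pass with no writes to it.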
