More details

Flume 1.6 - Core Apache version.
KafkaSource (0.8.2) -> File Channel -> HDFS Sink (CDH5.5.2).

On Thu, Jan 12, 2017 at 12:20 PM, Justin Workman <justinjworkman@gmail.com> wrote:
Sorry for cross-posting to the user and dev lists. I recently set up a Flume configuration that uses the regex_extractor interceptor to parse the actual event date from each record flowing through the Flume source, and then uses that date to build the HDFS sink bucket path. However, the hdfs.idleTimeout value does not appear to be honored in this configuration. It does work when I use the timestamp interceptor to build the output path.

I have set the hdfs.idleTimeout value for the HDFS sink, but the files are never closed or renamed until I restart or shut down Flume. Our Flume agent is configured to roll based on size or output path. Files are renamed, closed, and rolled correctly when they reach the size limit; however, the last file in each output path is always left with the .tmp extension until we restart Flume. I would expect a file to be closed and renamed once no records have been written to it for the idleTimeout period.
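For reference, a setup like the one described might look roughly like the fragment below. This is a sketch, not our actual config: the agent/component names (a1, r1, i1, s1, c1, k1), the regex, the date pattern, the HDFS path, and the numeric values are all hypothetical placeholders. It shows the regex_extractor interceptor writing the parsed date into the `timestamp` header, which the HDFS sink's `%Y/%m/%d` escapes then use to pick the bucket, with rolling by size only and hdfs.idleTimeout set.

```properties
# Hypothetical names and values throughout; only the relevant pieces are shown.

# regex_extractor pulls the event date from the record body and stores it,
# converted to epoch millis, in the "timestamp" header.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = ^(\\d{4}-\\d{2}-\\d{2})
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd

# HDFS sink: bucket path built from the "timestamp" header.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /data/events/%Y/%m/%d
# Roll on size only (128 MB); count- and time-based rolling disabled.
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
# Close idle .tmp files after 300 seconds with no writes (the setting in question).
a1.sinks.k1.hdfs.idleTimeout = 300
```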

Could I be missing something, or is this a known bug with the regex_extractor interceptor?

Thanks
Justin