Have you tried increasing the HDFS sink timeouts?
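For example (a sketch, values illustrative; hdfs.callTimeout defaults to 10000 ms, which matches the "Callable timed out after 10000 ms" in your log):

hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000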

--
Chris Horrocks


On Wed, Jul 20, 2016 at 8:03 am, no jihun <jeesim2@gmail.com> wrote:
Hi.

I found some files on HDFS left in the OPEN_FOR_WRITE state.

Here is Flume's log for one such file.


18 7 2016 16:12:02,765 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating 1468825922758.avro.tmp

18 7 2016 16:22:39,812 INFO  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter 1468825922758.avro.tmp at 1468826559812

18 7 2016 16:22:39,812 INFO  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing 1468825922758.avro.tmp

18 7 2016 16:22:49,813 WARN  [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (1468825922758.avro.tmp). Exception follows.
java.io.IOException: Callable timed out after 10000 ms on file: 1468825922758.avro.tmp

18 7 2016 16:22:49,816 INFO  [hdfs-hdfs2-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming 1468825922758.avro.tmp to 1468825922758.avro

- It seems close() was never retried.
- Flume simply renamed the file while it was still open.


Two days later I found the file with this command:

hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" | grep "2016/07/18" | sed 's|/data/flume/| /data/flume/|g' | grep -v ".avro.tmp" | sed -n 's|.*\(/data/flume/.*avro\).*|\1|p'
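(A slightly simpler equivalent, assuming fsck keeps the full path and the OPENFORWRITE marker on the same output line:

hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" | grep "2016/07/18" | grep -v ".avro.tmp" | grep -o '/data/flume/[^ ]*\.avro'
)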


So I ran recoverLease on it:

hdfs debug recoverLease -path 1468825922758.avro -retries 3
recoverLease returned false.
Retrying in 5000 ms...
Retry #1 
recoverLease SUCCEEDED on 1468825922758.avro 
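To recover every leaked file in one pass, something like this should work (an untested sketch combining the fsck listing above with recoverLease):

hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" | grep -o '/data/flume/[^ ]*\.avro' \
  | while read -r f; do
      # retry the lease recovery up to 3 times per leaked file
      hdfs debug recoverLease -path "$f" -retries 3
    done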


My HDFS sink configuration:

hadoop2.sinks.hdfs2.type = hdfs
hadoop2.sinks.hdfs2.channel = fileCh1
hadoop2.sinks.hdfs2.hdfs.fileType = DataStream
hadoop2.sinks.hdfs2.serializer = ....
hadoop2.sinks.hdfs2.serializer.compressionCodec = snappy
hadoop2.sinks.hdfs2.hdfs.filePrefix = %{type}_%Y-%m-%d_%{host}
hadoop2.sinks.hdfs2.hdfs.fileSuffix = .avro
hadoop2.sinks.hdfs2.hdfs.rollInterval = 3700
#hadoop2.sinks.hdfs2.hdfs.rollSize = 67000000
hadoop2.sinks.hdfs2.hdfs.rollSize = 800000000
hadoop2.sinks.hdfs2.hdfs.rollCount = 0 
hadoop2.sinks.hdfs2.hdfs.batchSize = 10000
hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300

hdfs.closeTries and hdfs.retryInterval are both left unset.
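For reference, setting them explicitly would look like this (values illustrative, property names per the Flume HDFS sink docs):

hadoop2.sinks.hdfs2.hdfs.closeTries = 5
hadoop2.sinks.hdfs2.hdfs.retryInterval = 180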


My question is: why was '1468825922758.avro' left OPEN_FOR_WRITE even though it was renamed to .avro successfully?
Is this expected behavior? If so, what should I do to eliminate these anomalous OPENFORWRITE files?

Regards,
Jihun.