flume-user mailing list archives

From Abraham Fine <...@brightroll.com>
Subject HDFS IO Error
Date Tue, 25 Mar 2014 18:43:44 GMT
Hello-

We have Flume agents running 1.4.0 that sink to HDFS (version 
2.0.0-cdh4.2.1).

Exceptions start occurring at the same time across our Flume agents when
a datanode in HDFS goes down. We did not have this issue while running
Flume 1.3.

We noticed a similar issue posted on the mailing list here:
http://mail-archives.apache.org/mod_mbox/flume-user/201307.mbox/%3CCAPZq-vkmDGptbOWEAF+rE-1neibUtQ36+EHqukn5B7FUM4QAyA@mail.gmail.com%3E
and on JIRA: https://issues.apache.org/jira/browse/FLUME-2261
but could not find a solution.

We have noticed the following in the Flume logs:

WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO error
java.io.IOException: Callable timed out after 20000 ms on file <FILEPATH> :
         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:550)
         at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:353)
         at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:319)
         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:405)
         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:543)
         ... 6 more
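
For context, our (possibly wrong) reading of this first trace is that the sink runs each HDFS flush as a Callable on an executor and gives up after hdfs.callTimeout milliseconds; when a flush stalls, for example while the pipeline recovers from the dead datanode, the wait expires and the TimeoutException is rethrown as the IOException above. A minimal sketch of that pattern, using names of our own rather than Flume's actual BucketWriter code:

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: run a blocking HDFS call on an executor and abandon
// it after a configurable timeout, surfacing the TimeoutException as an
// IOException. Names and structure are our own sketch, not Flume's code.
public class CallWithTimeoutSketch {

    private static final ExecutorService CALL_EXECUTOR =
            Executors.newSingleThreadExecutor();

    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, String file)
            throws IOException {
        Future<T> future = CALL_EXECUTOR.submit(call);
        try {
            // Wait at most timeoutMs (hdfs.callTimeout, 20000 ms in our config).
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new IOException("Callable timed out after " + timeoutMs
                    + " ms on file " + file, e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // A flush that outlives the timeout (e.g. while HDFS recovers
            // from a dead datanode) produces this kind of exception.
            callWithTimeout(new Callable<Void>() {
                public Void call() throws Exception {
                    Thread.sleep(30000L); // simulated slow HDFS flush
                    return null;
                }
            }, 20000L, "<FILEPATH>");
        } catch (IOException expected) {
            expected.printStackTrace();
        } finally {
            CALL_EXECUTOR.shutdownNow();
        }
    }
}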

This is usually followed by:

WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO error
java.io.IOException: This bucket writer was closed due to idling and this handle is thus no longer valid
         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:380)
         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
         at java.lang.Thread.run(Thread.java:662)
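
Similarly, our guess at the second failure (again a rough sketch with our own names, not Flume's code) is that an idle timer closes the bucket writer after hdfs.idleTimeout seconds without activity, and the sink then fails when it appends through the handle it is still holding:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustration only: an idle timer marks the writer closed in the
// background, and a later append through the stale handle is rejected.
// Again, names and structure are our guess, not Flume's BucketWriter.
public class IdleCloseSketch {

    private final ScheduledExecutorService idleTimer =
            Executors.newSingleThreadScheduledExecutor();
    private volatile boolean idleClosed = false;

    void scheduleIdleClose(long idleTimeoutSeconds) {
        // After hdfs.idleTimeout (180 s in our config) with no activity,
        // the writer is closed out from under the sink.
        idleTimer.schedule(new Runnable() {
            public void run() {
                idleClosed = true; // the underlying HDFS file would be closed here
            }
        }, idleTimeoutSeconds, TimeUnit.SECONDS);
    }

    void append(byte[] event) throws IOException {
        if (idleClosed) {
            // The message we see in our logs when the sink reuses the handle.
            throw new IOException("This bucket writer was closed due to idling "
                    + "and this handle is thus no longer valid");
        }
        // ... otherwise write the event to the open HDFS file ...
    }
}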

When these exceptions occur, the HDFS sink does not close files. We often
end up with multi-gigabyte files in HDFS.

Our sink configuration:

agentX.sinks.hdfs-sinkX-1.channel = chX
agentX.sinks.hdfs-sinkX-1.type = hdfs
agentX.sinks.hdfs-sinkX-1.hdfs.path = <FILEPATH>
agentX.sinks.hdfs-sinkX-1.hdfs.filePrefix = event
agentX.sinks.hdfs-sinkX-1.hdfs.writeFormat = Text
agentX.sinks.hdfs-sinkX-1.hdfs.rollInterval = 120
agentX.sinks.hdfs-sinkX-1.hdfs.idleTimeout = 180
agentX.sinks.hdfs-sinkX-1.hdfs.rollCount = 0
agentX.sinks.hdfs-sinkX-1.hdfs.rollSize = 0
agentX.sinks.hdfs-sinkX-1.hdfs.fileType = DataStream
agentX.sinks.hdfs-sinkX-1.hdfs.batchSize = 24000
agentX.sinks.hdfs-sinkX-1.hdfs.txnEventSize = 24000
agentX.sinks.hdfs-sinkX-1.hdfs.callTimeout = 20000
agentX.sinks.hdfs-sinkX-1.hdfs.threadsPoolSize = 1


The file paths are unique to each sink.

Thank you for your help.

--
Abraham Fine | Software Engineer
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
