flume-user mailing list archives

From Snehal Nagmote <nagmote.sne...@gmail.com>
Subject Flume HDFS Sink Issue (IO Exception - hdfs.DFSClient$DFSOutputStream.sync)
Date Tue, 26 Nov 2013 19:07:25 GMT
Hello All,

We are using the HDFS sink with Flume, and it runs into an HDFS IOException very
often.

I am using Apache Flume from HDP 1.4.0. We have a two-tier topology, and the
collector is not on a datanode. The collector fails often and
throws java.io.IOException: DFSOutputStream is closed

java.io.IOException: DFSOutputStream is closed
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4097)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4084)
at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:117)
at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

This is what the configuration looks like:


agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.filePrefix = %Y%m%d%H-events-1
agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
agent.sinks.hdfs-sink.hdfs.fileSuffix = .done
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.batchSize = 10000
agent.sinks.hdfs-sink.hdfs.threadsPoolSize = 10000
agent.sinks.hdfs-sink.hdfs.rollTimerPoolSize = 10
agent.sinks.hdfs-sink.hdfs.callTimeout = 500000


Earlier, I was using rollInterval = 30. I changed it to 0 because of the above
exception, and then I started seeing a new exception:
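
For reference, one alternative I am considering (a sketch only, not a verified
fix) is to roll on time or size instead of disabling rolling entirely, so that
files are closed promptly and their HDFS leases released. The values below are
illustrative, not tuned for our cluster:

agent.sinks.hdfs-sink.hdfs.rollInterval = 300      # close files every 5 minutes
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728    # or at ~128 MB, whichever comes first
agent.sinks.hdfs-sink.hdfs.rollCount = 0           # keep event-count rolling disabled
agent.sinks.hdfs-sink.hdfs.batchSize = 1000        # smaller batches between syncs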

Failed to renew lease for [DFSClient_NONMAPREDUCE_1307546979_31] for 30
seconds.  Will retry shortly ...
java.io.IOException: Call to bi-hdnn01.sjc.kixeye.com/10.54.208.14:8020 failed
on local exception: java.io.IOException:

Caused by: java.io.IOException: Connection reset by peer


Because of these exceptions, our production downstream process gets a lot
slower and needs frequent restarts, and the upstream process fills the
channels. Does anyone know what the cause could be and how we can avoid it?

Any thoughts would be really helpful; it has been extremely difficult to
debug this.


Thanks,
Snehal
