flume-user mailing list archives

From Nishant Neeraj <nishant.has.a.quest...@gmail.com>
Subject HDFS Sink keeps .tmp files and closes with exception
Date Thu, 18 Oct 2012 20:18:20 GMT
I am working on a POC using:

flume-ng version: Flume 1.2.0-cdh4.1.1
Hadoop 1.0.4

The config looks like this:

#Flume agent configuration
agent1.sources = avroSource1
agent1.sinks = fileSink1
agent1.channels = memChannel1

agent1.sources.avroSource1.type = avro
agent1.sources.avroSource1.channels = memChannel1
agent1.sources.avroSource1.bind = 0.0.0.0
agent1.sources.avroSource1.port = 4545

agent1.sources.avroSource1.interceptors = b
agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

agent1.sinks.fileSink1.type = hdfs
agent1.sinks.fileSink1.channel = memChannel1
agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
agent1.sinks.fileSink1.hdfs.filePrefix = agg
agent1.sinks.fileSink1.hdfs.rollInterval = 0
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 0
agent1.sinks.fileSink1.hdfs.fileType = DataStream
agent1.sinks.fileSink1.hdfs.writeFormat = Text


agent1.channels.memChannel1.type = memory
agent1.channels.memChannel1.capacity = 1000
agent1.channels.memChannel1.transactionCapacity = 1000


Basically, I do not want to roll the file at all; I just want to tail it and watch it from the Hadoop web UI. The problem is that this does not work. The HDFS file browser keeps showing:

agg.1350590350462.tmp 0 KB    2012-10-18 19:59
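
For what it's worth, this is how I am checking the directory from the command line (standard Hadoop shell commands; the timestamped path below is from my run):

```
# List the bucket directory; the in-progress file keeps its .tmp suffix
hadoop fs -ls /flume/agg1/12-10-18

# Read whatever has been flushed so far (shows nothing while the size is 0)
hadoop fs -cat /flume/agg1/12-10-18/agg.1350590350462.tmp
```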

The Flume console shows events being pushed. When I stop Flume, I see that the file does get populated, but the '.tmp' suffix is still in the file name, and I see this exception on close.

2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)] Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)] failed to close() HDFSWriter for file (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
    at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
    at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
    at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
    at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
    at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
    at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
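
Could this be Hadoop's JVM shutdown hook closing the cached FileSystem instance before Flume's own shutdown finishes closing the BucketWriter? If so, would setting the following in core-site.xml on the Flume box avoid the race? This is just a guess on my part; I have not tried it yet:

```
<!-- Untested guess: keep Hadoop's shutdown hook from closing cached
     FileSystem instances, so Flume can close the file itself -->
<property>
  <name>fs.automatic.close</name>
  <value>false</value>
</property>
```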


Thanks
Nishant
