flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: HDFS Sink keeps .tmp files and closes with exception
Date Fri, 19 Oct 2012 20:37:00 GMT
Nishant, 

a: If CDH4 was working for you, you could use it with Hadoop 2.x, or CDH3u5 with Hadoop 1.x.

b: Looks like your rollSize/rollCount/rollInterval are all 0. Can you increase rollCount to, say, 1000 or so? As documented here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink, if you set the roll* configuration params to 0, the files will never roll. If files are not rolled, they are not closed, and HDFS will show them as 0-sized files. Once the roll happens, the HDFS GUI will show you the real file size. You can use any one of the three roll* config parameters to roll the files; see the snippet below.
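
For example, a minimal sketch of your sink config with only rollCount active (1000 is just an illustrative event count):

agent1.sinks.fileSink1.hdfs.rollInterval = 0
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 1000

With a non-zero rollCount, the sink closes the current .tmp file and renames it after every 1000 events, so the HDFS GUI shows the real size.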

Thanks,
Hari


-- 
Hari Shreedharan


On Friday, October 19, 2012 at 1:29 PM, Nishant Neeraj wrote:

> Thanks for the responses. 
> 
> a: Got rid of all the CDH stuff (basically, started on a fresh AWS instance)
> b: Installed from binary files. 
> 
> It DID NOT work. Here is what I observed: 
> flume-ng version: Flume 1.2.0
> Hadoop: 1.0.4
> 
> This is what my configuration is: 
> 
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = hdfs://localhost:54310/flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg2
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
> #agent1.sinks.fileSink1.hdfs.batchSize = 10
> 
> #1: startup error
> -----------------------------------
> With the new installation, I see this exception when Flume starts (it does not stop me from adding data to HDFS):
> 
> 2012-10-19 19:48:32,191 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)] Creating instance of sink: fileSink1, type: hdfs
> 2012-10-19 19:48:32,296 (conf-file-poller-0) [DEBUG - org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)] java.io.IOException: config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:214)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
> at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
> at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:516)
> at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:238)
> at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSinks(PropertiesFileConfigurationProvider.java:373)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:223)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
> -- snip --
> 
> #2: the old issue continues
> ------------------------------------
> When I start loading the source, the console shows that events get generated. But the HDFS GUI shows a 0 KB file with a .tmp extension. Setting hdfs.batchSize has no effect; I would assume it should flush the content to the temp file, but no. I tried smaller and bigger values of hdfs.batchSize, with no effect.
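>
> For example, here are the kinds of values I tried (the exact numbers are illustrative):
>
> # illustrative: a small batch
> agent1.sinks.fileSink1.hdfs.batchSize = 10
> # illustrative: a large batch
> agent1.sinks.fileSink1.hdfs.batchSize = 1000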
> 
> When I shut down Flume, I see the data gets flushed to the temp file, BUT the temp file still has the .tmp extension. So basically there is NO WAY TO HAVE ONE SINGLE AGGREGATED FILE of all the logs. If I set rollSize to a positive value, things start to work, but that defeats the purpose.
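>
> For example, this variant does roll and close the files (the size is an illustrative value):
>
> # illustrative: roll at about 1 MB
> agent1.sinks.fileSink1.hdfs.rollSize = 1048576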
> 
> Even with a non-zero roll value, the last file stays as .tmp when I close Flume.
> 
> #3: Shutdown throws exception
> ------------------------------------
> Closing Flume ends with this exception (the data in the file looks OK, though):
> 
> 2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:247)] Closing hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp
> 2012-10-19 20:07:55,543 (hdfs-fileSink1-call-runner-7) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:253)] failed to close() HDFSWriter for file (hdfs://localhost:54310/flume/agg1/12-10-19/agg2.1350676790623.tmp). Exception follows.
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
> at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
> at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:250)
> at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:48)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:236)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:233)
> at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
> at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:233)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:747)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:744)
> -- snip --
> 
> 
> Couple of side notes:
> 
> #1: For weird reasons, I did not have to prefix hdfs://localhost:54310 in my previous config (the one using the CDH4 version), and things were as good as in this installation, except there were not as many exceptions.
> 
> #2: I have 
> java version "1.6.0_24"
> OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
> 
> #3: I did not create a special hadoop:hduser this time. I just dumped the files in $HOME, changed the config files (*-site.xml, *-env.sh, flume.sh), and exported the appropriate variables.
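>
> For reference, the exports look roughly like this (the paths are illustrative for my setup):
>
> # illustrative paths; adjust to where Java and Hadoop actually live
> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
> export HADOOP_HOME=/home/ubuntu/hadoop
> export PATH=$PATH:$HADOOP_HOME/bin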
> 
> #4: Here is what my config files look like:
> 
> <!-- core-site.xml -->
> <configuration>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/home/ubuntu/hadoop/tmp</value>
>   <description>A base for other temporary directories.</description>
> </property>
> 
> <property> 
>   <name>fs.default.name</name>
>   <value>hdfs://localhost:54310</value>
> </property>
> </configuration>
> 
> <!-- hdfs-site.xml -->
> <configuration>
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
> </property>
> </configuration>
> 
> <!-- mapred-site.xml  -->
> 
> <configuration>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>localhost:54311</value>
> </property>
> </configuration>
> 
> #5: /home/ubuntu/hadoop/tmp has chmod 777 (tried 750 as well)
> 
> thanks for your time 
> - Nishant
> 
> On Fri, Oct 19, 2012 at 4:30 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> > Nishant, 
> > 
> > The CDH4 build of Flume is built against Hadoop-2, and may not work correctly against Hadoop-1.x, since Hadoop's interfaces changed in the meantime. You could also use Apache Flume-1.2.0, or the upcoming Apache Flume-1.3.0, directly against Hadoop-1.x without issues, as they are built against Hadoop-1.x.
> > 
> > 
> > Thanks,
> > Hari
> > 
> > -- 
> > Hari Shreedharan
> > 
> > 
> > On Thursday, October 18, 2012 at 1:18 PM, Nishant Neeraj wrote:
> > 
> > > I am working on a POC using 
> > > > flume-ng version Flume 1.2.0-cdh4.1.1
> > > > Hadoop 1.0.4
> > > > 
> > > The config looks like this
> > > 
> > > #Flume agent configuration
> > > agent1.sources = avroSource1
> > > agent1.sinks = fileSink1
> > > agent1.channels = memChannel1
> > > 
> > > agent1.sources.avroSource1.type = avro 
> > > agent1.sources.avroSource1.channels = memChannel1
> > > agent1.sources.avroSource1.bind = 0.0.0.0
> > > agent1.sources.avroSource1.port = 4545
> > > 
> > > agent1.sources.avroSource1.interceptors = b 
> > > agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
> > > 
> > > agent1.sinks.fileSink1.type = hdfs
> > > agent1.sinks.fileSink1.channel = memChannel1
> > > agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> > > agent1.sinks.fileSink1.hdfs.filePrefix = agg
> > > agent1.sinks.fileSink1.hdfs.rollInterval = 0
> > > agent1.sinks.fileSink1.hdfs.rollSize = 0
> > > agent1.sinks.fileSink1.hdfs.rollCount = 0
> > > agent1.sinks.fileSink1.hdfs.fileType = DataStream
> > > agent1.sinks.fileSink1.hdfs.writeFormat = Text
> > > 
> > > 
> > > agent1.channels.memChannel1.type = memory 
> > > agent1.channels.memChannel1.capacity = 1000
> > > agent1.channels.memChannel1.transactionCapacity = 1000
> > > 
> > > 
> > > 
> > > Basically, I do not want to roll the file at all. I just want to tail it and watch the show from the Hadoop UI. The problem is, it does not work. The console keeps saying:
> > > 
> > > agg.1350590350462.tmp 0 KB    2012-10-18 19:59
> > > 
> > > The Flume console shows events getting pushed. When I stop Flume, I see the file gets populated, but the '.tmp' is still in the file name. And I see this exception on close:
> > > 
> > > 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)] Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> > > 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)] failed to close() HDFSWriter for file (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> > > java.io.IOException: Filesystem closed
> > > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> > > at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> > > at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
> > > at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> > > at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
> > > at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
> > > at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
> > > at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
> > > at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
> > > at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
> > > at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
> > > at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
> > > at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > at java.lang.Thread.run(Thread.java:679)
> > > 
> > > 
> > > 
> > > Thanks
> > > Nishant
> > > 
> > > 
> > > 
> > 
> > 
> 

