flume-user mailing list archives

From Carlos Rojas Matas <cma...@despegar.com>
Subject Wrong disk space in HDFS
Date Wed, 07 Oct 2015 14:07:39 GMT
Hi guys,

We're facing a problem with the HDFS sink. We're rolling files into HDFS on a
daily basis, and after a while we start receiving low disk space warnings.
When we restart the HDFS cluster, the available space is reported correctly
again. Even after we kill the agent, the remaining space is still wrong. It's
as if Flume reserves space and somehow never releases it afterwards. Even the
OS df and du commands seem to report the wrong space.

The configuration we're using is as follows:

##sink
p13nAgent.sinks.hdfsSink.type=hdfs
p13nAgent.sinks.hdfsSink.hdfs.minBlockReplicas=1
p13nAgent.sinks.hdfsSink.channel=mainChannel
p13nAgent.sinks.hdfsSink.hdfs.fileType=DataStream
p13nAgent.sinks.hdfsSink.hdfs.filePrefix=$HOST_PREFIX
p13nAgent.sinks.hdfsSink.hdfs.fileSuffix=.avro
p13nAgent.sinks.hdfsSink.hdfs.path=$HDFS_PATH/p13n-storage/$ENV/%{topic}/%Y/%m/%d

#The roll size must be 297M because the Snappy compression rate is approx. 43%
p13nAgent.sinks.hdfsSink.hdfs.rollSize=312134251
p13nAgent.sinks.hdfsSink.hdfs.rollCount=0
p13nAgent.sinks.hdfsSink.hdfs.rollInterval=0
p13nAgent.sinks.hdfsSink.hdfs.idleTimeout=300
p13nAgent.sinks.hdfsSink.hdfs.maxOpenFiles=1000
#p13nAgent.sinks.hdfsSink.hdfs.round=true
#p13nAgent.sinks.hdfsSink.hdfs.roundUnit=hour
#p13nAgent.sinks.hdfsSink.hdfs.roundValue=24
#p13nAgent.sinks.hdfsSink.hdfs.writeFormat=Text
p13nAgent.sinks.hdfsSink.hdfs.batchSize=5000
p13nAgent.sinks.hdfsSink.serializer=com.despegar.p13n.flume.avro.serializer.FlumeAvroEventSerializer$Builder
p13nAgent.sinks.hdfsSink.serializer.compressionCodec=snappy
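The rollSize value above can be reproduced with a small calculation. This is a minimal sketch under two assumptions that are not stated in the message: that "43%" means the Snappy-compressed output is roughly 43% of the uncompressed input, and that the goal is for each compressed file to fill one 128 MiB HDFS block. The class and method names are hypothetical, for illustration only.

```java
public class RollSizeCalc {
    // Assumed target size of each compressed file: one 128 MiB HDFS block.
    // This target is inferred from the numbers, not stated in the original config.
    static final long BLOCK_SIZE_BYTES = 128L * 1024 * 1024;

    // Hypothetical helper: how many uncompressed bytes to write before rolling,
    // so that the compressed output is about targetBytes.
    // ratio = compressed size / uncompressed size (assumed ~0.43 for Snappy here).
    static long rollSizeFor(long targetBytes, double ratio) {
        if (ratio <= 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in (0, 1]");
        }
        // Round down so the compressed file stays at or just under the target.
        return (long) Math.floor(targetBytes / ratio);
    }

    public static void main(String[] args) {
        long rollSize = rollSizeFor(BLOCK_SIZE_BYTES, 0.43);
        System.out.println("hdfs.rollSize = " + rollSize);
    }
}
```

Running main prints `hdfs.rollSize = 312134251`, the same value as in the config, which is about 297.7 MiB of uncompressed data per file.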


Any clue would be welcome.

Thanks in advance,

-carlos.
