From Sagar Mehta <sagarme...@gmail.com>
Subject Question about gzip compression when using Flume Ng
Date Mon, 14 Jan 2013 19:18:38 GMT
Hi Guys,

I'm using Flume NG and it works great for me. In essence, I'm using an exec
source that runs tail -F on a log file, plus two HDFS sinks fed by a
file channel. So far so good. Now I'm trying to use gzip compression
with the following config, as per the Flume NG User Guide at

#gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz

However, this is what appears to be happening:

*Flume seems to write gzip-compressed output [I see the .gz files in the
output buckets]. However, when I try to decompress a file, I get a
'trailing garbage ignored' message, and the decompressed output is in fact
smaller than the compressed file.*

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -ltr
-rw-r--r-- 1 hadoop hadoop *5381235* 2013-01-11 20:44

hadoop@jobtracker301:/home/hadoop/sagar/temp$ gunzip

*gzip: collector102.ngpipes.sac.ngmoco.com.1357936638713.gz: decompression
OK, trailing garbage ignored*
hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -l

-rw-r--r-- 1 hadoop hadoop *58898* 2013-01-11 20:44 *
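
One way to narrow this down might be to check what follows the part of the
file that gunzip accepts: further gzip members, or bytes that are not gzip at
all. Below is a minimal Python sketch of such a check, assuming the file is
small enough to read into memory. The function name inspect_gzip is made up
for illustration and is not part of Flume or Hadoop.

```python
import zlib


def inspect_gzip(data: bytes):
    """Walk the gzip members at the start of `data`.

    Returns (member_sizes, trailing_bytes), where member_sizes holds the
    decompressed size of each complete gzip member found, and
    trailing_bytes counts the bytes left over that did not decode as a
    complete gzip member.
    """
    member_sizes = []
    remaining = data
    while remaining:
        # wbits=31 makes zlib expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=31)
        try:
            out = decomp.decompress(remaining)
        except zlib.error:
            # The next bytes are not valid gzip data: stop here.
            break
        if not decomp.eof:
            # The member is incomplete (for example, a truncated file).
            break
        member_sizes.append(len(out))
        # Anything after the member's trailer is handed back unprocessed.
        remaining = decomp.unused_data
    return member_sizes, len(remaining)
```

Calling inspect_gzip(open(path, "rb").read()) on one of the .gz files would
show how many complete members it holds and how many bytes remain after the
last one. A single small member followed by a large remainder would match the
sizes shown above, but the sketch only reports what is in the file and does
not explain why.
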
*Below are some helpful details.*

*I'm using apache-flume-1.4.0-SNAPSHOT-bin*

smehta@collector102:/opt$ ls -l flume
lrwxrwxrwx 1 root root 31 2012-12-14 00:44 flume ->

*I also have the hadoop-core jar in my path*

smehta@collector102:/opt/flume/lib$ ls -l hadoop-core-0.20.2-cdh3u2.jar
-rw-r--r-- 1 hadoop hadoop 3534499 2012-12-01 01:53

Everything is working well for me except the compression part, and I'm not
quite sure what I'm missing here. While I debug this, any ideas or help
would be much appreciated.

Thanks in advance,

