Yes, I had already tried the text write format without success, but I gave it another try anyway. Below is the latest file; the result is still the same.

hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
Mon Jan 14 23:02:07 UTC 2013

hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/
Found 1 items
-rw-r--r--   3 hadoop supergroup    4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/

hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/ .
hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip 

gzip: decompression OK, trailing garbage ignored

Interestingly enough, the gzip page says it is a harmless warning -

However, I'm losing events on decompression, so I cannot afford to ignore this warning. The gzip page gives an example involving magnetic tape; there is an analogy with HDFS blocks here, since the file is first stored in HDFS before I pull it onto the local filesystem.
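
To see where decompression actually stops, a small Python sketch along these lines might help. It walks the file one gzip member at a time and reports the byte offset at which the data stops looking like gzip. The function name scan_gzip_members is made up for illustration, and this only shows how bytes after the first non-gzip region get dropped; it does not establish that this is what is happening to the file above.

```python
import gzip
import zlib


def scan_gzip_members(data: bytes):
    """Decompress consecutive gzip members found in `data`.

    Returns (members, payload, garbage_offset). garbage_offset is the byte
    offset where the input stops being a complete, valid gzip member, or
    None if the whole input was consumed cleanly.
    (Hypothetical helper, not part of gzip, Hadoop, or Flume.)
    """
    members = 0
    payload = bytearray()
    offset = 0
    while offset < len(data):
        # wbits=31 makes zlib expect a gzip header and trailer.
        d = zlib.decompressobj(31)
        try:
            chunk = d.decompress(data[offset:])
        except zlib.error:
            # The bytes at `offset` are not a valid gzip member.
            return members, bytes(payload), offset
        if not d.eof:
            # The member starting at `offset` is truncated.
            return members, bytes(payload), offset
        payload += chunk
        members += 1
        # Whatever zlib did not consume belongs to the next member (or is garbage).
        offset = len(data) - len(d.unused_data)
    return members, bytes(payload), None


if __name__ == "__main__":
    first = gzip.compress(b"event-1\n")
    second = gzip.compress(b"event-2\n")
    # Four zero bytes between two valid members stand in for "garbage".
    print(scan_gzip_members(first + b"\x00" * 4 + second))
```

In the example at the bottom, the second member is valid gzip but is never reached, because the scan stops at the zero bytes. Running the function over the downloaded file and comparing garbage_offset with the file size would show how much data sits after the point where gunzip stops.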


On Mon, Jan 14, 2013 at 2:52 PM,
collector102.sinks.sink1.hdfs.writeFormat = TEXT
collector102.sinks.sink2.hdfs.writeFormat = TEXT