flume-user mailing list archives

From Jimmy <jimmyj...@gmail.com>
Subject hdfs.fileType = CompressedStream
Date Thu, 30 Jan 2014 18:51:03 GMT
I am running a few tests and would like to confirm whether this is true...

hdfs.codeC = gzip
hdfs.fileType = CompressedStream
hdfs.writeFormat = Text
hdfs.batchSize = 100
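For context, here is a fuller sketch of how those properties sit in an HDFS sink configuration. The agent, channel, and sink names (a1, c1, k1) and the path are hypothetical; only the four properties above come from my setup, plus roll settings matching the 10-minute roll described below:

```properties
# hypothetical agent/sink names; only the codeC/fileType/writeFormat/batchSize
# values are from my actual config
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.hdfs.codeC = gzip
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 100
# roll on time only: every 10 minutes, size/count rolling disabled
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```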

Now let's assume I have a large number of transactions and I roll the file every 10 minutes.

It seems the .tmp file stays at 0 bytes and flushes all at once after 10 minutes, whereas if I don't use compression, the file grows continuously as data is written to HDFS.

Is this correct?

Do you see any drawback in using CompressedStream with very large files? In my case a 120 MB compressed file (roughly the HDFS block size) is about 10x that size uncompressed.
