flume-user mailing list archives

From Gumnaam Sur <gumnaam....@gmail.com>
Subject HDFS Sink writeformat / filetype / serializer
Date Tue, 31 Jul 2012 14:11:35 GMT
The HDFS sink has three properties that determine the type and content
of what gets written to the file:

writeFormat = text | writable
fileType = SequenceFile | DataStream | CompressedStream
serializer = text | avro_event | <custom>

Could one of the devs explain these in detail, along with the output
expected from the various combinations of the three values, and whether
any combinations are invalid?

For example, what is the difference between these two combinations?
serializer = avro_event, fileType = SequenceFile
serializer = avro_event, fileType = DataStream
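
To make the two cases concrete, this is roughly how I would expect them
to look in an agent config. It is only a sketch: the agent name (a1),
the sink names (k1, k2) and the HDFS paths are placeholders, and I am
assuming the properties are spelled hdfs.fileType and serializer on the
sink.

```properties
# Combination 1: avro_event serializer with SequenceFile
# (a1, k1 and the path are placeholder names)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/seq
a1.sinks.k1.hdfs.fileType = SequenceFile
a1.sinks.k1.serializer = avro_event

# Combination 2: avro_event serializer with DataStream
# (a1, k2 and the path are placeholder names)
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /flume/events/stream
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.serializer = avro_event
```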

Also, what is the difference between writeFormat = 'text' and
writeFormat = 'writable'?

To give some background, I am looking to serialize Avro events to HDFS
in a SequenceFile and then read them with org.apache.avro.mapreduce.*
from my Hadoop jobs. I figure using SequenceFile should give better
performance than text, but I am not exactly sure about the various
Flume options I mentioned above.