Hi,
The HDFS sink has three properties that determine the type and content
of what gets written to the file:
writeFormat = text | writable
fileType = SequenceFile | DataStream | CompressedStream
serializer = text | avro_event | <custom>
Could one of the devs explain these in detail, including the output
expected for the various combinations of the three values, and whether
any combination is invalid?
For example, what's the difference between the combination
serializer = avro_event , fileType = SequenceFile
and
serializer = avro_event , fileType = DataStream
Also, what's the difference between writeFormat = 'text' and
writeFormat = 'writable'?
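For concreteness, here is a sketch of the kind of sink configuration I
am asking about. The agent and sink names (a1, k1) and the HDFS path are
placeholders for illustration, not my actual setup, and I am not sure
this particular combination is valid, which is part of the question.

```properties
# Hypothetical agent "a1" with an HDFS sink "k1" (names are placeholders)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events

# The three properties in question
a1.sinks.k1.hdfs.fileType = SequenceFile
a1.sinks.k1.hdfs.writeFormat = Writable
a1.sinks.k1.serializer = avro_event
```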
To give some background, I am looking to serialize Avro events to HDFS
in a SequenceFile and to read them with org.apache.avro.mapreduce.* from
my Hadoop jobs. I figure using SequenceFile should give better
performance than text, but I am not exactly sure about the various Flume
options I mentioned above.
thanks