flume-user mailing list archives

From Thomas.B...@continental-corporation.com
Subject Re: Re: How to customize the key in a HDFS SequenceFile sink
Date Tue, 08 Sep 2015 12:14:11 GMT
From:    Gonzalo Herreros <gherreros@gmail.com>
To:      user@flume.apache.org
Date:    08.09.2015 09:29
Subject: Re: How to customize the key in a HDFS SequenceFile sink

Thanks for your prompt reply. Could you give me some more details?
I'm a little confused, as I've read that the "hdfs.serializer" parameter is
ignored when using sequence files.
Does that mean my custom serializer is itself responsible for writing
"correct" SequenceFiles (e.g. using "createWriter" of
org.apache.hadoop.io.SequenceFile)?

I assume that I have to do the following (see pseudocode below):

1)
agent configuration:
hdfs.fileType = DataStream
hdfs.serializer = MyBuilder


2)
public class MySerializer implements EventSerializer {
  // customize the key and write to the OutputStream
  // using SequenceFile.createWriter(...)
}

3)
public static class MyBuilder implements EventSerializer.Builder {
  public EventSerializer build(Context context, OutputStream os) {
    return new MySerializer(context, os);
  }
}
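
The key customization in step 2 can be sketched independently of Flume. The following is only a minimal illustration under stated assumptions: SimpleEvent and Record are hypothetical stand-ins (not Flume classes), and the header name "recordKey" is made up for the example. It falls back to the "timestamp" header so that events without a custom key still get the timestamp-style key mentioned in the original question.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class CustomKeySketch {

    /** Hypothetical stand-in for a Flume event: string headers plus a raw body. */
    static final class SimpleEvent {
        final Map<String, String> headers;
        final byte[] body;

        SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /** Hypothetical key/value pair, as it would be appended to a SequenceFile. */
    static final class Record {
        final String key;
        final byte[] value;

        Record(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Assumed header name carrying the custom key; not a Flume convention.
    static final String KEY_HEADER = "recordKey";

    /** Returns the custom key if present, otherwise a timestamp-based key. */
    static String buildKey(SimpleEvent event) {
        String custom = event.headers.get(KEY_HEADER);
        if (custom != null && !custom.isEmpty()) {
            return custom;
        }
        String timestamp = event.headers.get("timestamp");
        return timestamp != null ? timestamp : Long.toString(System.currentTimeMillis());
    }

    /** Pairs the computed key with the unchanged event body. */
    static Record toRecord(SimpleEvent event) {
        return new Record(buildKey(event), event.body);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(KEY_HEADER, "sensor-42");
        SimpleEvent event =
            new SimpleEvent(headers, "payload".getBytes(StandardCharsets.UTF_8));
        Record record = toRecord(event);
        System.out.println(record.key + " -> "
            + new String(record.value, StandardCharsets.UTF_8));
    }
}
```

In a real serializer, the buildKey logic would sit wherever the event is turned into a key/value pair; everything else here is scaffolding for the example.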

Thanks a lot for your support.


I would implement a custom serializer and configure it in the standard
HDFS sink.
That way you control how you build the key for each event.

Regards,
Gonzalo
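
A possible configuration for this suggestion is sketched below. It rests on an assumption that should be checked against the Flume version in use: that the HDFS sink's hdfs.writeFormat property accepts, besides Text and Writable, the fully qualified class name of a SequenceFileSerializer.Builder implementation. The agent name a1, the sink name k1, the path, and the class com.example.MyKeySerializer are all hypothetical.

```properties
# Hypothetical agent "a1" with HDFS sink "k1"
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = SequenceFile
# Assumption: a custom builder class name is accepted here
a1.sinks.k1.hdfs.writeFormat = com.example.MyKeySerializer$Builder
```

With a setup like this, the sink would still create and write the SequenceFile itself; the custom class would only decide the key and value for each event.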

On 8 September 2015 at 06:42, <Thomas.Beer@continental-corporation.com> 
wrote:

Hello,

I'm using Flume's HDFS SequenceFile sink for writing data to HDFS. I'm
looking for a way to create "custom keys". By default, Flume uses the
timestamp as the key within a SequenceFile. However, in my use case I
would like to use a customized string as the key (instead of the timestamp).

What are best practices for implementing/configuring such a "custom key"
within Flume?

Best, Thomas


