flume-user mailing list archives

From Thomas.B...@continental-corporation.com
Subject Reply: Re: Re: How to customize the key in a HDFS SequenceFile sink
Date Thu, 10 Sep 2015 13:49:08 GMT
Thanks a lot. Your proposed solution worked perfectly. Here are some
further details about my implementation, in case anyone is interested:
http://stackoverflow.com/questions/32440926/flume-how-to-create-a-custom-key-for-a-hdfs-sequencefile

Best,
Thomas




From:    Gonzalo Herreros <gherreros@gmail.com>
To:      user@flume.apache.org
Date:    08.09.2015 15:35
Subject: Re: Re: How to customize the key in a HDFS SequenceFile sink



Looking at the code, I guess this sink is a bit different and the 
"serializer" property doesn't seem to be used.

I see two options:

Either set hdfs.writeFormat to your own class, in place of one of the
built-in SequenceFileSerializerType values, so that the sink uses your own
implementation of SequenceFileSerializer.

Or extend HDFSEventSink and pass to its constructor an extension of
HDFSWriterFactory that, when asked for a sequence file writer, returns an
extension of HDFSSequenceFile in which you have overridden the "append"
method to build the key however you want.
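
For the first option, the sink configuration might look roughly like the
fragment below. This is a sketch under assumptions, not a verified setup: the
agent name "a1", the sink name "k1", the path, and the class
com.example.CustomKeySerializer are all hypothetical, and it assumes the sink
tries to load the hdfs.writeFormat value as the class name of a
SequenceFileSerializer.Builder when it is not one of the built-in formats.
Please check this against the Flume version you run.

```properties
# Hypothetical agent ("a1") and sink ("k1") names.
a1.sinks.k1.type = hdfs
# Hypothetical target directory.
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = SequenceFile
# Assumption: a fully qualified builder class is accepted here.
# com.example.CustomKeySerializer is a placeholder for your own class.
a1.sinks.k1.hdfs.writeFormat = com.example.CustomKeySerializer$Builder
```

The jar containing the custom class would also have to be on the agent's
classpath.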


Regards,
Gonzalo

On 8 September 2015 at 13:14, <Thomas.Beer@continental-corporation.com> 
wrote:



From:    Gonzalo Herreros <gherreros@gmail.com>
To:      user@flume.apache.org
Date:    08.09.2015 09:29
Subject: Re: How to customize the key in a HDFS SequenceFile sink

Thanks for your prompt reply. Could you give me some more details?
I'm a little confused, as I've read that the "hdfs.serializer" parameter is
ignored when using sequence files.
Does that mean my custom serializer is responsible for writing
"correct" SequenceFiles (e.g. using "createWriter" of
org.apache.hadoop.io.SequenceFile)?

I assume that I have to do the following (see pseudocode below): 

1) 
agent configuration: 
hdfs.fileType = DataStream 
hdfs.serializer = MyBuilder 


2) 
public class MySerializer implements EventSerializer {
  // customize the key and write to the OutputStream using
  // SequenceFile.createWriter
}

3)
public static class MyBuilder implements EventSerializer.Builder {
  @Override
  public EventSerializer build(Context context, OutputStream os) {
    return new MySerializer(context, os);
  }
}

Thanks a lot for your support. 


I would implement a custom serializer and configure it in the standard
HDFS sink.
That way you control how the key is built for each event.
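
As a minimal sketch of per-event key building, the logic can be kept free of
Flume types so that it is easy to test on its own. Everything named here is an
assumption for illustration: the class CustomKeys, the header name
"recordKey", and the choice to fall back to the timestamp when the header is
missing. A serializer would call this with the event's headers and wrap the
result in a Hadoop Text.

```java
import java.util.Map;

/** Builds the SequenceFile key for one event from its headers. */
final class CustomKeys {
    // Hypothetical header name; use whatever header your source or
    // interceptor actually sets.
    static final String KEY_HEADER = "recordKey";

    private CustomKeys() {}

    /**
     * Returns the custom key if the header is present and non-empty,
     * otherwise falls back to the timestamp as a decimal string.
     */
    static String buildKey(Map<String, String> headers, long fallbackTimestampMillis) {
        String key = (headers == null) ? null : headers.get(KEY_HEADER);
        if (key == null || key.isEmpty()) {
            return Long.toString(fallbackTimestampMillis);
        }
        return key;
    }
}
```

Keeping the fallback means events without the header still get a usable key,
matching the sink's default behaviour of keying by timestamp.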

Regards, 
Gonzalo 

On 8 September 2015 at 06:42, <Thomas.Beer@continental-corporation.com> 
wrote: 

Hello,

I'm using Flume's HDFS SequenceFile sink to write data to HDFS. I'm
looking for a way to create "custom keys". By default, Flume uses
the timestamp as the key within a SequenceFile. However, in my use case I
would like to use a customized string as the key (instead of the timestamp).

What are the best practices for implementing/configuring such a "custom key"
within Flume?

Best, Thomas



