flume-user mailing list archives

From Riccardo Carè <riccardocar...@gmail.com>
Subject Moving binary files from spooldir source to HDFS sink
Date Thu, 15 Jan 2015 10:40:00 GMT

I am new to Flume and I am trying to experiment with it by moving binary
files across two agents.

 - The first agent runs on machine A and uses a spooldir source and a
thrift sink.
 - The second agent runs on machine B, which is part of a Hadoop cluster.
It has a thrift source and an HDFS sink.
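For context, the two configurations look roughly like this (agent names, hostnames, ports, and paths below are placeholders, not my exact values):

```properties
# Agent on machine A: spooldir source -> thrift sink
a1.sources = spool-src
a1.channels = ch1
a1.sinks = thrift-sink

a1.sources.spool-src.type = spooldir
a1.sources.spool-src.spoolDir = /data/spool
a1.sources.spool-src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
a1.sources.spool-src.channels = ch1

a1.channels.ch1.type = file

a1.sinks.thrift-sink.type = thrift
a1.sinks.thrift-sink.hostname = machine-b.example.com
a1.sinks.thrift-sink.port = 4545
a1.sinks.thrift-sink.channel = ch1

# Agent on machine B: thrift source -> HDFS sink
a2.sources = thrift-src
a2.channels = ch2
a2.sinks = hdfs-sink

a2.sources.thrift-src.type = thrift
a2.sources.thrift-src.bind = 0.0.0.0
a2.sources.thrift-src.port = 4545
a2.sources.thrift-src.channels = ch2

a2.channels.ch2.type = file

a2.sinks.hdfs-sink.type = hdfs
a2.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/binary
a2.sinks.hdfs-sink.channel = ch2
```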

I have two questions about this configuration:
 - I know I have to use the BlobDeserializer$Builder for the source on A,
but what is the correct value for the maxBlobLength parameter? Should it be
less than or greater than the expected size of the binary file?
 - I did some tests and found that the transmitted file was corrupted on
HDFS. I think this is caused by the HDFS sink, which uses TEXT as its
default serializer (I assume it writes a \n character between one event and
the next). How can I fix this?
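For the second question, these are the sink settings I suspect are involved. The values below are untested guesses based on my reading of the documentation, not something I have confirmed:

```properties
# HDFS sink settings that may be relevant (untested guesses):
# - hdfs.fileType defaults to SequenceFile; DataStream writes raw event bodies
# - the default TEXT serializer appends \n after each event unless disabled
a2.sinks.hdfs-sink.hdfs.fileType = DataStream
a2.sinks.hdfs-sink.serializer = TEXT
a2.sinks.hdfs-sink.serializer.appendNewline = false
```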

Thank you very much in advance.

Best regards,
