flume-user mailing list archives

From Chris Lin <chris....@etudata.com>
Subject Big throughput differences on different sinks.
Date Wed, 08 Feb 2012 03:50:17 GMT
Hi all,

We are using Flume 0.9.4 from CDH3u2. While testing it in our environment, we
found large throughput differences between sinks.

The data comes from a plain text file, which we read with
*exec("cat test.txt", aggregate = true)* as the agent's source.
With customdfs or formatDfs as the sink, we got a throughput of around 40MB/s,
which is comparable to a direct *hadoop fs -put* in the same environment.
However, with escapedFormatDfs or collectorSink, whose escape feature we would
like to use, the throughput dropped to about 4MB/s for escapedFormatDfs and
1MB/s for collectorSink.
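To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The rates are the approximate figures reported in this message, and the 10MB/s target is the throughput we are aiming for; the 1024MB file size is a hypothetical value chosen only for illustration. It computes how long each sink would take to write such a file and how far each falls short of the fastest sink.

```python
# Approximate throughputs reported in this message, in MB/s.
MEASURED_MB_PER_S = {
    "customdfs / formatDfs": 40.0,
    "escapedFormatDfs": 4.0,
    "collectorSink": 1.0,
}
TARGET_MB_PER_S = 10.0  # desired throughput for the rotated HDFS output
FILE_MB = 1024.0        # hypothetical file size, for illustration only


def summarize(rates, target, file_mb):
    """Return (sink, seconds to write file_mb, slowdown vs fastest, meets target) rows."""
    fastest = max(rates.values())
    rows = []
    for sink, rate in rates.items():
        rows.append((sink, file_mb / rate, fastest / rate, rate >= target))
    return rows


if __name__ == "__main__":
    for sink, seconds, slowdown, ok in summarize(MEASURED_MB_PER_S, TARGET_MB_PER_S, FILE_MB):
        print(f"{sink:22s} {seconds:7.1f} s  {slowdown:4.0f}x slower  meets target: {ok}")
```

By this arithmetic, collectorSink would need roughly a 10x improvement just to reach the 10MB/s target.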

Is there anything we can tweak to give collectorSink better throughput, or is
this an inherent limitation of collectorSink/escapedFormatDfs? We would like
to rotate the output written to HDFS at a fixed time interval, with an
expected throughput of 10MB/s. Any comments are appreciated. Thank you.

