flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Problems performance with FileChannel and HDFS Sink.
Date Tue, 02 Feb 2016 19:24:22 GMT
Take a look at this. It might help.

From: Gonzalo Herreros <gherreros@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Tuesday, February 2, 2016 at 8:42 AM
To: user <user@flume.apache.org>
Subject: Re: Problems performance with FileChannel and HDFS Sink.

I don't know the internal details, but I guess all those threads write to a single file, so
it will reach a point where adding more threads brings no improvement.
On the other hand, having multiple sinks will create multiple files, which should scale better,
but you need to make sure the files are written to different folders or with a different naming
pattern. That could be an inconvenience, since events for the same period end up in multiple files.
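
A minimal sketch of that multiple-sink layout, assuming a hypothetical agent named a1 with one
file channel c1 and two HDFS sinks k1 and k2. The agent, channel, and sink names, the HDFS path,
and the file prefixes are illustrative and do not come from this thread; sources are omitted.
Both sinks pull from the same channel, no sink group is defined, and each sink gets its own
hdfs.filePrefix so the files do not collide.

```properties
# Hypothetical names: agent a1, channel c1, sinks k1 and k2.
a1.channels = c1
a1.sinks = k1 k2

a1.channels.c1.type = file

# First HDFS sink draining c1.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.filePrefix = events-k1

# Second HDFS sink draining the same channel, with a distinct file prefix.
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.hdfs.filePrefix = events-k2
```

Using a separate hdfs.path per sink instead of a separate prefix would be the "different
folders" variant of the same idea.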


On 2 February 2016 at 08:38, Guillermo Ortiz <konstt2000@gmail.com> wrote:

I have some problems with the performance of HDFS Sink. I only have one sink and one file

I thought about increasing the number of sinks for my channel, but I also saw the parameter
threadsPoolSize. What's the difference between this parameter and creating more sinks?

I guess that it should be a group of sinks, but I read this in another thread, in answer to
a question similar to mine:
"You can add more sinks to your config.
Don't put them in a sink group just have multiple sinks pulling from the same channel. This
should increase your throughput."

Could someone explain this to me a little bit?
