flume-user mailing list archives

From Chris Neal <cwn...@gmail.com>
Subject Re: ExecSource->MemoryChannel->AvroSink->AvroSource->FileChannel->HDFSSink throughput question
Date Tue, 05 Feb 2013 15:57:20 GMT
Perfect.
Again, thank you so much for your time. :)
The timeout increase bought me some time, but it still ended up with the
Exception.  I love the multiple sinks idea...I should have thought of that :)

Chris


On Mon, Feb 4, 2013 at 8:22 PM, Juhani Connolly <
juhani_connolly@cyberagent.co.jp> wrote:

>  Hey
>
>
> On 02/02/2013 01:40 AM, Chris Neal wrote:
>
> Thanks for the help Juhani :)  I'll take a look with Ganglia and see what
> things look like.
>
>  Any thoughts on keeping the ExecSource.batchSize,
> MemoryChannel.transactionCapacity, AvroSink.batch-size, and
> HDFSSink.batchSize the same?
>
>   It's not really important, so long as the avro batch size is less than
> or equal to the channel transaction capacity. The HDFS sink's batch size is
> independent of them both.
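>
> As a rough sketch of that constraint (the agent, channel and sink names
> here are just placeholders, not taken from your config):
>
>   # the avro sink's batch must fit inside one channel transaction,
>   # so keep batch-size <= the channel's transactionCapacity
>   agent1.channels.mc1.transactionCapacity = 1000
>   agent1.sinks.avroSink1.batch-size = 1000
>
>   # the HDFS sink commits against its own (file) channel on the
>   # downstream agent, so its batch size can be tuned independently
>   collector1.sinks.hdfsSink1.hdfs.batchSize = 1000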
>
>
>   I looked at the MemoryChannel code, and noticed that there is a timeout
> parameter passed to doCommit(), where the exception is being thrown.  Just
> for fun, I increased it from the default to 10 seconds, and now things are
> running smoothly with the same config as before.  It's been running for
> about 24 hours now.  A step in the right direction anyway! :)
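>
> If that's the same timeout the memory channel exposes as its keep-alive
> setting, it can also be raised in the config rather than in the code. A
> minimal sketch, assuming an agent named agent1 and a channel named mc1:
>
>   agent1.channels.mc1.type = memory
>   agent1.channels.mc1.capacity = 1000000
>   agent1.channels.mc1.transactionCapacity = 1000
>   # keep-alive is the timeout, in seconds, for adding/removing events;
>   # it defaults to 3, raised to 10 here to match the change above
>   agent1.channels.mc1.keep-alive = 10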
>
>
> If that fixed it, it sounds like your data is just very bursty and
> sometimes gets fed in faster than it's drained out. The solution to that
> would be either to enlarge your temporary buffer (the mem channel), to
> throttle the incoming data (probably not possible), or to increase drain
> speed (more sinks running in parallel).
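>
> For the parallel-sink option, a minimal sketch (agent, channel and host
> names are placeholders); each sink runs in its own thread, and both drain
> the same memory channel:
>
>   agent1.sinks = avroSink1 avroSink2
>
>   agent1.sinks.avroSink1.type = avro
>   agent1.sinks.avroSink1.channel = mc1
>   agent1.sinks.avroSink1.hostname = collector01.example.com
>   agent1.sinks.avroSink1.port = 4545
>   agent1.sinks.avroSink1.batch-size = 1000
>
>   agent1.sinks.avroSink2.type = avro
>   agent1.sinks.avroSink2.channel = mc1
>   agent1.sinks.avroSink2.hostname = collector02.example.com
>   agent1.sinks.avroSink2.port = 4545
>   agent1.sinks.avroSink2.batch-size = 1000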
>
>
>  Thanks again.
> Chris
>
> On Thu, Jan 31, 2013 at 8:12 PM, Juhani Connolly <
> juhani_connolly@cyberagent.co.jp> wrote:
>
>>  Hi Chris,
>>
>> The most likely cause of that error is that the sinks are draining
>> requests slower than your sources are feeding fresh data. Over time it will
>> fill up the capacity of your memory channel, which will then start refusing
>> additional put requests.
>>
>> You can confirm this by connecting with jmx or ganglia.
>>
>> If the write is extremely bursty, it's possible that it's just
>> temporarily going over the sink consumption rate, and increasing the
>> channel capacity could work. Otherwise, increasing the avro batch size, or
>> adding additional avro sinks (more threads), may also help. I think that
>> setting up ganglia monitoring and looking at the incoming and outgoing
>> event counts and channel fill states helps a lot in diagnosing these
>> bottlenecks; you should look into doing that.
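>>
>> If it helps, ganglia reporting is just a couple of system properties on
>> the agent's command line. A rough sketch, with the gmond host and port as
>> placeholders (plain JMX also works, via the usual
>> com.sun.management.jmxremote flags):
>>
>>   bin/flume-ng agent -n agent1 -c conf -f conf/flume.conf \
>>     -Dflume.monitoring.type=ganglia \
>>     -Dflume.monitoring.hosts=gmond01.example.com:8649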
>>
>>
>> On 02/01/2013 02:01 AM, Chris Neal wrote:
>>
>> Hi all.
>>
>>  I need some thoughts on sizing/tuning of the above (common) route in
>> FlumeNG to maximize throughput.  Here is my setup:
>>
>>  *Source JVM (ExecSource/MemoryChannel/AvroSink):*
>> -Xmx4g
>> -Xms4g
>> -XX:MaxDirectMemorySize=256m
>>
>>  Number of ExecSources in config:  124 (yes, it's a ton.  Can't do
>> anything about it :)  The write rate to the source files is fairly fast and
>> bursty.
>>
>>  ExecSource.batchSize = 1000
>> (so each of the 124 tail -F instances dumps to the memory channel once it
>> has accumulated 1000 events)
>>
>>  MemoryChannel.capacity = 1000000
>> MemoryChannel.transactionCapacity = 1000
>> (somewhat unclear on what this is.  Docs say "The number of events stored
>> in the channel per transaction", but what is a "transaction" to a
>> MemoryChannel?)
>>
>>  AvroSink.batchSize = 1000
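>>
>>  (For reference, this maps onto a flume.conf roughly like the sketch
>> below. The agent name, log path and collector host are made up, and only
>> one of the 124 exec sources is shown.)
>>
>>   agent1.sources = tail01
>>   agent1.channels = mc1
>>   agent1.sinks = avroSink1
>>
>>   agent1.sources.tail01.type = exec
>>   agent1.sources.tail01.command = tail -F /var/log/app/app01.log
>>   agent1.sources.tail01.batchSize = 1000
>>   agent1.sources.tail01.channels = mc1
>>
>>   agent1.channels.mc1.type = memory
>>   agent1.channels.mc1.capacity = 1000000
>>   agent1.channels.mc1.transactionCapacity = 1000
>>
>>   agent1.sinks.avroSink1.type = avro
>>   agent1.sinks.avroSink1.channel = mc1
>>   agent1.sinks.avroSink1.hostname = collector01.example.com
>>   agent1.sinks.avroSink1.port = 4545
>>   agent1.sinks.avroSink1.batch-size = 1000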
>>
>>  *Destination JVM (AvroSource/FileChannel/HDFSSink)*
>> (Cluster of two JVMs on two servers, each configured the same as per
>> below)
>> -Xms2g
>> -Xmx2g
>> -XX:MaxDirectMemorySize is not defined, so whatever the default is
>>
>>  AvroSource.threads = 64
>> FileChannel.transactionCapacity = 1000
>> FileChannel.capacity = 32000000
>> HDFSSink.batchSize = 1000
>> HDFSSink.threadPoolSize = 64
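>>
>>  (Again as a rough sketch of the per-collector flume.conf; the agent
>> name, port and HDFS path are placeholders.)
>>
>>   collector1.sources = avroSrc1
>>   collector1.channels = fc1
>>   collector1.sinks = hdfsSink1
>>
>>   collector1.sources.avroSrc1.type = avro
>>   collector1.sources.avroSrc1.bind = 0.0.0.0
>>   collector1.sources.avroSrc1.port = 4545
>>   collector1.sources.avroSrc1.threads = 64
>>   collector1.sources.avroSrc1.channels = fc1
>>
>>   collector1.channels.fc1.type = file
>>   collector1.channels.fc1.capacity = 32000000
>>   collector1.channels.fc1.transactionCapacity = 1000
>>
>>   collector1.sinks.hdfsSink1.type = hdfs
>>   collector1.sinks.hdfsSink1.channel = fc1
>>   collector1.sinks.hdfsSink1.hdfs.path = hdfs://namenode.example.com:8020/flume/events
>>   collector1.sinks.hdfsSink1.hdfs.batchSize = 1000
>>   # note: the config property is spelled hdfs.threadsPoolSize
>>   collector1.sinks.hdfsSink1.hdfs.threadsPoolSize = 64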
>>
>>  With this configuration, in about 5 minutes, I get the common Exception:
>>
>>  "Space for commit to queue couldn't be acquired Sinks are likely not
>> keeping up with sources, or the buffer size is too tight"
>>
>>  on the Source JVM.  It is nowhere near the 4g max, rather only at
>> about 2.5g.
>>
>>  I'm wondering about the logic of having all the batch sizes/transaction
>> sizes at 1000.  My thought was that it would keep from fragmenting the
>> transfer of data, but maybe that's flawed?  Should the sizes be different?
>>
>>  I'm also curious about increasing MaxDirectMemorySize to something
>> larger than 256MB.  I tried removing it altogether in my Source JVM (which
>> makes the size unbounded), but that didn't seem to make a difference.
>>
>>  I'm having some trouble figuring out where the backup is happening, and
>> how to open up the gates. :)
>>
>>  Thanks in advance for any suggestions.
>>  Chris
>>
>>
>>
>
>
