flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Best way to increase throughput of Exec->Memory->Avro agent.
Date Tue, 12 Mar 2013 21:12:26 GMT
I meant 640,000, not 64,000.
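The corrected figure is the sink count times the sink-side batch size from the quoted thread (64 sinks x 10,000 events). Below is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the sketch assumes, as the thread does, that every sink can hold a full batch in an open take transaction at the same time.

```python
def reserved_by_sink_transactions(num_sinks: int, sink_batch_size: int) -> int:
    """Worst case: every sink holds a full, uncommitted batch at once
    (assumption taken from the thread's reasoning)."""
    return num_sinks * sink_batch_size


def channel_headroom(capacity: int, num_sinks: int, sink_batch_size: int) -> int:
    """Slots left for source puts when all sinks hold a full batch.
    A negative result means the sinks alone could exhaust the channel."""
    return capacity - reserved_by_sink_transactions(num_sinks, sink_batch_size)


worst_case = reserved_by_sink_transactions(64, 10_000)
print(f"{worst_case:,}")  # 640,000
```

The actual channel capacity is not stated in the thread. With a hypothetical capacity of 500,000, `channel_headroom(500_000, 64, 10_000)` returns -140,000, i.e. the sinks could reserve more slots than the channel has.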

On Tue, Mar 12, 2013 at 2:10 PM, Roshan Naik <roshan@hortonworks.com> wrote:
> Beyond a certain number of sinks, adding more won't help. My suspicion is
> you may have gone way overboard.
> If your sink-side batch size is that large and you have 64 sinks in
> the round-robin, it will take a lot of events (64,000) to be pumped
> in by the source before the first event can start trickling out
> of any sink. Also, memory consumption will be quite high: each sink
> will open a transaction and hold on to 10,000 events. This is the cause
> of the memory channel filling up. Until the sink-side transaction is
> committed (i.e. 10k events are pulled), the memory reservation on the
> channel is not relinquished. So your memory channel size will have to
> be really high to support so many sinks, each with such a big batch size.
> My gut feeling is that your source-side batch size is not much of an
> issue and can be smaller. Increasing the number of sinks will only
> help if the sink is indeed the bottleneck.
> On Tue, Mar 12, 2013 at 1:43 PM, Chris Neal <cwneal@gmail.com> wrote:
>> Hi all.
>> I've been working on this for quite some time, and need some advice from the
>> experts.  I have a two tiered Flume architecture:
>> App Tier (all on one server):
>>  124 ExecSources -> MemoryChannel -> AvroSinks
>> HDFS Tier (on two servers):
>>   AvroSource -> FileChannel -> HDFSSinks
>> When I run the agents, the HDFS tier is keeping up fine with the App Tier.
>> Queue sizes stay between 0 and 10000 (I have a batch size of 10000). All is
>> good.
>> On the App Tier, when I view the JMX data through jconsole, I watch the size
>> of the MemoryChannel grow steadily until it reaches the max, then it starts
>> throwing exceptions about not being able to put the batch on the channel as
>> expected.
>> There seem to be two basic ways to increase the throughput of the App Tier:
>> 1.  Increase the MemoryChannel's transactionCapacity and the corresponding
>> AvroSink's batch-size.  Both are set to 10000 for me.
>> 2.  Increase the number of AvroSinks to drain the MemoryChannel.  I'm up to
>> 64 Sinks now which round-robin between the two Flume Agents on the HDFS
>> tier.
>> Both of those values seem quite high to me (batch size and number of sinks).
>> Am I missing something as far as tuning?
>> Which would give a greater increase in throughput, more Sinks or a larger
>> batch size?
>> I'm stumped here.  I still think I can get this to work. :)
>> Any suggestions are most welcome.
>> Thanks for your time.
>> Chris
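The reservation behaviour described in this thread (a channel slot stays reserved until the sink's take transaction commits) can be illustrated with a toy model. This is a hypothetical Python class, not Flume's Java `MemoryChannel` API, and it simplifies capacity to a plain event count.

```python
from collections import deque


class ToyMemoryChannel:
    """Hypothetical toy model (not Flume's API): a bounded channel whose
    slots are freed only when a take transaction commits."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()   # events waiting to be taken
        self.in_flight = 0     # events taken by sinks but not yet committed

    def used(self) -> int:
        # Taken-but-uncommitted events still count against capacity.
        return len(self.queue) + self.in_flight

    def put_batch(self, events: list) -> bool:
        # Reject the whole batch if it does not fit, like a failed put.
        if self.used() + len(events) > self.capacity:
            return False
        self.queue.extend(events)
        return True

    def take_batch(self, batch_size: int) -> list:
        # A sink pulls up to batch_size events into an open transaction.
        batch = []
        while self.queue and len(batch) < batch_size:
            batch.append(self.queue.popleft())
        self.in_flight += len(batch)
        return batch

    def commit_take(self, batch: list) -> None:
        # Only a commit releases the slots held by this batch.
        self.in_flight -= len(batch)


ch = ToyMemoryChannel(capacity=100)
ch.put_batch(list(range(100)))
open_batches = [ch.take_batch(25) for _ in range(4)]  # 4 "sinks", none committed
print(ch.put_batch([0]))              # False: every slot is still reserved
ch.commit_take(open_batches[0])
print(ch.put_batch(list(range(25))))  # True: one commit freed 25 slots
```

In this model the queue is empty after the four takes, yet the source still cannot put anything, which mirrors the "channel full" symptom in the thread when many sinks each hold a large uncommitted batch.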
