flume-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: Scale of a flume collector
Date Wed, 15 Feb 2012 17:44:15 GMT
Hi Kim,

Yes, that is roughly what a single collector can handle, given modern hardware and a well-switched network.
The throughput to HDFS depends on other variables in your Hadoop cluster.

Flume can compress the data before it is written; that is what I had in mind.
Compression costs time and CPU, but if you don't compress inside the collector, 10
agents per collector should be okay. You should also add some spare collectors to guard against
a server crash; in the past I used autoCollectorSource for this.
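
The sizing arithmetic in this thread can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names and the default of 2 spare collectors are assumptions made for the example, not Flume settings, and the 180 MB/s figure is the per-collector estimate quoted below, not a measured value.

```python
import math

def collectors_needed(agents, agents_per_collector=10, spares=2):
    """Return (active, total) collector counts.

    agents_per_collector=10 follows the rule of thumb in this thread;
    spares=2 is a hypothetical safety margin, not a Flume default.
    """
    active = math.ceil(agents / agents_per_collector)
    return active, active + spares

def volume_per_roll_gb(rate_mb_s=180, roll_seconds=60):
    """Data one collector would write per file roll, in GB (1 GB = 1000 MB)."""
    return rate_mb_s * roll_seconds / 1000

if __name__ == "__main__":
    active, total = collectors_needed(250)   # 250 agents from the original question
    print(f"active collectors: {active}, with spares: {total}")
    print(f"per 60 s roll: {volume_per_roll_gb(180, 60)} GB")
    print(f"per 10 s roll: {volume_per_roll_gb(180, 10)} GB")
```

With 250 agents and 10 agents per collector this gives 25 active collectors before spares; the real number still depends on the sink, compression, and the cluster behind it.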


Alexander Lorenz

On Feb 15, 2012, at 5:52 PM, Kim, Jongkook wrote:

> Thanks Alex,
> When you say "180mb/s", is that the data handling capacity of one collector?
> The data sizes I listed in the email are uncompressed, and we are using DFO.

> If the data is compressed, do we still need 1 collector for 10 agents?
> Thanks in advance,
> -----Original Message-----
> From: alo alt [mailto:wget.null@googlemail.com] 
> Sent: Wednesday, February 15, 2012 3:06 AM
> To: flume-user@incubator.apache.org
> Subject: Re: Scale of a flume collector
> Hi,
> That depends on the sink you want to use. Let's say you use E2E chains, the collectors
> are on current hardware, and you use compression: I would put 10 agents per collector (180 MB/s
> * 60 s (for minute-based file closing) = 10.8 GB/min). To get closer to real time I would suggest
> a 10-second roll, but more than 10 agents per collector could create a bottleneck at peak times.
> The collectors need fast hard disks. 
> best,
> Alex 
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
> On Feb 14, 2012, at 8:25 PM, Kim, Jongkook wrote:
>> Hi all.
>> I'm in the middle of hardware provisioning for a Flume-HBase-Hadoop solution.
>> The plan is that Flume agents collect and pass log data to collectors, and the collectors
>> write the data into HBase using a sink.
>> My question is about the scale of the Flume collectors.
>> Flume agents: 250
>> Data receiving rate: 5.78 MB/second
>> Data writing rate: 17.9 MB/second
>> Number of data nodes: 12
>> This system will serve a real-time use case, so there shouldn't be any delay.
>> How many collectors are required to handle this load?
>> Thanks in advance,
