flume-user mailing list archives

From "Kim, Jongkook " <jongkook....@citi.com>
Subject RE: Scale of a flume collector
Date Wed, 15 Feb 2012 16:52:27 GMT
Thanks Alex,

When you say "180mb/s", is that the data-handling capacity of one collector?
The size of the data I listed in the email is the uncompressed size, and we are using DFO.
If the data is compressed, do we still need one collector per 10 agents?

Thanks in advance,

-----Original Message-----
From: alo alt [mailto:wget.null@googlemail.com] 
Sent: Wednesday, February 15, 2012 3:06 AM
To: flume-user@incubator.apache.org
Subject: Re: Scale of a flume collector


That depends on the sink you want to use. Let's say you use E2E chains, the collectors run
on physical hardware, and you use compression: I would put 10 agents per collector (180mb/s *
60 s, for a minute-based file roll, = 10.8 GB/min). To get closer to real time I would suggest
a 10-second roll, but more than 10 agents could create a bottleneck at peak times.
The collectors need fast hard disks.
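As a rough sanity check, the sizing arithmetic above can be sketched like this (the 180mb/s per-collector figure and the minute-based roll interval are taken from the reply; they are assumptions, not measurements):

```python
# Sketch of the per-collector sizing math from the reply above.
collector_throughput_mb_s = 180   # assumed per-collector capacity (from the reply)
roll_interval_s = 60              # minute-based file roll

# Data accumulated per roll, in GB (1000 MB = 1 GB for this back-of-envelope).
data_per_roll_gb = collector_throughput_mb_s * roll_interval_s / 1000.0
print(data_per_roll_gb)  # -> 10.8
```

A shorter roll (e.g. 10 s) shrinks each file and reduces end-to-end latency, at the cost of more frequent file closes on the collector's disks.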


Alexander Lorenz

On Feb 14, 2012, at 8:25 PM, Kim, Jongkook wrote:

> Hi all.
> I'm in the middle of hardware provisioning for flume-hbase-hadoop solution.
> The plan is that flume agents collect and pass log data to collectors, and the collectors
> write data into hbase using a sink.
> The question is a flume collector's scale.
> Flume agents:250
> Data receiving ratio: 5.78MB/second
> Data writing ratio: 17.9MB/second
> Number of data nodes: 12
> This system will be used for a real-time use case, so there should be no delay.
> How many collectors are required to handle this load?
> Thanks in advance,
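Combining the numbers in the quoted question with the 10-agents-per-collector rule of thumb from the reply (which is an assumption about collector capacity, not something measured on this cluster) gives a quick estimate:

```python
import math

num_agents = 250
agents_per_collector = 10  # rule of thumb from the reply; an assumption here

# Round up: a partial group of agents still needs its own collector.
collectors_needed = math.ceil(num_agents / agents_per_collector)
print(collectors_needed)  # -> 25
```

The 5.78 MB/s receive and 17.9 MB/s write rates are well under the stated 180mb/s per-collector ceiling, so with compression the agent fan-in, not raw throughput, is the limiting factor in this estimate.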
