flume-user mailing list archives

From Bojan Kostić <blood9ra...@gmail.com>
Subject Re: Can Flume handle 100k+ events per second?
Date Wed, 06 Nov 2013 00:41:46 GMT
Hello Roshan,

Thanks for the response.
But now I am confused. If I have 120 files, do I need to configure 120
sinks/sources/channels separately? Or have I missed something in the docs?
Maybe I should use a fan-out flow? But then again I would have to set 120
parameters.
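
Just to check my understanding: do you mean one dedicated channel + sink
pair per file, something like this? (A sketch for just 2 of the 120
files; all names below are made up, and the source/selector wiring is
omitted.)

agent.channels = ch-file1 ch-file2
agent.sinks = sink-file1 sink-file2

agent.channels.ch-file1.type = memory
agent.channels.ch-file2.type = memory

# one HDFS sink per output file, so writers never clobber each other
agent.sinks.sink-file1.type = hdfs
agent.sinks.sink-file1.channel = ch-file1
agent.sinks.sink-file1.hdfs.path = /flume/logs/file1

agent.sinks.sink-file2.type = hdfs
agent.sinks.sink-file2.channel = ch-file2
agent.sinks.sink-file2.hdfs.path = /flume/logs/file2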

Best regards.
On Nov 5, 2013 8:47 PM, "Roshan Naik" <roshan@hortonworks.com> wrote:

> Yes, to avoid them clobbering each other's writes.
> On Tue, Nov 5, 2013 at 4:34 AM, Bojan Kostić <blood9raven@gmail.com> wrote:
>> Sorry for the late response, I lost this email somehow.
>> Thanks for the read; it is a nice start even though it is old.
>> And the numbers are really promising.
>> I'm testing the memory channel; there are about 20 data sources (log
>> servers) with 60 different files each.
>> My RPC client app is basic, like in the examples, but it has load
>> balancing across two Flume agents which write the data to HDFS.
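>> For reference, the client side is roughly this (trimmed down; host
>> names and ports below are placeholders):
>>
>> import java.nio.charset.Charset;
>> import java.util.Properties;
>> import org.apache.flume.EventDeliveryException;
>> import org.apache.flume.api.RpcClient;
>> import org.apache.flume.api.RpcClientFactory;
>> import org.apache.flume.event.EventBuilder;
>>
>> public class LogSender {
>>   public static void main(String[] args) {
>>     Properties props = new Properties();
>>     props.put("client.type", "default_loadbalance");
>>     props.put("hosts", "h1 h2");
>>     props.put("hosts.h1", "agent1:41414");  // placeholder host:port
>>     props.put("hosts.h2", "agent2:41414");
>>     RpcClient client = RpcClientFactory.getInstance(props);
>>     try {
>>       // one event per append(); in reality read from the dummy log file
>>       client.append(EventBuilder.withBody("log line",
>>           Charset.forName("UTF-8")));
>>     } catch (EventDeliveryException e) {
>>       // delivery failed on both agents
>>     } finally {
>>       client.close();
>>     }
>>   }
>> }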
>> I think I read somewhere that you should have one sink per file. Is that
>> true?
>> Best regards, and sorry again for the late response.
>>  On Oct 22, 2013 8:50 AM, "Juhani Connolly" <
>> juhani_connolly@cyberagent.co.jp> wrote:
>>> Hi Bojan,
>>> This is pretty old, but Mike did some performance testing about a
>>> year and a half ago:
>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Syslog+Performance+Test+2012-04-30
>>> He was getting a max of 70k events/sec on a single machine.
>>> The thing is, this is the result of a huge number of variables:
>>> - Parallelization of flows allows better parallel processing.
>>> - Use of the memory channel as opposed to a slower persistent channel.
>>> - Possibly the source; I have no idea how you wrote your app.
>>> - Batching of events is important (see the sketch after this list).
>>> Also, are all events written to one file, or are they split over many?
>>> Every file is processed separately.
>>> - Network congestion and your HDFS setup.
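>>> To illustrate the batching point (a sketch only: it assumes an
>>> RpcClient named client and an Iterable<String> of log lines, and the
>>> batch size of 100 is arbitrary, so tune it for your setup):
>>>
>>> List<Event> batch = new ArrayList<Event>(100);
>>> for (String line : lines) {
>>>   batch.add(EventBuilder.withBody(line, Charset.forName("UTF-8")));
>>>   if (batch.size() == 100) {
>>>     client.appendBatch(batch);  // one RPC round trip per 100 events
>>>     batch.clear();
>>>   }
>>> }
>>> if (!batch.isEmpty()) {
>>>   client.appendBatch(batch);    // flush the remainder
>>> }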
>>> Reaching 100k events per second is definitely possible. The resources
>>> you need for it will vary significantly depending on your setup. The
>>> more HA-type features you use, the slower delivery is likely to become.
>>> On the flip side, allowing fairly lax conditions with a small potential
>>> for data loss (on a crash, for example, memory channel contents are
>>> gone) will allow close to 100k even on a single machine.
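>>> For example, the tradeoff shows up in nothing more than the channel
>>> definition (illustrative values, pick one of the two):
>>>
>>> # fast, but contents are gone if the agent crashes
>>> agent.channels.ch1.type = memory
>>> agent.channels.ch1.capacity = 100000
>>>
>>> # durable on disk, but delivery is noticeably slower
>>> agent.channels.ch1.type = file
>>> agent.channels.ch1.checkpointDir = /flume/checkpoint
>>> agent.channels.ch1.dataDirs = /flume/data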
>>> On 10/14/2013 09:00 PM, Bojan Kostić wrote:
>>>> Hi, this is my first post here, but I have been playing with Flume
>>>> for some time now.
>>>> My question is: how well does Flume scale?
>>>> Can Flume ingest 100k+ events per second? Has anyone tried something
>>>> like this?
>>>> I created a simple test, and the results are really slow.
>>>> I wrote a simple app with an RPC client with failover, using the
>>>> Flume SDK, which reads a dummy log file.
>>>> In the end I have two Flume agents which write to HDFS with
>>>> rollInterval = 60, and in HDFS I get files of ~12 MB.
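>>>> The relevant sink settings are roughly this (the path is a
>>>> placeholder, and disabling size/count based rolling is my
>>>> assumption, since the files roll on time):
>>>>
>>>> agent.sinks.hdfs1.type = hdfs
>>>> agent.sinks.hdfs1.hdfs.path = /flume/test
>>>> agent.sinks.hdfs1.hdfs.rollInterval = 60
>>>> agent.sinks.hdfs1.hdfs.rollSize = 0   # 0 disables size-based rolling
>>>> agent.sinks.hdfs1.hdfs.rollCount = 0  # 0 disables count-based rolling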
>>>> Do I need to use some complex topology with 3 tiers?
>>>> How many Flume agents should write to HDFS?
>>>> Best regards.
