flume-user mailing list archives

From ed <edor...@gmail.com>
Subject Re: Fastest way to get data into flume?
Date Fri, 28 Mar 2014 09:23:33 GMT

It looks like you had acks turned on in the config you posted for your
netcat source.  You might want to try turning them off:

agent1.sources.netcatSource.ack-every-event = false
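
For reference, a minimal netcat source block with acks disabled might look
like this (the agent and source names are assumed from your config line
above; the bind address, port, and channel name are placeholders):

agent1.sources.netcatSource.type = netcat
agent1.sources.netcatSource.bind = 0.0.0.0
agent1.sources.netcatSource.port = 44444
agent1.sources.netcatSource.ack-every-event = false
agent1.sources.netcatSource.channels = memChannel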

We've gotten up to around 1400 events per second on a single netcat source
feeding 2 HDFS sinks without any issues (using a memory channel).  This is
on a live network, so we've never tested above that, as that's the max
throughput of the events we're storing.
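
In case it helps, the shape of that setup is roughly the following (names,
ports, and the HDFS path are placeholders, not our real config):

agent1.sources = netcatSource
agent1.channels = memChannel
agent1.sinks = hdfsSink1 hdfsSink2

agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 10000

agent1.sources.netcatSource.channels = memChannel

agent1.sinks.hdfsSink1.type = hdfs
agent1.sinks.hdfsSink1.channel = memChannel
agent1.sinks.hdfsSink1.hdfs.path = /flume/events

agent1.sinks.hdfsSink2.type = hdfs
agent1.sinks.hdfsSink2.channel = memChannel
agent1.sinks.hdfsSink2.hdfs.path = /flume/events

Both sinks drain the same memory channel, so the two HDFS writers
effectively share the load.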



On Fri, Mar 28, 2014 at 5:58 AM, Asim Zafir <asim.zafir@gmail.com> wrote:

> How much data are you ingesting, on a per-minute or per-second basis?
> How many sources are we talking about here?
> What kind of channel are you using currently, and what is the memory
> /storage footprint on the source as well as the sink?
> Is it a uniform distribution of traffic? If not, what is the peak data
> throughput you expect from a given source?
> On Thu, Mar 27, 2014 at 11:07 AM, Andrew Ehrlich <andrew@aehrlich.com> wrote:
>> What about having more than one flume agent?
>> You could have two agents that read the small messages and sink to HDFS,
>> or two agents that read the messages, serialize them, and send them to a
>> third agent which sinks them into HDFS.
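>> Something like this, untested and with placeholder names and ports, is
>> the usual way to wire up the second option (an Avro sink on each
>> first-tier agent, an Avro source on the collector):
>>
>> # on each of the two first-tier agents
>> tier1.sinks.avroSink.type = avro
>> tier1.sinks.avroSink.hostname = collector-host
>> tier1.sinks.avroSink.port = 4545
>> tier1.sinks.avroSink.channel = memChannel
>>
>> # on the third (collector) agent, which then feeds an HDFS sink
>> collector.sources.avroSource.type = avro
>> collector.sources.avroSource.bind = 0.0.0.0
>> collector.sources.avroSource.port = 4545
>> collector.sources.avroSource.channels = memChannel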
>> On Thu, Mar 27, 2014 at 9:43 AM, Chris Schneider
>> <chris@christopher-schneider.com> wrote:
>>> I have a fair bit of data continually being created in the form of
>>> smallish messages (a few hundred bytes), which needs to enter Flume and
>>> eventually sink into HDFS.
>>> I need to be sure that the data lands in persistent storage and won't be
>>> lost, but otherwise throughput isn't important. It just needs to be fast
>>> enough to not back up.
>>> I'm running into a bottleneck in the initial ingestion of data.
>>> I've tried the netcat source and the Thrift source, but both have capped
>>> out at a thousand or so records per second.
>>> Batching up the Thrift API events into sets of 10 and using appendBatch
>>> is a pretty large speedup, but still not enough.
>>> Here's a gist of my ruby test script, and some example runs, and my
>>> config.
>>> https://gist.github.com/cschneid/9792305
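>>> The batching part of the script boils down to roughly this (a sketch,
>>> not the exact code; the generated class names assume Ruby bindings from
>>> Flume's flume.thrift, and as far as I can tell the Thrift source speaks
>>> the compact protocol over a framed transport):
>>>
>>> require 'thrift'
>>> require 'flume_types'               # thrift --gen rb flume.thrift
>>> require 'thrift_source_protocol'
>>>
>>> socket    = Thrift::Socket.new('localhost', 4141)
>>> transport = Thrift::FramedTransport.new(socket)
>>> client    = ThriftSourceProtocol::Client.new(
>>>               Thrift::CompactProtocol.new(transport))
>>>
>>> transport.open
>>> # one appendBatch RPC per 10 events instead of one append per event
>>> batch = (1..10).map { |i| ThriftFlumeEvent.new(headers: {}, body: "msg #{i}") }
>>> client.appendBatch(batch)
>>> transport.close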
>>> 1. Are there any obvious performance changes I can make to speed up
>>> ingestion?
>>> 2. How fast can Flume reasonably go? Should I switch my source to
>>> something else that's faster? If so, what?
>>> 3. Is there a better tool for this kind of task (rapid, safe ingestion of
>>> small messages)?
>>> Thanks!
>>> Chris
