flume-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: the rolling question for collector
Date Wed, 27 Jul 2011 04:31:43 GMT

The collector roll setting is just how often the collector closes and
opens a new file to write to. Collectors don't request data from the
agents; the data is pushed to the collector as it arrives and "when
it's ready." In the case of E2E reliability, events are "ready" when
they've been received from the agent source, taken by the agent sink,
written to the local disk's WAL, and then sent across the wire. As
you've guessed, other factors affect how fast data moves from the
agent to the collector: mostly the reliability mode, disk speed,
network, CPU, and any contention for those resources.

Flume doesn't guarantee that events make it from the agent to the
collector in a certain amount of time. In other words, Flume doesn't
guarantee that your test will succeed. What it does guarantee is that
all events make it if you're using E2E mode.
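
To illustrate why a rolled file can come out smaller than "write rate
times roll interval," here is a rough sketch in Python. It is a toy
model, not Flume code: the function name simulate_rolls, its parameters,
and the idea of a fixed per-minute delivery cap are assumptions made up
for the example. It only shows that a time-based roll closes whatever
has actually arrived by then.

```python
def simulate_rolls(produce_mb_per_min, deliver_mb_per_min,
                   roll_minutes, total_minutes):
    """Return (sizes of closed files in MB, MB still waiting on the agent).

    Hypothetical model: the agent buffers what the test script writes and
    forwards at most `deliver_mb_per_min` to the collector each minute.
    The collector closes its current file every `roll_minutes`.
    """
    backlog = 0.0   # MB produced but not yet delivered to the collector
    current = 0.0   # MB written to the file the collector has open
    files = []
    for minute in range(1, total_minutes + 1):
        backlog += produce_mb_per_min
        sent = min(backlog, deliver_mb_per_min)  # delivery is capped
        backlog -= sent
        current += sent
        if minute % roll_minutes == 0:
            # Time-based roll: close the file with whatever arrived so far.
            files.append(current)
            current = 0.0
    return files, backlog


if __name__ == "__main__":
    # Delivery keeps up with the script: files match rate * interval.
    print(simulate_rolls(1.0, 1.0, 10, 20))
    # Delivery lags behind the script: files are smaller, backlog grows.
    print(simulate_rolls(1.0, 0.75, 10, 20))
```

In this model, when delivery keeps up with a 1 MB/min writer, every
10-minute file holds 10 MB. When only 0.75 MB/min gets through, each
file holds 7.5 MB and the rest stays queued on the agent until it can
be sent, which matches the "nothing is lost, but nothing is timed"
behavior described above.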

Hope that helps.

On Tue, Jul 26, 2011 at 9:04 PM, Junxian Yan <junxian.yan@gmail.com> wrote:
> Copy this question to new group
> ---------- Forwarded message ----------
> From: Junxian Yan <junxian.yan@gmail.com>
> Date: Wed, Jul 27, 2011 at 12:47 AM
> Subject: the rolling question for collector
> To: Flume Users <flume-user@cloudera.org>
> Hi Guys
> Now I'm very confused about the rolling setting on the
> collector: flume.collector.roll.millis
> If I set it to 10 mins, I think that means the collector will ask for a log
> data update from the agent every 10 mins, and if I set up a testing script to
> continuously write logs at 1 MB/min, that means the collector will write a
> 10 MB file into the target location.
> But when I ran the test, I found the result was not as expected.
> The collector writes a log file every 10 mins, but the log file is smaller
> than 10 MB.
> So are there other factors that affect the log writing?
> I'm running Flume on a 100M LAN, so the network should not be a bottleneck,
> and all hard disks are SATA, so the system IO should not be a bottleneck either.
> R

Eric Sammer
twitter: esammer
data: www.cloudera.com
