flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: File channel performance on a single disk is poor
Date Mon, 09 Jul 2012 06:14:28 GMT
Hi, thanks for your input.

On 07/09/2012 02:42 PM, Arvind Prabhakar wrote:
> Hi,
> > It's certainly one possible solution to the issue, though I do
> > believe that the current one could be made more friendly
> > towards single disk access (e.g. batching writes to the disk
> > may well be doable, and I'd be curious what someone
> > with more familiarity with the implementation thinks).
> The implementation of the file channel is that of a write-ahead log,
> in that it serializes all the actions as they happen. Using these
> actions, it can reconstruct the state of the channel at any time. There
> are two mutually exclusive transaction types it supports - a
> transaction consisting of puts, and one consisting of takes. It may be
> possible to use the heap to batch the puts and takes and serialize
> them to disk when the commit occurs.
> This approach will minimize the number of disk operations and will
> have an impact on the performance characteristics of the channel.
> Although it probably will improve performance, it is hard to tell for
> sure unless we test it out under load in different scenarios.
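
To illustrate the current per-operation model, roughly (the class and
method names below are invented for the example, not the actual
FileChannel internals):

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Rough sketch of a per-operation write-ahead log: every put/take is
    // serialized as it happens, so a transaction of N events costs N+1
    // small writes (and, on a single disk, that many extra seeks).
    enum OpType { PUT, TAKE, COMMIT, ROLLBACK }

    class WalSketch {
        private final DataOutputStream log;

        WalSketch(DataOutputStream log) { this.log = log; }

        void appendRecord(long txId, OpType type, byte[] body) throws IOException {
            log.writeLong(txId);             // which transaction this action belongs to
            log.writeByte(type.ordinal());   // PUT/TAKE/COMMIT/ROLLBACK
            log.writeInt(body.length);
            log.write(body);                 // event payload (empty for COMMIT/ROLLBACK)
        }
    }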

This does sound a lot better to me. I'm not sure there is much demand
for restoring the state of an uncommitted set of puts/takes to a file
channel after restarting an agent? If the transaction wasn't completed,
its current state is not really going to be important after a restart.
I'm not very familiar with WAL implementations, but is it not enough to
write the data to be committed before the commit marker/informing of
success? I don't think it is necessary to write each piece as it comes
in, so long as it is all on disk before success is reported.
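
Something like the following is the ordering I have in mind - a minimal
sketch, with the names and the on-disk format made up purely for
illustration:

    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical batched-commit path: puts stay on the heap until commit(),
    // and success is only reported once the data *and* the commit marker are
    // on disk. An uncommitted transaction simply disappears on restart.
    class BatchedTxSketch {
        private static final int COMMIT_MARKER = -1;   // invented sentinel

        private final List<byte[]> pendingPuts = new ArrayList<>();
        private final FileOutputStream file;
        private final DataOutputStream log;

        BatchedTxSketch(FileOutputStream file) {
            this.file = file;
            this.log = new DataOutputStream(file);
        }

        void put(byte[] event) {
            pendingPuts.add(event);              // heap only, no disk I/O yet
        }

        void commit(long txId) throws IOException {
            for (byte[] event : pendingPuts) {   // one sequential burst at commit
                log.writeLong(txId);
                log.writeInt(event.length);
                log.write(event);
            }
            log.writeLong(txId);
            log.writeInt(COMMIT_MARKER);         // marker written last
            log.flush();
            file.getFD().sync();                 // durable before success is reported
            pendingPuts.clear();
        }
    }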

Another matter I'm curious about is whether we actually need separate
files for the data and checkpoints... Could we not add a magic header
before each type of entry to differentiate them, and thus guarantee
significantly more sequential access? What is killing performance on a
single disk right now is the constant seeking. The problem with this,
though, would be putting together a file format that still allows quick
seeking to the correct position, and rolling the files would be a lot
harder. I think this is a lot more difficult and might be more of a
long-term target.
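
As a very rough sketch of what I mean by tagging entries (the magic
values and record layout here are invented, and it doesn't address the
seeking/rolling problem above):

    import java.io.DataOutputStream;
    import java.io.IOException;

    // Invented single-file layout: each entry starts with a one-byte type tag
    // and a length prefix, so data and checkpoint records can interleave in one
    // append-only file and writes stay sequential.
    class TaggedLogSketch {
        static final byte DATA = 0x01;
        static final byte CHECKPOINT = 0x02;

        private final DataOutputStream out;

        TaggedLogSketch(DataOutputStream out) { this.out = out; }

        void writeData(byte[] event) throws IOException {
            out.writeByte(DATA);
            out.writeInt(event.length);    // length prefix lets a reader skip records
            out.write(event);
        }

        void writeCheckpoint(byte[] snapshot) throws IOException {
            out.writeByte(CHECKPOINT);
            out.writeInt(snapshot.length);
            out.write(snapshot);
        }
    }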


> Regards,
> Arvind Prabhakar
> On Wed, Jul 4, 2012 at 3:33 AM, Juhani Connolly
> <juhani_connolly@cyberagent.co.jp> wrote:
>     It looks good to me as it provides a nice balance between
>     reliability and throughput.
>     It's certainly one possible solution to the issue, though I do
>     believe that the current one could be made more friendly towards
>     single disk access (e.g. batching writes to the disk may well be
>     doable, and I'd be curious what someone with more familiarity
>     with the implementation thinks).
>     On 07/04/2012 06:36 PM, Jarek Jarcec Cecho wrote:
>         We had a connected discussion about this "SpillableChannel"
>         (working name) on FLUME-1045, and I believe the consensus is
>         that we will create something like that. In fact, I'm planning
>         to do it myself in the near future - I just need to prioritize
>         my todo list first.
>         Jarcec
>         On Wed, Jul 04, 2012 at 06:13:43PM +0900, Juhani Connolly wrote:
>             Yes... I was actually poking around for that issue as I
>             remembered seeing it before. I had also previously
>             suggested a compound channel that would have worked like
>             the buffer store in scribe, but the general opinion was
>             that it provided too many mixed configurations that
>             could make testing and verifying correctness difficult.
>             On 07/04/2012 04:33 PM, Jarek Jarcec Cecho wrote:
>                 Hi Juhani,
>                 a while ago I filed JIRA FLUME-1227 where I suggested
>                 creating some sort of SpillableChannel that would
>                 behave similarly to scribe. It would normally act as a
>                 memory channel and would start spilling data to disk
>                 in case it gets full (my primary goal here was to
>                 solve the issue when the remote goes down, for example
>                 in case of HDFS maintenance). Would it be helpful for
>                 your case?
>                 Jarcec
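
A rough illustration of the spill-on-overflow behaviour described
above (everything below is a placeholder sketch, not the actual
FLUME-1227 design):

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Toy model of a spillable channel: behave like a MemoryChannel while there
    // is room, and push the overflow to a disk-backed store when the in-memory
    // queue fills up (e.g. because the remote HDFS is down for maintenance).
    class SpillableQueueSketch {
        private final int memoryCapacity;
        private final Queue<byte[]> memory = new ArrayDeque<>();

        SpillableQueueSketch(int memoryCapacity) {
            this.memoryCapacity = memoryCapacity;
        }

        void put(byte[] event) {
            if (memory.size() < memoryCapacity) {
                memory.add(event);      // fast path: plain in-memory queue
            } else {
                spillToDisk(event);     // slow path: sink cannot keep up
            }
        }

        byte[] take() {
            byte[] e = memory.poll();   // drain memory first, then spilled data
            return (e != null) ? e : readFromDisk();
        }

        private void spillToDisk(byte[] event) { /* disk-backed store goes here */ }
        private byte[] readFromDisk() { return null; /* and here */ }
    }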
>                 On Wed, Jul 04, 2012 at 04:07:48PM +0900, Juhani
>                 Connolly wrote:
>                     Evaluating flume on some of our servers, the file
>                     channel seems very slow, likely because, like most
>                     typical web servers, ours have only a single
>                     raided disk available to write to. Quoted below is
>                     a suggestion from a previous issue where our poor
>                     throughput came up, and where it turns out that on
>                     multiple disks, file channel performance is great.
>                     On 06/27/2012 11:01 AM, Mike Percy wrote:
>                         We are able to push > 8000 events/sec (2KB per
>                         event) through a single file channel if you
>                         put the checkpoint on one disk and use 2 other
>                         disks for data dirs. Not sure what the limit
>                         is. This is using the latest trunk code.
>                         Another limitation may be that you need to add
>                         additional sinks to your channel to drain it
>                         faster. This is because sinks are single
>                         threaded and sources are multithreaded.
>                         Mike
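
For reference, a file channel laid out the way Mike describes might be
configured along these lines (the agent/channel names and paths are
only examples):

    agent.channels = fc
    agent.channels.fc.type = file
    # checkpoint on its own disk, data dirs spread over two more disks
    agent.channels.fc.checkpointDir = /disk1/flume/checkpoint
    agent.channels.fc.dataDirs = /disk2/flume/data,/disk3/flume/data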
>                     For the case where the disks happen to be
>                     available on the server,
>                     that's fantastic, but I suspect that most use
>                     cases are going to be
>                     similar to ours, where multiple disks are not
>                     available. Our use
>                     case isn't unusual as it's primarily aggregating
>                     logs from various
>                     services.
>                     We originally ran our log servers with an
>                     exec(tail)->file->avro setup where throughput was
>                     very bad (80MB in an hour). We then switched this
>                     to a memory channel, which was fine (the peak-time
>                     500MB worth of hourly logs went through).
>                     Afterwards we switched back to the file channel,
>                     but with 5 identical avro sinks. This did not
>                     improve throughput (still 80MB).
>                     RecoverableMemoryChannel showed very
>                     similar characteristics.
>                     I presume this is due to the writes going to two
>                     separate places, further compounded by also
>                     writing out and tailing the normal web logs:
>                     checking top and iostat, we could confirm we have
>                     significant iowait time, far more than we see
>                     during typical operation.
>                     As it is, we seem to be more or less guaranteeing
>                     no loss of logs with the file channel. Perhaps we
>                     could look into batching puts/takes for those that
>                     do not need 100% data retention but want more
>                     reliability than with the MemoryChannel, which can
>                     potentially lose its entire capacity on a restart?
>                     Another possibility is writing an implementation
>                     that writes primarily sequentially. I've been
>                     meaning to take a deeper look at the implementation
>                     itself to give a more informed commentary, but
>                     unfortunately I don't have the cycles right now;
>                     hopefully someone with a better understanding of
>                     the current implementation (along with its
>                     interaction with the OS file cache) can comment on
>                     this.
