flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Replay log taking too much time
Date Thu, 23 Jul 2015 07:39:27 GMT
You can set 'checkpointOnClose = true' if it is not already set (the default is true). This
setting was added in 1.6.
It makes Flume write a checkpoint when it shuts down the file channel, so replay on
restart/reconfigure should be much quicker.
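To make the property names concrete, here is a small Python sketch that renders a Flume file-channel configuration fragment with checkpointOnClose enabled. The agent name a1, channel name c1, and all directory paths are hypothetical placeholders. checkpointDir, useDualCheckpoints, backupCheckpointDir, dataDirs, checkpointInterval and checkpointOnClose are file-channel properties from the Flume user guide, but verify them against the guide for the Flume version you actually run.

```python
def file_channel_properties(agent: str, channel: str, checkpoint_dir: str,
                            backup_checkpoint_dir: str, data_dirs: list) -> str:
    """Render a Flume properties fragment for a file channel (illustrative sketch)."""
    prefix = f"{agent}.channels.{channel}"
    settings = {
        "type": "file",
        "checkpointDir": checkpoint_dir,
        # Keep a backup copy of the checkpoint in case one is damaged mid-write.
        "useDualCheckpoints": "true",
        "backupCheckpointDir": backup_checkpoint_dir,
        # Comma-separated list of directories holding the channel's log files.
        "dataDirs": ",".join(data_dirs),
        # Milliseconds between periodic checkpoints (30000 is Flume's default).
        "checkpointInterval": "30000",
        # Write a checkpoint when the channel is closed (Flume 1.6+, default true).
        "checkpointOnClose": "true",
    }
    # Dicts keep insertion order, so the lines come out in the order above.
    return "\n".join(f"{prefix}.{key} = {value}" for key, value in settings.items())


# Hypothetical agent/channel names and paths, for illustration only.
print(file_channel_properties(
    "a1", "c1",
    "/var/lib/flume/checkpoint",
    "/var/lib/flume/checkpoint-backup",
    ["/data1/flume/data", "/data2/flume/data"],
))
```

The printed lines can be pasted into the agent's properties file; only checkpointOnClose is the setting Roshan refers to, the rest is the surrounding file-channel setup it depends on.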


From: Shady Xu <shadyxu@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, July 23, 2015 12:35 AM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Replay log taking too much time

Yes, I'm using Flume 1.6 now and dual checkpoints are enabled too, but every time I restart the
agent, replaying the log takes less time than before but still dozens of minutes. This is not normal.

2015-06-25 23:15 GMT+08:00 Johny Rufus <jrufus@cloudera.com>:
If the checkpoint interval is 30 seconds (the default) and dual checkpoints are enabled
(in case the agent was interrupted while writing a checkpoint), then replay should only
cover the last 30 seconds (60 seconds in the worst case). I'm not sure whether that is what
is happening in your case, or whether a full replay is happening.
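Johny's estimate can be written down as a rough model. This is a simplified sketch of the reasoning in his message, not Flume's actual replay logic: it assumes the amount of log to replay is whatever was written since the checkpoint being restored, and the function name and parameters are hypothetical.

```python
from typing import Optional


def worst_case_replay_window_ms(checkpoint_interval_ms: int,
                                dual_checkpoints: bool,
                                interrupted_during_checkpoint: bool) -> Optional[int]:
    """Rough upper bound on how much time's worth of log is replayed after a restart.

    Returns None when no usable checkpoint is assumed to exist (a full replay).
    """
    if not interrupted_during_checkpoint:
        # The latest checkpoint is intact: only events logged since it are replayed.
        return checkpoint_interval_ms
    if dual_checkpoints:
        # The latest checkpoint is unusable; fall back to the backup, one interval older.
        return 2 * checkpoint_interval_ms
    # No usable checkpoint at all: the whole log has to be replayed.
    return None


print(worst_case_replay_window_ms(30_000, True, True))
```

Under this model, a restart that keeps replaying for dozens of minutes points to the full-replay branch rather than a 30 or 60 second window.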


On Wed, Jun 24, 2015 at 10:40 PM, Shady Xu <shadyxu@gmail.com> wrote:
I have tried 1.6. Replaying the log is faster, but not fast enough. We have gigabytes of logs,
and replaying them still takes hours, even days. This is frustrating, and it has been our
biggest concern about using Flume at a larger scale.

2015-06-01 15:32 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:
1.6 has been released. We were waiting for Maven Central to sync up. Now that it is on Central,
I will post the update on the site tomorrow.

On Sunday, May 31, 2015, Shady Xu <shadyxu@gmail.com> wrote:
I noticed that Flume 1.6 has been released on GitHub but not on the official website. I have
compiled some of the modules from source myself (for other reasons), but I'm not sure compiling
the whole project is a good idea.

We have tons of data; every time we change the configuration, replaying the log takes far
too many hours.

2015-04-17 12:38 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:
Changes that went into Flume 1.6 should improve replay time. Flume 1.6 will be out in a few


On Thu, Apr 16, 2015 at 7:55 PM, Shady Xu <shadyxu@gmail.com> wrote:
Every time I restart Flume NG, it replays the log, and the process usually takes hours.
During this time, Flume does not accept any data from the source.

So how can I make the replay faster?


