flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Replay log taking to much time
Date Thu, 23 Jul 2015 19:28:24 GMT
BTW, some more things...

  *   When you shut down Flume (with a simple kill), can you check the creation timestamp on the
checkpoint file? Basically, see how old it is.
  *   Since you have dual checkpoints enabled, also check the creation timestamp on the backup
checkpoint. Compare the time between the latest and backup checkpoints to ensure that
checkpoints are happening on the right schedule.
  *   When the replay happens on startup, some replay metrics are printed in the log in terms
of the number of inserts/removals that were replayed. See if that corresponds to the number of
events you estimate to have flowed through the channel in the interval since the creation
timestamp on the latest checkpoint.
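
A minimal sketch of the first two checks, assuming you pass in the paths to the latest and backup checkpoint files yourself (the function names are hypothetical and nothing here is part of Flume). It uses the last-modified time rather than a true creation time, since creation time is not portably available; for a file that is rewritten on every checkpoint that is probably the more useful number anyway.

```python
import os
import time
from typing import Optional


def age_seconds(path: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def gap_seconds(latest_path: str, backup_path: str) -> float:
    """How many seconds older the backup checkpoint is than the latest one."""
    return os.path.getmtime(latest_path) - os.path.getmtime(backup_path)


def report(latest_path: str, backup_path: str, now: Optional[float] = None) -> str:
    """One-line summary of checkpoint age and the latest-to-backup gap."""
    return (
        f"latest checkpoint age: {age_seconds(latest_path, now):.0f}s, "
        f"gap to backup: {gap_seconds(latest_path, backup_path):.0f}s"
    )
```

Calling `report(...)` right after stopping the agent gives both numbers at once. A gap much larger than the configured checkpoint interval would suggest checkpoints are not being written on schedule.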

If I recall correctly, there were some code changes that went into 1.5 (or maybe 1.4) that
did seem to slow down FC replay on startup.
-roshan

From: Roshan Naik <roshan@hortonworks.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, July 23, 2015 1:29 AM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Replay log taking to much time

Don't use -9.

From: Shady Xu <shadyxu@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, July 23, 2015 1:23 AM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Replay log taking to much time

I didn't set this property, so it has its default value of true. Any other ideas?

BTW, if I use `kill -9` to kill the flume process, flume will not be able to create a checkpoint,
right?

2015-07-23 15:39 GMT+08:00 Roshan Naik <roshan@hortonworks.com>:
You can set checkpointOnClose = true if that's not already the case (the default is true). This
setting was added in 1.6.
It creates a checkpoint when Flume is shutting down the file channel; consequently,
replay on restart/reconfigure should be much quicker.
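
To confirm what a given agent is actually configured with, something like the following could be used. It is only a sketch: the function name and the agent and channel names in the usage note are hypothetical, and the key names `checkpointInterval` (in milliseconds) and `useDualCheckpoints` are my recollection of the Flume user guide, so verify them against your version. Only defaults mentioned in this thread are filled in; anything else unset comes back as None.

```python
from typing import Dict, Optional

# Defaults as described in this thread (true on close, 30-second interval).
# The millisecond unit for checkpointInterval is an assumption to verify.
THREAD_DEFAULTS = {"checkpointOnClose": "true", "checkpointInterval": "30000"}
KEYS = ("checkpointOnClose", "checkpointInterval", "useDualCheckpoints")


def file_channel_settings(text: str, agent: str, channel: str) -> Dict[str, Optional[str]]:
    """Extract checkpoint-related settings for one channel from properties text."""
    prefix = f"{agent}.channels.{channel}."
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            found[key[len(prefix):]] = value.strip()
    # Fall back to the thread's stated defaults, else None.
    return {k: found.get(k, THREAD_DEFAULTS.get(k)) for k in KEYS}
```

For example, `file_channel_settings(open("flume.conf").read(), "a1", "c1")` would return the three values for a channel named `c1` on an agent named `a1`.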

-roshan

From: Shady Xu <shadyxu@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, July 23, 2015 12:35 AM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Replay log taking to much time

Yes, I'm using Flume 1.6 now, with dualCheckpoints enabled as well. Every time I restart the
agent, the replay takes less time than before, but it still takes dozens of minutes. This is
not normal, right?

2015-06-25 23:15 GMT+08:00 Johny Rufus <jrufus@cloudera.com>:
If the checkpointing interval is 30 seconds (the default) and dualCheckpoints are enabled
(to cover the case where the agent was interrupted while writing a checkpoint), then replay
should only cover the last 30 secs (worst case 60 secs). Not sure if this is happening in your
case, or if a full replay is happening.
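
As a rough worked version of that estimate, assuming you know your approximate event rate: the expected number of replayed events is the rate times the replay window, where the window is one checkpoint interval, or two if the backup checkpoint had to be used. The function names and the `slack` factor below are hypothetical, just a way to compare the estimate with the replay counts printed in the log.

```python
def replay_window_seconds(checkpoint_interval_s: float = 30.0,
                          used_backup: bool = False) -> float:
    """Worst-case span of time whose events must be replayed."""
    return checkpoint_interval_s * (2 if used_backup else 1)


def expected_replay_events(events_per_second: float,
                           checkpoint_interval_s: float = 30.0,
                           used_backup: bool = False) -> float:
    """Rough upper bound on events replayed from the last good checkpoint."""
    return events_per_second * replay_window_seconds(checkpoint_interval_s, used_backup)


def looks_like_full_replay(observed_events: float,
                           events_per_second: float,
                           checkpoint_interval_s: float = 30.0,
                           slack: float = 10.0) -> bool:
    """True if the observed replay count far exceeds the worst-case estimate.

    `slack` is an arbitrary safety factor, not a Flume concept.
    """
    worst_case = expected_replay_events(events_per_second, checkpoint_interval_s,
                                        used_backup=True)
    return observed_events > slack * worst_case
```

At 1,000 events per second with the 30-second interval, the estimate is 30,000 events, or 60,000 if the backup checkpoint was used. A replay count in the tens of millions would point to a full replay rather than a checkpoint-based one.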

Thanks,
Rufus

On Wed, Jun 24, 2015 at 10:40 PM, Shady Xu <shadyxu@gmail.com> wrote:
I have tried 1.6. Replaying the log is faster, but not fast enough. We have gigabytes of logs,
and replaying them still takes us hours, even days. This is frustrating, and it has been the
biggest concern for us in using Flume at a larger scale.

2015-06-01 15:32 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:
1.6 has been released. We were waiting for Maven Central to sync up. Now that it is on Central,
I will post the update on the site tomorrow.


On Sunday, May 31, 2015, Shady Xu <shadyxu@gmail.com> wrote:
I noticed that Flume 1.6 has been released on GitHub but not on the official website. I have
compiled some of the modules from source myself (for other reasons), but I'm not sure compiling
the whole project is a good idea.

We have tons of data, and every time we change the configuration, replaying the log takes us
way too many hours...

2015-04-17 12:38 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:
Changes that went into Flume 1.6 should improve replay time. Flume 1.6 will be out in a few
days.


Thanks,
Hari

On Thu, Apr 16, 2015 at 7:55 PM, Shady Xu <shadyxu@gmail.com> wrote:
Every time I restart Flume NG, it will try to replay the log and the process usually takes
hours. During this time, Flume does not take any data from the source.

So how can I make the replay faster?









