flume-user mailing list archives

From Mike <mikethe...@gmail.com>
Subject Re: Flume Master Issues
Date Fri, 26 Aug 2011 15:17:28 GMT
I recall running into a similar problem.

It turned out to be another pid-style file that had been dropped somewhere else.

Once all the flume procs are dead, see whether any of those files are still around.
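
In case it helps with the hunt, here is a rough Python sketch of that check: it scans a few directories for pid-style files and reports the ones whose recorded pid is no longer running. The directory list and the "*.pid" pattern are guesses on my part, not locations Flume is known to use, so point it at wherever your install actually writes these files. It also assumes a POSIX system and that each file holds nothing but a pid.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_pid_files(directories, pattern="*.pid"):
    """Yield (path, pid) for pid-style files whose recorded pid is not running."""
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for path in sorted(root.glob(pattern)):
            try:
                pid = int(path.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable, or not a plain pid file
            if not pid_is_alive(pid):
                yield path, pid


if __name__ == "__main__":
    # Hypothetical search locations; replace with your own pid directories.
    candidates = ["/tmp", "/var/run/flume"]
    for path, pid in find_stale_pid_files(candidates):
        print(f"stale: {path} (pid {pid} not running)")
```

Run it only after the flume processes have been stopped. Anything it lists is a candidate for the file behind the "already running" complaint, but look at each file before deleting it.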


On Fri, Aug 26, 2011 at 11:03 AM, Matthew Rathbone
<matthew@foursquare.com> wrote:
> Hey all,
> We're having totally unpredictable issues with the flume master installation
> lately, here's what happened to us last night / today:
> 1. Yesterday we added 8 new nodes to flume. They got set up fine, and the
>    configs were registered.
> 2. A few hours later the master totally stops responding to anything
>    (web/shell/nodes). I don't find out until this morning.
> 3. I try to stop it using the init script. That doesn't do anything, and it
>    continues to run, but is unresponsive.
> 4. I kill -9 the flume processes and remove the pid file, figuring I can just
>    start it again.
> 5. Now the master won't start: "master already running on
>    pid=<non-existent-pid>"
> 6. When I finally get it to start (changing the pid directory), it starts being
>    unresponsive again.
> 7. Restart it, it does the same.
> 8. Stop all flume-nodes, restart it, looks good. Start the flume nodes, it goes
>    unresponsive again.
> 9. Restart it, and this time it works.
> The only log above an INFO statement that I can see is this:
> 2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: Unable to
> load output format plugin class  - Class not found
> but I don't think that's causing the issues.
> I do have a flume-node running on the same machine, could there be some sort
> of race condition happening?
> Has anyone else seen behavior like this?
> Any idea how to fix it?
> Hoping someone can shed some light on this, I'm really not sure what's going
> on.
> Thanks all
> --
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma | 4sq
