flume-user mailing list archives

From Matthew Rathbone <matt...@foursquare.com>
Subject Flume Master Issues
Date Fri, 26 Aug 2011 15:03:59 GMT
Hey all,

We've been having completely unpredictable issues with our Flume master installation lately. Here's
what happened to us last night and today:

Yesterday we added 8 new nodes to Flume. They were set up fine, and their configs were registered.
A few hours later the master stopped responding to anything at all (web UI, shell, and nodes), and I didn't
find out until this morning.

I tried to stop it using the init script. That didn't do anything; it kept running but stayed
unresponsive.
I killed the Flume processes with kill -9 and removed the pid file, figuring I could just start it again.

Now the master wouldn't start: "master already running on pid=<non-existent-pid>".
When I finally got it to start (by changing the pid directory), it became unresponsive.
I restarted it, and it did the same thing.
I stopped all the flume-nodes and restarted it. It looked good, but when I started the flume-nodes again, it went unresponsive.
I restarted it once more, and this time it worked.

The only log message above INFO level that I can see is this one:
2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: Unable to load output format plugin class - Class not found

But I don't think that's what's causing the issues.

I do have a flume-node running on the same machine as the master. Could there be some sort of race condition?
Has anyone else seen behavior like this?
Any idea how to fix it?

Hoping someone can shed some light on this, I'm really not sure what's going on.

Thanks, all

Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
