I have two logical nodes on my servers, which I start using a script derived from flume-daemon.sh. After I upgraded to v094_cdh3u2, I started seeing that the second node does not begin forwarding its files to a collector. An excerpt from the init script:
export FLUME_HOME=/usr/local/flume-0.9.4-cdh3u2
export FLUME_LOG_DIR="/var/log/flume"
export FLUME_LOGFILE=flume-flume-node-$HOSTNAME.log
log=$FLUME_LOG_DIR/flume-flume-$HOSTNAME.out
for IN in `cat /etc/flume-node.conf`; do
    counter=0;
    arr=$(echo $IN | tr ";" "\n")
    for x in $arr
    do
        flume_args[$counter]=`echo $x`;
        counter=$(( counter + 1 ))
    done
    flume_host_name=`/bin/hostname`${flume_args[1]}
    nohup ${FLUME_HOME}/bin/flume node -n $flume_host_name > "$log" 2>&1 < /dev/null &
done
Both nodes' processes are running, both are ACTIVE in the Node Status
table, and both have the correct configuration in the Node Configuration
table. But files accumulate in the /logged directory for the
second node.
The problem goes away when I send a refresh {node_name} command to the master.
Configuration is:
Agent1: node_name1 useast_events syslogUdp(5140) {value("app","ngn") => autoE2EChain }
Collector: collector_name useast_events autoCollectorSource collectorSink(s3://events...)
Agent2: node_name2 useast_accesslogs syslogUdp(5140) {value("app","ngn") => autoE2EChain }
Collector: collector_name useast_accesslogs autoCollectorSource collectorSink(s3://accesslogs...)
After I submit the refresh command, the agent's sink is actually changed from {value("app","ngn") => autoE2EChain } to:
{ value( "app", "ngn" ) => { ackedWriteAhead => { stubbornAppend
=> { insistentOpen => < logicalSink(
"collector_2_1a_094_events" ) ? < logicalSink(
"collector_1_1c_094_events" ) ? logicalSink(
"collector_1_1b_094_events" ) > > } } } }
I can't figure out what is happening. One thing I notice is that I set the FLUME_LOGFILE environment variable only once (i.e., outside the for loop). Sometimes multiple nodes write to the same log file concurrently; other times I see a second log file, created with a date extension, that only the second node writes to.
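To make the log file concern concrete, here is a minimal sketch of how a per-node log name could be derived inside the loop instead of once outside it. It is a dry run that only prints names. The "field0;suffix" layout of /etc/flume-node.conf is an assumption inferred from the excerpt (field 1 is appended to the hostname), the helper function names and the sample host and lines are hypothetical, and whether bin/flume honors a per-process FLUME_LOGFILE is not something I have confirmed.

```shell
# Sketch only: derive a per-node name and per-node log files from one
# line of /etc/flume-node.conf. The "field0;suffix" layout is an assumption
# inferred from the excerpt, where field 1 is appended to the hostname.
node_suffix() {
    # Print the second ';'-separated field of the line passed as $1.
    printf '%s\n' "$1" | cut -d';' -f2
}

node_name() {
    # $1 = hostname, $2 = conf line
    printf '%s%s\n' "$1" "$(node_suffix "$2")"
}

node_logfile() {
    # $1 = node name; one log file name per logical node
    printf 'flume-flume-node-%s.log\n' "$1"
}

node_outfile() {
    # $1 = log dir, $2 = node name; one stdout/stderr capture per node
    printf '%s/flume-flume-%s.out\n' "$1" "$2"
}

# Dry run with hypothetical values; a real script would read each line
# of /etc/flume-node.conf and launch the node instead of echoing.
host="agenthost"
for line in "a;_events" "b;_accesslogs"; do
    name=$(node_name "$host" "$line")
    echo "$name $(node_logfile "$name") $(node_outfile /var/log/flume "$name")"
done
```

In a real init script, the two file names would be assigned to FLUME_LOGFILE and log just before the nohup line, so each node gets its own pair.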
Does anyone have any suggestions for guaranteeing that both nodes are initialized correctly? I could add a refresh command to the init script, but I want to make sure that I understand the problem, since this wasn't happening before the upgrade.
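For reference, the refresh workaround in the init script might look roughly like the sketch below. It only builds and prints the command rather than running it. The -c and -e options of flume shell, the admin port 35873, and the master host name are assumptions on my part and should be checked against the installed version; refresh_cmd is a hypothetical helper name.

```shell
# Sketch only: build the `flume shell` invocation that asks the master to
# refresh one logical node. The -c/-e options and the admin port 35873 are
# assumptions about the 0.9.x shell; verify them on your install.
FLUME_HOME=${FLUME_HOME:-/usr/local/flume-0.9.4-cdh3u2}

refresh_cmd() {
    # $1 = master host (hypothetical), $2 = logical node name
    printf '%s/bin/flume shell -c %s:35873 -e "exec refresh %s"\n' \
        "$FLUME_HOME" "$1" "$2"
}

# Dry run: print the command instead of executing it.
refresh_cmd master.example.com node_name2
```

A real version would also have to wait until the node has registered with the master before sending the refresh, which this sketch does not attempt.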
Thanks,
Jay S.