I run two logical nodes on my servers, which I initialize using a modified version of the flume-daemon.sh script. After I upgraded to 0.9.4-cdh3u2, the second node stopped forwarding its files to a collector.  An excerpt from the init script:

export FLUME_HOME=/usr/local/flume-0.9.4-cdh3u2
export FLUME_LOG_DIR="/var/log/flume"
export FLUME_LOGFILE=flume-flume-node-$HOSTNAME.log
log=$FLUME_LOG_DIR/flume-flume-$HOSTNAME.out

# each entry in /etc/flume-node.conf is a semicolon-separated argument list;
# one logical node is launched per entry
for IN in $(cat /etc/flume-node.conf); do
    counter=0
    # split the entry on ";" into the flume_args array
    for x in $(echo "$IN" | tr ";" "\n"); do
        flume_args[$counter]=$x
        counter=$((counter + 1))
    done
    # logical node name = physical hostname + per-node suffix from the conf file
    flume_host_name=$(/bin/hostname)${flume_args[1]}
    nohup "${FLUME_HOME}/bin/flume" node -n "$flume_host_name" > "$log" 2>&1 < /dev/null &
done

Both nodes' processes are running, both show as ACTIVE in the master's Node Status table, and both have the correct configuration in the Node Configuration table.  Yet files keep accumulating in the second node's /logged directory instead of being shipped to the collector.

The problem resolves as soon as I issue a refresh {node_name} command to the master.
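For anyone wanting to reproduce, the refresh can be issued from the flume shell; something like the following (the master hostname is a placeholder, and I'm quoting the shell flags from memory):

# ask the master to re-translate the stuck node's logical config
${FLUME_HOME}/bin/flume shell -c master-host -e "exec refresh 'node_name2'"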

Configuration is:

Agent1: node_name1 useast_events syslogUdp(5140) {value("app","ngn") => autoE2EChain }
Collector: collector_name useast_events autoCollectorSource collectorSink(s3://events...)

Agent2: node_name2 useast_accesslogs syslogUdp(5140) {value("app","ngn") => autoE2EChain }
Collector: collector_name useast_accesslogs autoCollectorSource collectorSink(s3://accesslogs...)
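For completeness, these mappings get loaded through the flume shell roughly as follows (the S3 URLs are elided the same way as above, and the exact exec config argument order with flow names is from memory):

${FLUME_HOME}/bin/flume shell -c master-host <<'EOF'
exec config node_name1 useast_events 'syslogUdp(5140)' '{value("app","ngn") => autoE2EChain}'
exec config collector_name useast_events 'autoCollectorSource' 'collectorSink(s3://events...)'
exec config node_name2 useast_accesslogs 'syslogUdp(5140)' '{value("app","ngn") => autoE2EChain}'
exec config collector_name useast_accesslogs 'autoCollectorSource' 'collectorSink(s3://accesslogs...)'
quit
EOF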


After I submit the refresh command, the agent's sink is actually translated from {value("app","ngn") => autoE2EChain } to:

{ value( "app", "ngn" ) => { ackedWriteAhead => { stubbornAppend => { insistentOpen => < logicalSink( "collector_2_1a_094_events" ) ? < logicalSink( "collector_1_1c_094_events" ) ? logicalSink( "collector_1_1b_094_events" ) > > } } } }

I can't figure out what is happening. One thing I noticed: I set the FLUME_LOGFILE environment variable only once, outside the for loop. Sometimes both nodes end up writing to the same log file concurrently; other times a second log file is created with a date extension that only the second node writes to.
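If the log collision is part of the problem, one fix would be to set the log variables per iteration so each logical node gets its own files; a minimal sketch, reusing the per-node suffix in flume_args[1]:

# inside the for loop, before the nohup line:
export FLUME_LOGFILE="flume-flume-node-${HOSTNAME}${flume_args[1]}.log"
log="${FLUME_LOG_DIR}/flume-flume-${HOSTNAME}${flume_args[1]}.out"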

Does anyone have suggestions to guarantee that both nodes are initialized correctly? I could add a refresh command to the init script, something like the sketch below, but I want to make sure I understand the underlying problem first, since this wasn't happening before the upgrade.
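The workaround I have in mind looks roughly like this (the node names and master host are placeholders, the sleep length is a guess, and the shell flags are from memory as above):

# after launching the nodes, wait for them to register with the master,
# then force a re-translation of each logical node's config
sleep 30
for n in node_name1 node_name2; do
    ${FLUME_HOME}/bin/flume shell -c master-host -e "exec refresh '$n'"
done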

Thanks,

Jay S.