It sounds like you might want to use the log4j appender then.
This will only require the flume-ng SDK libs on the servers where the log messages originate, as well as a flume-ng agent to send the events to.

You do not need to install Flume on the HDFS namenode.
For the HDFS sink to write to HDFS, you simply need to configure the sink with the HDFS write path, e.g.:
hdfs.path hdfs://namenode/flume/webdata/
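As a minimal sketch, a sink configured that way might look like this (the agent and component names here, "agent1", "avroSrc", "memCh", "hdfsSink", are placeholders; adjust them and the port to your setup):

```properties
# Hypothetical Flume NG agent config: avro source -> memory channel -> hdfs sink
agent1.sources = avroSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Avro source receives events sent by the log4j appender
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 41414
agent1.sources.avroSrc.channels = memCh

agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 10000

# HDFS sink writes into the configured path on the namenode
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/webdata/
agent1.sinks.hdfsSink.channel = memCh
```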

Please have a read over the flume user guide as there is a lot of good info in there that you may find useful.

One example flow could be:

AppServer configured with log4j appender --> flume agent [source (avro) | channel | sink (hdfs)] --> hdfs
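On the app server side, the log4j appender could be wired up roughly like this (a sketch only; "flume-host" and the port are assumptions and must match wherever your agent's avro source is listening, and flume-ng-sdk must be on the app's classpath):

```properties
# Hypothetical log4j.properties fragment on the application server
log4j.rootLogger = INFO, flume
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-host
log4j.appender.flume.Port = 41414
```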

On Thu, Feb 7, 2013 at 10:07 AM, Seshu V <> wrote:
Hello Jeff, 

   Thanks for the reply.  My use case is not really special.  We have multiple products and each product emits traditional log messages in different servers.  I would like to stream those into HDFS.  The logs are generally in apache or log4j format.  
   So, I have many sources from where I want to stream the logs into HDFS.   I can have a channel/collector machine where I install flume.   I guess, my question is, do I need to install flume on the servers where the log messages lie, and do I need to install flume on the HDFS namenode too?

- Seshu  

On Wed, Feb 6, 2013 at 7:47 PM, Jeff Lord <> wrote:

It really is going to depend on your use case.
Though it sounds like you may need to run an agent on each of the source machines.
Which source do you plan to use? It may also be the case that you can use the flume rpc client to write data directly from your application to the flume collector machine.


On Wed, Feb 6, 2013 at 4:49 PM, Seshu V <> wrote:
Hi All,

    I have used Flume 0.9.3 a while back, it worked fine at that time.  Now, I am looking to use 'Flume NG', started reading documentation today.  In Flume 0.9.3, I installed flume agents on the servers wherever I had the data source.   And, I had a collector machine separately.  My sink was HDFS.   I see that Flume NG is using Channel.    
    My question is that I have multiple source servers and my sink is HDFS.  I also have another machine for Channel (collector in old days).   Do I need to install flume NG  in all the source machines and Channel machine?  Or can I install flume NG only on the Channel server and (somehow) specify in the configuration to pull data from source machines and specify the sink as HDFS?
     Thanks in advance for your replies..

- Seshu