This is quite doable with minimal effort on your end.

Use Flume with the HDFS sink. You can name the files as you like and roll them over on HDFS based on number of events, size, or time.
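
As a rough sketch, an agent configuration along these lines would do it. The agent name (agent1), the component names, the log path, the NameNode host/port and the roll values are all placeholders I made up for illustration; only the property keys are the HDFS sink's own.

```properties
# Hypothetical agent "agent1": one exec source, one memory channel, one HDFS sink
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfsSink

# Source: tails a single Apache access log (path is an assumption)
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Sink: writes plain text files to HDFS (NameNode host/port are assumptions)
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = mem
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com:8020/flume/weblogs
agent1.sinks.hdfsSink.hdfs.filePrefix = access_log
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text

# Rollover: by time (seconds), size (bytes) or event count; 0 disables a trigger
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

Note that the sink appends its own suffix to the file prefix, so the names on HDFS will resemble the originals rather than match them exactly, and file sizes will follow the roll settings rather than the source files.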

Developers can then access the logs through the HDFS NameNode URI. Alternatively, a simple Java DFS client running inside a container would also work, with more security in place.
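
To illustrate the first option, here is a minimal Java sketch that reaches the logs over HTTP through the NameNode's WebHDFS REST interface instead of the native DFS client. It assumes WebHDFS is enabled on the cluster (dfs.webhdfs.enabled) and that simple user.name authentication is acceptable; the host, port 50070, the paths and the user name are placeholders, and the class name WebHdfsUrls is my own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds WebHDFS URLs and prints the HTTP response.
class WebHdfsUrls {

    // Builds http://host:port/webhdfs/v1<hdfsPath>?op=<op>&user.name=<user>
    static String url(String host, int port, String hdfsPath, String op, String user) {
        try {
            // The multi-argument URI constructor quotes illegal characters in the path.
            return new URI("http", null, host, port, "/webhdfs/v1" + hdfsPath,
                    "op=" + op + "&user.name=" + user, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad WebHDFS URL parts", e);
        }
    }

    // URL that lists a directory (response is JSON).
    static String listUrl(String host, int port, String dir, String user) {
        return url(host, port, dir, "LISTSTATUS", user);
    }

    // URL that reads a file; the NameNode normally redirects to a DataNode.
    static String openUrl(String host, int port, String file, String user) {
        return url(host, port, file, "OPEN", user);
    }

    // Issues a GET and copies the response body to stdout, line by line.
    static void print(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; replace with the real NameNode and paths.
        String host = "namenode.example.com";
        System.out.println(listUrl(host, 50070, "/flume/weblogs", "devuser"));
        System.out.println(openUrl(host, 50070, "/flume/weblogs/access_log.1", "devuser"));
        // To fetch for real: print(listUrl(host, 50070, "/flume/weblogs", "devuser"));
    }
}
```

Since these are plain HTTP URLs, a small web front end could sit in front of them so the developers never touch the cluster or the source machines directly.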

As for a better way of collecting the logs: yes, you could do it with pipes, but in my view that would be somewhat complicated for a very small performance improvement. Others may suggest otherwise.

On Tue, Dec 4, 2012 at 3:34 PM, Emile Kao wrote:
Hello guys,
now that I have successfully set up a running Flume / Hadoop system for my customer, I would like to ask for help in implementing a requirement requested by the customer:

Here is what the use case looks like:

1. Customer has many Apache web servers and WebSphere application servers that produce many logs.

2. Customer wants to provide the logs to the developer team without giving them direct access to the machines hosting the logs.

3. The idea is now to collect all the log files and put them together in one place and let the developer team get access to them through a web interface.

4. My goal is to resolve this problem using Flume / Hadoop


1. Which is the best way to implement such a scenario using Flume/ Hadoop?

2. The customer would like to keep the log files in their original state (file name, size, etc.). Is it practicable using Flume?

3. Is there a better way to collect the files without using "Exec source" and "tail -F" command?

Many Thanks and Cheers,

Nitin Pawar