Thanks for the clarification.

On Fri, Nov 27, 2015 at 2:15 PM, Gonzalo Herreros <gherreros@gmail.com> wrote:
Yes, the best way to consolidate multiple sources is to use Avro sinks that forward to the agent that writes to HDFS (that agent exposes an Avro source to listen to the other agents' Avro sinks).
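Roughly, the configuration on the two sides would look something like the sketch below (agent names, host, and port are just placeholders, not anything from your setup):

    # Collector agent -- the one that writes to HDFS; listens on an Avro source
    collector.sources = avroIn
    collector.channels = ch1
    collector.sinks = hdfsOut

    collector.sources.avroIn.type = avro
    collector.sources.avroIn.bind = 0.0.0.0
    collector.sources.avroIn.port = 4545
    collector.sources.avroIn.channels = ch1

    collector.sinks.hdfsOut.type = hdfs
    collector.sinks.hdfsOut.hdfs.path = /flume/events
    collector.sinks.hdfsOut.channel = ch1

    # Each node's agent -- its HDFS sink is replaced by an Avro sink pointing at the collector
    node.sinks = avroOut
    node.sinks.avroOut.type = avro
    node.sinks.avroOut.hostname = collector-host
    node.sinks.avroOut.port = 4545
    node.sinks.avroOut.channel = ch1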


On 27 November 2015 at 08:28, zaenal rifai <togatta.fudo@gmail.com> wrote:
Sorry, I meant Avro sink.




On 27 November 2015 at 14:52, Gonzalo Herreros <gherreros@gmail.com> wrote:
Hi Zaenal,

There is no "Avro channel"; by default Flume will write Avro to any of the channels.
The point is that a memory channel, or even a file channel, will fill up very quickly because a single sink cannot keep up with the many sources.

Regards,
Gonzalo

On 27 November 2015 at 03:43, zaenal rifai <togatta.fudo@gmail.com> wrote:
Why not use an Avro channel, Gonzalo?

On 26 November 2015 at 20:12, Gonzalo Herreros <gherreros@gmail.com> wrote:
You cannot have multiple processes writing concurrently to the same HDFS file.
What you can do is have a topology where many agents forward to one agent that writes to HDFS, but you need a channel that allows the single HDFS writer to lag behind without slowing the sources.
A Kafka channel might be a good choice.
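For example, on the HDFS-writing agent the channel section could look roughly like this (a sketch only; the agent name "collector", the broker/ZooKeeper hosts, and the topic are placeholders, and the exact property names depend on your Flume version -- these follow the 1.6 Kafka channel):

    # Kafka channel lets the slow HDFS sink lag behind without backing up the Avro source
    collector.channels = kafkaCh
    collector.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
    collector.channels.kafkaCh.brokerList = kafkabroker1:9092,kafkabroker2:9092
    collector.channels.kafkaCh.zookeeperConnect = zkhost:2181
    collector.channels.kafkaCh.topic = flume-channel

    collector.sources.avroIn.channels = kafkaCh
    collector.sinks.hdfsOut.channel = kafkaCh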

Regards,
Gonzalo

On 26 November 2015 at 11:57, yogendra reddy <yogendra.60@gmail.com> wrote:
Hi All,

Here's my current Flume setup for a Hadoop cluster to collect service logs:

- Run a Flume agent on each of the nodes
- Configure the Flume sink to write to HDFS (roughly the per-node config sketched below), and the files end up this way:

..flume/events/node0logfile
..flume/events/node1logfile

..flume/events/nodeNlogfile
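For reference, here is roughly what each node's agent looks like today (agent and component names are simplified, and I've left out the source that reads the service logs):

    # Per-node agent in the current setup -- every node writes its own file into HDFS
    agent.sinks = hdfsOut
    agent.sinks.hdfsOut.type = hdfs
    agent.sinks.hdfsOut.hdfs.path = /flume/events
    agent.sinks.hdfsOut.hdfs.filePrefix = node0logfile
    agent.sinks.hdfsOut.channel = memCh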

But I want to be able to write all the logs from multiple agents to a single file in HDFS. How can I achieve this, and what would the topology look like?
Can this be done via a collector? If yes, where can I run the collector, and how will this scale for a 1000+ node cluster?

Thanks,
Yogendra