Sorry, I meant the Avro sink.

On 27 November 2015 at 14:52, Gonzalo Herreros <gherreros@gmail.com> wrote:
Hi Zaenal,

There is no "Avro channel"; Flume writes Avro to any of the channels by default.
The point is that a memory channel, or even a file channel, will fill up very quickly because a single sink cannot keep up with the many sources.
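
Concretely, each node would run a small agent whose Avro sink points at the collector's Avro source. A minimal sketch (the agent name, log path, and collector host/port are all assumptions):

  # Per-node agent "leaf": tail a service log and forward it over
  # Avro RPC to the collector (hostname/port are placeholders).
  leaf.sources = logTail
  leaf.channels = mem
  leaf.sinks = toCollector

  leaf.sources.logTail.type = exec
  leaf.sources.logTail.command = tail -F /var/log/hadoop/service.log
  leaf.sources.logTail.channels = mem

  leaf.channels.mem.type = memory
  leaf.channels.mem.capacity = 10000

  leaf.sinks.toCollector.type = avro
  leaf.sinks.toCollector.hostname = collector01.example.com
  leaf.sinks.toCollector.port = 4545
  leaf.sinks.toCollector.channel = mem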

Regards,
Gonzalo

On 27 November 2015 at 03:43, zaenal rifai <togatta.fudo@gmail.com> wrote:
Why not use an Avro channel, Gonzalo?

On 26 November 2015 at 20:12, Gonzalo Herreros <gherreros@gmail.com> wrote:
You cannot have multiple processes writing concurrently to the same HDFS file.
What you can do is have a topology where many agents forward to a single agent that writes to HDFS, but you need a channel that allows the lone HDFS writer to lag behind without slowing the sources.
A Kafka channel might be a good choice.
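
For example, a rough sketch of such a collector agent (agent, host, and topic names are made up; the Kafka channel property names shown are the Flume 1.6-era ones):

  # Collector agent "coll": the Kafka channel buffers events so the
  # single HDFS writer can lag behind without back-pressuring sources.
  coll.sources = avroIn
  coll.channels = kafkaCh
  coll.sinks = hdfsOut

  coll.sources.avroIn.type = avro
  coll.sources.avroIn.bind = 0.0.0.0
  coll.sources.avroIn.port = 4545
  coll.sources.avroIn.channels = kafkaCh

  # Flume 1.6-era Kafka channel properties (broker/zk hosts assumed).
  coll.channels.kafkaCh.type = org.apache.flume.channel.kafka.KafkaChannel
  coll.channels.kafkaCh.brokerList = kafka01:9092,kafka02:9092
  coll.channels.kafkaCh.zookeeperConnect = zk01:2181
  coll.channels.kafkaCh.topic = flume-logs

  coll.sinks.hdfsOut.type = hdfs
  coll.sinks.hdfsOut.hdfs.path = /flume/events
  coll.sinks.hdfsOut.hdfs.fileType = DataStream
  coll.sinks.hdfsOut.channel = kafkaCh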

Regards,
Gonzalo

On 26 November 2015 at 11:57, yogendra reddy <yogendra.60@gmail.com> wrote:
Hi All,

Here's my current Flume setup for collecting service logs on a Hadoop cluster:

- Run a Flume agent on each node
- Configure each agent's sink to write to HDFS; the files end up like this (per-node sink config sketched after the list)

..flume/events/node0logfile
..flume/events/node1logfile

..flume/events/nodeNlogfile
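
Each agent's sink is configured something like this (simplified, names illustrative; a host interceptor puts the node name into the file prefix):

  # Simplified per-node config: the host interceptor stamps each event
  # with the node's hostname, which the HDFS sink uses in the prefix.
  agent.sources.logSrc.interceptors = hostInt
  agent.sources.logSrc.interceptors.hostInt.type = host

  agent.sinks.hdfsOut.type = hdfs
  agent.sinks.hdfsOut.hdfs.path = /flume/events
  agent.sinks.hdfsOut.hdfs.filePrefix = %{host}logfile
  agent.sinks.hdfsOut.channel = mem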

But I want all the logs from the multiple agents written to a single file in HDFS. How can I achieve this, and what would the topology look like?
Can this be done via a collector? If yes, where can I run the collector, and how will this scale for a 1000+ node cluster?

Thanks,
Yogendra