flume-user mailing list archives

From Alexander Alten-Lorenz <wget.n...@gmail.com>
Subject Re: In flume-ng is there any advantages of 2-tier topology in a cluster of 30-40 nodes?
Date Wed, 30 Jan 2013 06:26:13 GMT

If the Tier 1 agents have access to HDFS, each client can write its data into HDFS directly. But
that doesn't really make sense. Usually you want the files from different hosts organized in a
structured layout (for example, one directory per host, with the contents split into buckets).
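A minimal Python sketch of the per-host, time-bucketed layout described above. It assumes a hypothetical base directory `/flume/events`, hourly buckets and UTC timestamps. In Flume NG this layout would normally be expressed through the HDFS sink's `hdfs.path` escape sequences (e.g. `%{host}` and `%Y/%m/%d/%H`), which require the event to carry `host` and `timestamp` headers (for instance from the host and timestamp interceptors). The HDFS sink resolves time escapes in the agent's local time zone, so the UTC choice here is only a simplification. The function just computes the directory an event would land in.

```python
from datetime import datetime, timezone

# Hypothetical base directory; not taken from the original thread.
BASE_DIR = "/flume/events"


def hdfs_bucket_path(host: str, epoch_millis: int) -> str:
    """Return the per-host, hourly-bucketed directory for one event.

    Roughly what an HDFS sink path like /flume/events/%{host}/%Y/%m/%d/%H
    would resolve to, assuming a 'host' header and a 'timestamp' header
    (epoch milliseconds). UTC is used here for simplicity (an assumption).
    """
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return f"{BASE_DIR}/{host}/{ts:%Y/%m/%d/%H}"


if __name__ == "__main__":
    # 1359527173000 ms is 2013-01-30 06:26:13 UTC (the date of this message).
    print(hdfs_bucket_path("web01", 1359527173000))
```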

When you add a Tier 2 (for example, two or more servers that have access to HDFS), you gain
features such as load balancing, HA (failover) and mirrored sinks (for example, one sink writes
the data into HDFS and another writes it into a different system as a backup). For stability and
reliability a Tier 2 architecture is recommended, and it makes some things easier ;)
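To make the Tier 2 features above concrete, here is a rough Python sketch that renders Flume NG properties for both tiers. Every name in it is a hypothetical placeholder: agents `a1`/`c1`, collector hosts `collector1`/`collector2`, port 4141, an `exec` source tailing `/var/log/app.log`, the file channel directories and the backup directory `/var/flume/backup`. The Tier 1 agent spreads events over the collectors with a `load_balance` sink group (round-robin with backoff, so a failed collector is skipped for a while). Each collector uses a `replicating` channel selector to mirror every event into an HDFS sink and a `file_roll` backup sink. Setting `processor.type = failover` with per-sink priorities would give active/standby HA instead of load balancing. Check the generated properties against the Flume User Guide for your version before using them.

```python
"""Sketch: render Flume NG properties for a two-tier topology.

All agent names, hosts, ports and paths are hypothetical placeholders.
"""


def tier1_agent_config(agent, collectors, port=4141):
    """Tier 1: tail a local log and load-balance events over the collectors."""
    sinks = [f"k{i}" for i in range(1, len(collectors) + 1)]
    lines = [
        f"{agent}.sources = src1",
        f"{agent}.channels = ch1",
        f"{agent}.sinks = {' '.join(sinks)}",
        f"{agent}.sinkgroups = g1",
        # Hypothetical source: tail an application log.
        f"{agent}.sources.src1.type = exec",
        f"{agent}.sources.src1.command = tail -F /var/log/app.log",
        f"{agent}.sources.src1.channels = ch1",
        # Add 'host' and 'timestamp' headers so Tier 2 can bucket by host/time.
        f"{agent}.sources.src1.interceptors = i1 i2",
        f"{agent}.sources.src1.interceptors.i1.type = host",
        f"{agent}.sources.src1.interceptors.i1.useIP = false",
        f"{agent}.sources.src1.interceptors.i2.type = timestamp",
        f"{agent}.channels.ch1.type = file",
        # One sink group over all Avro sinks: round-robin, back off failed sinks.
        f"{agent}.sinkgroups.g1.sinks = {' '.join(sinks)}",
        f"{agent}.sinkgroups.g1.processor.type = load_balance",
        f"{agent}.sinkgroups.g1.processor.backoff = true",
        f"{agent}.sinkgroups.g1.processor.selector = round_robin",
    ]
    # One Avro sink per collector, all draining the same channel.
    for sink, host in zip(sinks, collectors):
        lines += [
            f"{agent}.sinks.{sink}.type = avro",
            f"{agent}.sinks.{sink}.channel = ch1",
            f"{agent}.sinks.{sink}.hostname = {host}",
            f"{agent}.sinks.{sink}.port = {port}",
        ]
    return "\n".join(lines)


def collector_config(agent, namenode, port=4141):
    """Tier 2: receive Avro events and mirror them into HDFS and a local backup."""
    lines = [
        f"{agent}.sources = avro1",
        f"{agent}.channels = ch_hdfs ch_backup",
        f"{agent}.sinks = hdfs1 backup1",
        f"{agent}.sources.avro1.type = avro",
        f"{agent}.sources.avro1.bind = 0.0.0.0",
        f"{agent}.sources.avro1.port = {port}",
        # Replicating selector: every event is copied into both channels.
        f"{agent}.sources.avro1.channels = ch_hdfs ch_backup",
        f"{agent}.sources.avro1.selector.type = replicating",
    ]
    for ch in ("ch_hdfs", "ch_backup"):
        # Separate directories so the two file channels don't collide.
        lines += [
            f"{agent}.channels.{ch}.type = file",
            f"{agent}.channels.{ch}.checkpointDir = /var/flume/{ch}/checkpoint",
            f"{agent}.channels.{ch}.dataDirs = /var/flume/{ch}/data",
        ]
    lines += [
        # Primary copy: per-host, hourly buckets in HDFS.
        f"{agent}.sinks.hdfs1.type = hdfs",
        f"{agent}.sinks.hdfs1.channel = ch_hdfs",
        f"{agent}.sinks.hdfs1.hdfs.path = {namenode}/flume/events/%{{host}}/%Y/%m/%d/%H",
        # Mirrored copy: rolled files on the collector's local disk.
        f"{agent}.sinks.backup1.type = file_roll",
        f"{agent}.sinks.backup1.channel = ch_backup",
        f"{agent}.sinks.backup1.sink.directory = /var/flume/backup",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(tier1_agent_config("a1", ["collector1", "collector2"]))
    print()
    print(collector_config("c1", "hdfs://namenode:8020"))
```

With 30 first-tier agents, the same `tier1_agent_config` output can be reused on every machine, since only the collector list matters; adding a collector means appending one host to that list.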


On Jan 30, 2013, at 7:05 AM, Jagadish Bihani <jagadish.bihani@pubmatic.com> wrote:

> Hi,
> In our scenario there are around 30 machines from which we want to put data into HDFS.
> The approach we initially thought of was:
> 1. First tier: agents that collect data from the source and pass it to an Avro sink.
> 2. Second tier: let's call these agents 'collectors'; they collect data from the first-tier
> agents and dump it into HDFS.
> (There are fewer second-tier agents, say a 4:1 ratio.)
> Instead of the above topology, I could simply use an HDFS sink in the first-tier agents; that
> would serve the purpose.
> Also, the number of nodes is small (say 30), so that won't hurt the HDFS NameNode much compared
> to, say, 1000 nodes.
> But apart from that, I don't see any advantage in adding the second tier.
> Is there any advantage I am missing in terms of failover, HDFS performance, or anything else?
> Regards,
> Jagadish

Alexander Alten-Lorenz
German Hadoop LinkedIn Group: http://goo.gl/N8pCF
