I don't mean to hijack the thread, but is this tiered approach recommended over having 10 or so nodes read from a local queue and write directly to HBase with the asynchronous HBase sink (AsyncHBaseSink)?

Iain Wright


On Tue, Oct 9, 2012 at 5:52 PM, Camp, Roy <rcamp@ebay.com> wrote:

You would run a flume-ng agent on each node with an Avro sink. Then on your collector machine you run another flume-ng agent with an Avro source, which receives the events sent by the nodes' Avro sinks.
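A minimal sketch of the two Avro endpoints, assuming agent names `node` and `collector`, a collector host called `collector1.example.com`, and port 4141 (all hypothetical). Each node's Avro sink points at the host and port where the collector's Avro source listens:

```properties
# On every node: forward events from the local channel to the collector.
# Assumes the node agent already declares channel1 and avroSink.
node.sinks.avroSink.type = avro
node.sinks.avroSink.channel = channel1
node.sinks.avroSink.hostname = collector1.example.com
node.sinks.avroSink.port = 4141

# On the collector: listen for Avro events from all nodes.
# Assumes the collector agent already declares channel1 and avroSource.
collector.sources.avroSource.type = avro
collector.sources.avroSource.channels = channel1
collector.sources.avroSource.bind = 0.0.0.0
collector.sources.avroSource.port = 4141
```

The collector's own sink (for example, a sink that writes to HDFS) then drains the collector's channel, so only the collector tier writes to the final store.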


If you run more than one collector, you can set up a sink group on each node and configure it for either failover or load balancing.
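Assuming two hypothetical collectors, `collector1.example.com` and `collector2.example.com`, a node-side sink group using the failover sink processor might look like the following (agent name, sink names, hosts and priorities are illustrative only):

```properties
# Two Avro sinks on the node, one per collector (hosts are hypothetical).
node.sinks = avroSink1 avroSink2
node.sinks.avroSink1.type = avro
node.sinks.avroSink1.channel = channel1
node.sinks.avroSink1.hostname = collector1.example.com
node.sinks.avroSink1.port = 4141
node.sinks.avroSink2.type = avro
node.sinks.avroSink2.channel = channel1
node.sinks.avroSink2.hostname = collector2.example.com
node.sinks.avroSink2.port = 4141

# Group the sinks; the higher-priority sink is used until it fails.
node.sinkgroups = g1
node.sinkgroups.g1.sinks = avroSink1 avroSink2
node.sinkgroups.g1.processor.type = failover
node.sinkgroups.g1.processor.priority.avroSink1 = 10
node.sinkgroups.g1.processor.priority.avroSink2 = 5
node.sinkgroups.g1.processor.maxpenalty = 10000
```

For load balancing instead, the processor type would be `load_balance`, optionally with `processor.selector = round_robin` (or `random`) and `processor.backoff = true`; check the Flume user guide for the version you are running.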


The concept of a Flume master from Flume 0.9.x does not exist in flume-ng. I personally put the node and collector configs in the same config file under different agent names, and then keep that file synced on all machines.
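A sketch of that layout, assuming a shared file named `flume.conf` and agent names `node` and `collector` (both hypothetical). Every machine gets the same file, and the `--name` option of `flume-ng agent` selects which agent's properties that process uses. Component settings are omitted here to keep the structure visible:

```properties
# Shared flume.conf, copied to every machine.
# Start a node with:      bin/flume-ng agent --conf conf --conf-file flume.conf --name node
# Start a collector with: bin/flume-ng agent --conf conf --conf-file flume.conf --name collector

# Node tier: custom queue source -> memory channel -> Avro sink.
node.sources = source1
node.channels = channel1
node.sinks = avroSink

# Collector tier: Avro source -> memory channel -> custom HDFS sink.
collector.sources = avroSource
collector.channels = channel1
collector.sinks = sink1
```

Because each process only uses the properties under its own agent name, the collector section is inert on the nodes and vice versa, which is what makes a single synced file workable.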


These two docs are pretty helpful:

From: Juan Gentile [mailto:juan.gentile@globant.com]
Sent: Tuesday, October 09, 2012 11:04 AM
To: user@flume.apache.org
Subject: Flume-ng - Distributed

I'm new to Flume-ng and would like to know how to run an agent distributed across a cluster. I have developed my own source, which reads from a queue, and my own sink, which stores the messages it reads in HDFS. If I want to run multiple instances of this, do I have to start the agent on each node?


This is my conf file:

agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 1000

agent1.sources.source1.channels = channel1
agent1.sources.source1.type = MySource

agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.type = MySink



I see that the previous version of Flume had the concepts of a 'master' and a 'node'. Is there something similar here?