I don't mean to hijack the thread, but is this tiered approach recommended over reading from a local queue and having 10 or so nodes write directly to HBase via the async HBase sink?
You would run a flume-ng agent on each node with an Avro sink. Then, on your collector machine, you run another flume-ng agent whose Avro source receives the events from those nodes.
If you run more than one collector, you can set up sink groups on the nodes and configure them for either failover or load balancing.
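A sink group along those lines might look like the following sketch. All names here (agent1, k1/k2, the collector hostnames, and the port) are illustrative, not taken from the thread:

```properties
# Two Avro sinks, each pointing at a different collector, draining the
# same channel. The sink group makes them act as one logical sink.
agent1.sinks = k1 k2
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector1.example.com
agent1.sinks.k1.port = 4141
agent1.sinks.k1.channel = channel1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = collector2.example.com
agent1.sinks.k2.port = 4141
agent1.sinks.k2.channel = channel1

# Failover processor: k1 is preferred (higher priority); k2 takes over
# if k1's collector goes down.
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 5

# For load balancing instead, replace the processor config with:
#   agent1.sinkgroups.g1.processor.type = load_balance
```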
The concept of a Flume master from Flume 0.9.x does not exist in flume-ng. I personally put the node and collector configs in the same config file under different agent names, and then keep that file synced across all machines.
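As a sketch of that single-file layout: flume-ng only reads the property keys whose prefix matches the agent name you pass with `--name` (`-n`) at startup, so one file can carry both roles. All component names, hosts, and paths below are illustrative assumptions:

```properties
# One shared file, two agent definitions.
# On each leaf node, start:      flume-ng agent -n node -f flume.conf
# On the collector box, start:   flume-ng agent -n collector -f flume.conf

# --- leaf node: tails a log and forwards via Avro ---
node.sources = src1
node.channels = ch1
node.sinks = out1
node.sources.src1.type = exec
node.sources.src1.command = tail -F /var/log/app.log
node.sources.src1.channels = ch1
node.channels.ch1.type = memory
node.sinks.out1.type = avro
node.sinks.out1.hostname = collector.example.com
node.sinks.out1.port = 4141
node.sinks.out1.channel = ch1

# --- collector: receives Avro events and writes to HDFS ---
collector.sources = avroIn
collector.channels = ch1
collector.sinks = hdfsOut
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4141
collector.sources.avroIn.channels = ch1
collector.channels.ch1.type = memory
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events
collector.sinks.hdfsOut.channel = ch1
```

Since an agent ignores keys for other agent names, syncing the one file everywhere is safe; only the `-n` flag differs per machine.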
These two docs are pretty helpful:
I'm new to Flume-NG, and I'd like to ask how I can run an agent distributed across a cluster. I've developed my own source and sink: the source reads from a queue, and the sink stores the messages it reads in HDFS. If I want this running in multiple instances, do I have to submit it on each node?
This is my conf file:
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
agent1.sources.source1.type = MySource
agent1.sources.source1.channels = channel1
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 1000
agent1.sinks.sink1.type = MySink
agent1.sinks.sink1.channel = channel1
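To run a config like the above on several machines, one approach (a sketch, assuming the file is named `agent1.conf` and the custom classes are on the Flume classpath) is to copy the same file to every node and launch an agent on each:

```shell
# Hypothetical launch command; paths and names are illustrative.
# Run this on every node that should host an instance of the agent.
flume-ng agent \
  --conf conf \
  --conf-file conf/agent1.conf \
  --name agent1
```

Note that custom components are usually referenced by their fully qualified class name (e.g. `com.example.MySource` rather than `MySource`), and the jar containing them must be on each agent's classpath.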
I see that there is the concept of a 'master' and a 'node' in the previous version of Flume; is there something similar here?