flume-user mailing list archives

From Carlos Rojas Matas <cma...@despegar.com>
Subject Multiple agents in high availability
Date Thu, 24 Sep 2015 20:58:58 GMT
Hi Guys!

Thanks for accepting my request. We're using Flume to ingest a massive amount
of data from a Kafka source, and we're not sure how to configure a
Flume cluster with HA. Here is a brief summary:

1 - We use Kafka to hold intermediate data about our users' activity.
2 - We use Flume to ingest all that data and write it to Avro files in HDFS.
3 - We want high availability, that is, not a single agent but a
cluster of agents.
4 - The thing is that we cannot have duplicates in the target files. If we
start several agents consuming from the same topic, each one of them
could potentially receive the same events, which would break the former
requirement.
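
For context, a single agent in the pipeline described above might look
roughly like the sketch below. This is only an illustration, assuming Flume
1.6-style property names; the agent name, hosts, topic, group id, and paths
are all placeholders, not our real configuration.

```properties
# Hypothetical agent "a1": Kafka source -> file channel -> HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source (Flume 1.6-style properties; host and topic are placeholders).
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = zk-host:2181
a1.sources.r1.topic = user-activity
# Kafka consumer group this source joins (placeholder value).
a1.sources.r1.groupId = flume-ingest
a1.sources.r1.channels = c1

# Durable channel so buffered events survive an agent restart.
a1.channels.c1.type = file

# HDFS sink writing Avro container files (path is a placeholder).
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /data/user-activity
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = avro_event
```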

Is there a way to configure multiple sources such that Kafka sees them as a
single one?

Thanks in advance,
