flume-user mailing list archives

From Margus Roo <mar...@roo.ee>
Subject Two parallel agents from same source to same sink
Date Thu, 21 Jan 2016 15:05:24 GMT

I am trying to set up Flume for high availability.
rsyslog sends the same feed to two different servers, s1 and s2.
On both servers, Flume agents are configured to listen for the feed from rsyslog.
Both agents write the feed to HDFS.
What I end up with in HDFS is different files with duplicated content.
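For context, each agent is configured roughly like the sketch below (agent name, port, and HDFS path are placeholders, not my actual settings):

```
# Sketch of one agent's config (names/ports/paths are placeholders)
a1.sources = syslog
a1.channels = c1
a1.sinks = hdfs1

# Listen for the rsyslog feed
a1.sources.syslog.type = syslogtcp
a1.sources.syslog.host = 0.0.0.0
a1.sources.syslog.port = 5140
a1.sources.syslog.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Write events to HDFS
a1.sinks.hdfs1.type = hdfs
a1.sinks.hdfs1.channel = c1
a1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/syslog/%Y-%m-%d
a1.sinks.hdfs1.hdfs.fileType = DataStream
```

The same config runs on s1 and s2, which is why both agents independently write the same events to HDFS.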

Is there a best-practice architecture for using Flume in situations 
like this?
What I am trying to guard against is the case where one server goes 
down: rsyslog forwards to both servers so that at least one of them 
can still transport events to HDFS.

At the moment my thought is that I could clean out the duplicates 
after some time, before Hive uses the directory.
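To illustrate the idea (not a production implementation): assuming each event serializes to one byte-identical line in both agents' output, the cleanup step amounts to merging the files and keeping the first occurrence of each line. File names below are hypothetical.

```python
# Hypothetical sketch: merge the files written by the two agents and
# drop duplicate event lines before Hive reads the directory.
# Assumes each event is one line and duplicates are byte-identical.

def dedupe_files(paths, out_path):
    seen = set()
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    if line not in seen:
                        seen.add(line)
                        out.write(line)

if __name__ == "__main__":
    # e.g. files pulled down from the two agents' HDFS directories
    dedupe_files(["events_s1.log", "events_s2.log"], "events_clean.log")
```

In practice this would have to run over the HDFS directory itself (for example as a Hive INSERT ... SELECT DISTINCT into the table Hive actually queries) rather than over local files with an in-memory set; the snippet only shows the first-occurrence-wins idea.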

Margus (margusja) Roo
skype: margusja
+372 51 48 780
