flume-user mailing list archives

From Jagadish Bihani <jagadish.bih...@pubmatic.com>
Subject HDFS sink data loss possible ?
Date Wed, 29 May 2013 14:12:17 GMT

Based on observations on our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink. (We have a replicating setup, with a separate
file channel for each sink.)

Configuration:
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
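For reference, the second-tier topology described above can be sketched in Flume properties syntax. This is only an illustration of the layout (replicating source, two file channels, HDFS sink plus file roll sink); all names, ports, and paths here are made up, not taken from the actual deployment:

```properties
# Second-tier agent: replicate each incoming event to HDFS and to local disk.
# Agent/component names (agent2, avro-in, ch-hdfs, ...) are illustrative.
agent2.sources = avro-in
agent2.channels = ch-hdfs ch-file
agent2.sinks = sink-hdfs sink-roll

# Replicating is the default channel selector; the source fans out
# every event to both channels.
agent2.sources.avro-in.type = avro
agent2.sources.avro-in.bind = 0.0.0.0
agent2.sources.avro-in.port = 4141
agent2.sources.avro-in.channels = ch-hdfs ch-file
agent2.sources.avro-in.selector.type = replicating

# Durable file channels, one per sink, as described above.
agent2.channels.ch-hdfs.type = file
agent2.channels.ch-hdfs.checkpointDir = /var/flume/ckpt-hdfs
agent2.channels.ch-hdfs.dataDirs = /var/flume/data-hdfs
agent2.channels.ch-file.type = file
agent2.channels.ch-file.checkpointDir = /var/flume/ckpt-file
agent2.channels.ch-file.dataDirs = /var/flume/data-file

# HDFS sink with lzop compression.
agent2.sinks.sink-hdfs.type = hdfs
agent2.sinks.sink-hdfs.channel = ch-hdfs
agent2.sinks.sink-hdfs.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent2.sinks.sink-hdfs.hdfs.codeC = lzop
agent2.sinks.sink-hdfs.hdfs.fileType = CompressedStream

# File roll sink writing to the local file system.
agent2.sinks.sink-roll.type = file_roll
agent2.sinks.sink-roll.channel = ch-file
agent2.sinks.sink-roll.sink.directory = /var/flume/roll
```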

Things are otherwise working fine, but we see some data loss on the HDFS
side (not very large: about 1 million out of 1 billion events).

Is data loss possible in some scenario? (Just to add: the datanodes of the
Hadoop cluster are highly loaded. Can that lead to any disaster?)

