flume-user mailing list archives

From Jagadish Bihani <jagadish.bih...@pubmatic.com>
Subject HDFS sink data loss possible ?
Date Wed, 29 May 2013 14:12:17 GMT
Hi

Based on observations of our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink. (We have a replicating setup, with two different file
channels, one for each sink.)

Configuration:
========
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
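
For context, a minimal sketch of the kind of replicating second-tier
configuration we are describing (agent and component names such as
"agent2", "avroSrc", "hdfsChan" are illustrative, not our exact
production config):

# Replicate each event to two file channels, one per sink.
agent2.sources  = avroSrc
agent2.channels = hdfsChan rollChan
agent2.sinks    = hdfsSink rollSink

agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4141
agent2.sources.avroSrc.selector.type = replicating
agent2.sources.avroSrc.channels = hdfsChan rollChan

# Separate file channel for each source-sink pair.
agent2.channels.hdfsChan.type = file
agent2.channels.hdfsChan.checkpointDir = /data/flume/hdfs/checkpoint
agent2.channels.hdfsChan.dataDirs = /data/flume/hdfs/data

agent2.channels.rollChan.type = file
agent2.channels.rollChan.checkpointDir = /data/flume/roll/checkpoint
agent2.channels.rollChan.dataDirs = /data/flume/roll/data

# HDFS sink with lzop compression.
agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.channel = hdfsChan
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
agent2.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent2.sinks.hdfsSink.hdfs.codeC = lzop

# File roll sink writing the local copy we compare against.
agent2.sinks.rollSink.type = file_roll
agent2.sinks.rollSink.channel = rollChan
agent2.sinks.rollSink.sink.directory = /data/flume/local-copy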

Things are working fine overall, but we do see some data loss on the HDFS
side (not very large: about 1 million out of 1 billion events).

Is data loss possible in some scenario? (Just to add: the datanodes of the
Hadoop cluster are heavily loaded. Can that lead to any loss?)

Regards,
Jagadish

