flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: Reliability in Flume
Date Thu, 24 Jan 2013 07:33:04 GMT
Hi Henry,

Just to add to Mike's response:

When used with durable channels (mainly the file channel) and with 
transports that can be rolled back (Avro), message delivery is 
guaranteed (eventually). The only way you can lose data is for part of 
the chain to be permanently removed: a hard disk failure or removal of 
the physical hardware.
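To make that guarantee concrete, here is a minimal Python sketch of one hop under at-least-once semantics. It is a toy model, not Flume's Java API: `Channel`, `take_batch`, `rollback`, `deliver` and `make_flaky_send` are hypothetical names chosen for illustration. The sender removes a batch from its channel tentatively and only lets it go once the receiver acknowledges; if no acknowledgement arrives, it rolls the batch back into the channel and retries. The simulated failure is Henry's first scenario below: the receiver commits the batch and then dies before acking.

```python
from collections import deque


class Channel:
    """Toy durable channel (hypothetical): events leave only on a successful send."""

    def __init__(self, events=()):
        self.queue = deque(events)

    def take_batch(self, n):
        # Tentatively remove up to n events, as a take transaction would.
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]

    def rollback(self, batch):
        # Return the uncommitted events to the head, preserving their order.
        self.queue.extendleft(reversed(batch))


def deliver(channel, receiver, send, batch_size=3, max_attempts=10):
    """Drain `channel` into `receiver` with at-least-once semantics."""
    attempts = 0
    while channel.queue and attempts < max_attempts:
        attempts += 1
        batch = channel.take_batch(batch_size)
        try:
            send(batch, receiver)  # raises ConnectionError if no ack arrives
        except ConnectionError:
            channel.rollback(batch)  # nothing is lost; the batch is retried
            continue
        # Ack received: the batch is committed and not sent again.
    return receiver


def make_flaky_send():
    """Hypothetical transport whose first ack is lost after the receiver commits."""
    calls = 0

    def send(batch, receiver):
        nonlocal calls
        calls += 1
        receiver.extend(batch)  # receiver commits its put transaction
        if calls == 1:
            raise ConnectionError("receiver died before acking")

    return send


received = deliver(Channel(range(1, 7)), [], make_flaky_send())
print(received)  # [1, 2, 3, 1, 2, 3, 4, 5, 6]
```

Nothing is lost, but the whole first batch arrives twice: the duplicate count per partial failure grows with `batch_size`, which is the trade-off described in the next paragraph.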

Preventing data duplication has never been an objective of Flume, 
though duplication is uncommon in a properly configured setup. The larger 
your batch sizes, the more duplicates you may get from each partial 
failure. Similarly, in-order arrival of data is not guaranteed. If either 
of these is a concern, the best way to address it is to run a MapReduce 
job or similar to reduce the data to unique rows and/or reorder it.
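A minimal Python sketch of that cleanup step, run in memory for illustration; a real MapReduce job would distribute the same map, shuffle and reduce phases across a cluster. It assumes, hypothetically, that every event carries a unique `id` and an origin timestamp `ts` stamped upstream (for example at the first hop). Neither field is something this thread says Flume provides, and the `Event` type and function names are illustrative only.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    id: str    # hypothetical unique ID stamped at the first hop
    ts: int    # hypothetical origin timestamp used for reordering
    body: str


def map_phase(events):
    # Emit (key, value) pairs keyed by the event's unique ID.
    for e in events:
        yield e.id, e


def reduce_phase(grouped):
    # Duplicates share an ID, so keeping one value per key removes them.
    for _key, values in grouped.items():
        yield values[0]


def dedupe_and_reorder(events):
    grouped = defaultdict(list)
    for key, value in map_phase(events):  # "shuffle": group values by key
        grouped[key].append(value)
    unique = reduce_phase(grouped)
    # Restore origin order; ties are broken by ID for determinism.
    return sorted(unique, key=lambda e: (e.ts, e.id))


arrived = [
    Event("b", 2, "second"),
    Event("a", 1, "first"),
    Event("b", 2, "second"),  # duplicate from a retried batch
    Event("c", 3, "third"),
]
for e in dedupe_and_reorder(arrived):
    print(e.ts, e.body)
# 1 first
# 2 second
# 3 third
```

Keying on the whole row instead of `id` gives the "unique rows" variant, at the cost of also collapsing events that were genuinely repeated at the source.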

On 01/24/2013 12:26 PM, Henry Ma wrote:
> Dear Flume developers and users,
> I understand that Flume NG uses channel-based transactions to 
> guarantee reliable message delivery between agents. But in 
> some extreme failure scenarios, is Flume still fully reliable? I have 
> thought of the scenarios below.
> 1. In a transaction between agents, what happens if the receiving 
> agent process goes down just after it commits its put transaction but 
> before it sends the success acknowledgement to the sending agent? Will the 
> sending agent send the same event again when the receiving agent 
> recovers, and cause data duplication?
> 2. In the communication between the client (the data source, which sends 
> data to the first-hop agent) and the first-hop agent, what happens if 
> the agent process goes down just after it receives the event but before 
> it saves the event to its channel? Will this cause data loss?
> 3. In the communication between the final-hop agent and the storage 
> system (such as MySQL, HDFS, a file system, etc.), what happens if the 
> agent goes down before it commits the saving transaction but after it has 
> saved some data to the storage? Will this cause data duplication after 
> the agent recovers?
> Thank you very much!
> -- 
> Best Regards,
> Henry Ma
