flume-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: Preventing Data Loss during Restart
Date Wed, 13 Feb 2013 06:47:31 GMT
If you are using rsyslog as your syslog daemon (the default in CentOS and RHEL), it has
reliable TCP forwarding built in (http://www.rsyslog.com/doc/rsyslog_reliable_forwarding.html).
Flume doesn't have a source for this, but it would be a nice feature to build. I have thought
about this before: we also use syslog to send logs to Flume and are also not keen on running
Java on our front-end boxes, so we have the same problem.

I will try to have a look at their protocol today, to see how complex it would be (not making
any promises).


On 12 Feb 2013, at 21:19, <Matt.Elliott@gdc4s.com> wrote:

Yeah, I’m starting to answer my own question. We’re using 1.3, so we do have Avro. We were
trying to avoid installing anything on our client (source) machines so that we would not have
to install Java on machines that don’t otherwise need it.

From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
Sent: Tuesday, February 12, 2013 2:42 PM
To: user@flume.apache.org
Subject: Re: Preventing Data Loss during Restart


What version of Flume are you using? Also note that syslog is a fire-and-forget protocol,
so when you reconfigure, any events not yet persisted to the file channel are lost. The source
has no way to tell the sender that data was not written to disk, so the sender cannot resend
it. We recommend using a source that actually does report failure, such as Avro, Thrift
(available on trunk, not in any release yet), or HTTP. This lets the client retry
when Flume reports failure.
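
To illustrate the retry idea, here is a minimal Python sketch of a client that posts a batch of log lines to a Flume HTTP source and resends it when the agent does not confirm the write. It assumes the source uses the default JSON handler (a JSON array of events, each with "headers" and "body") and answers 200 once the batch is in the channel. The URL, the function names, and the backoff values are hypothetical, and treating every non-200 result as retryable is a simplification.

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical endpoint: host and port depend on how the HTTP source is configured.
FLUME_URL = "http://localhost:5140"


def encode_events(lines):
    """Encode log lines as a JSON array of events (assumed default JSON handler format)."""
    return json.dumps([{"headers": {}, "body": line} for line in lines]).encode("utf-8")


def http_post(payload, url=FLUME_URL, timeout=5.0):
    """POST one batch; return the HTTP status code, or None if the agent is unreachable."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the agent answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused or timed out, e.g. during a restart


def send_with_retry(lines, post=http_post, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Send one batch, retrying with exponential backoff until the agent returns 200.

    Returns True once the batch is accepted and False if every attempt fails,
    in which case the caller still holds the data and can spool it locally.
    """
    payload = encode_events(lines)
    for attempt in range(max_attempts):
        if post(payload) == 200:
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

Because the sender keeps the batch until it sees a success status, a restart or reconfiguration of the agent delays delivery instead of dropping events. A retry after a lost response can deliver the same batch twice, so this gives at-least-once rather than exactly-once delivery.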


Hari Shreedharan

On Tuesday, February 12, 2013 at 11:24 AM, Matt.Elliott@gdc4s.com wrote:

I’ve seen some threads on this online in the past, but I can’t seem to find a definitive
answer. We’re deploying Flume in a production environment where we’re going to be collecting
log data from syslog and other sources. While Flume supports runtime configuration changes,
we are still seeing data loss during testing, even with a file channel. This is a setup with
a single source, channel, and sink, and no redundancy. Does anyone know of a clean way to
support guaranteed delivery without redundancy?


