flume-user mailing list archives

From: Ganesh Prabhu <ganesh.pra...@FireEye.com>
Subject: Flume v1.8 with taildir source seems to be dropping few events...
Date: Thu, 15 Mar 2018 00:21:49 GMT
Hi,

We need to read and parse the log files generated by our application servers to find events
that need to be processed. Multiple application servers generate this log file, and for the
Flume agent all of the log files are available under one directory via an NFS mount. I am
using a TAILDIR source with a couple of interceptors, a file-based channel, and a custom sink
to process the events. We expect approximately 1M events per day to be processed.
I see a few events (between 10 and 50) missing per 24-hour period, and they seem to go missing
in bunches (e.g. at some point, say 9am, 5-10 files will be missing).

To debug this, we added a file_roll sink with a memory channel to log the events and check
whether the problem is related to the source. The same events are missing from the file_roll
sink output as well, so the issue seems to be in the TAILDIR source. How can I debug this
further? Any help in this regard will be highly appreciated.
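One way to pin down exactly which events are lost is to recompute the expected events from the raw application logs and diff them against the file_roll output. The Python sketch below is hypothetical (the script name find_missing.py and the helper names are illustrative only). It assumes both the raw logs and the file_roll output can be read as newline-delimited text, that the file-name globs roughly match the filegroups regex and the pathManager prefix in the configuration, and that log files are not rotated or renamed during the comparison window. The two regexes mirror interceptors i1 (regex_filter) and i2 (search_replace), but Python's `re` is not identical to Java's regex engine, so treat the result as a debugging aid rather than ground truth.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Mirrors interceptor i1 (regex_filter, excludeEvents = false): keep matching lines.
FILTER = re.compile(r"contentType=stats-contents")
# Mirrors interceptor i2 (search_replace): strip everything up to "savedPath=".
STRIP = re.compile(r"^.*savedPath=")


def expected_events(lines):
    """Yield the event bodies the interceptor chain should produce from raw log lines."""
    for line in lines:
        line = line.rstrip("\n")
        if FILTER.search(line):
            yield STRIP.sub("", line, count=1)


def missing_events(raw_lines, delivered_lines):
    """Return expected events that do not appear in the delivered output.

    A Counter is used so that duplicate events are matched one-for-one.
    """
    remaining = Counter(line.rstrip("\n") for line in delivered_lines)
    missing = []
    for event in expected_events(raw_lines):
        if remaining[event] > 0:
            remaining[event] -= 1
        else:
            missing.append(event)
    return missing


def read_lines(directory, pattern):
    """Read all lines from files in `directory` whose names match the glob `pattern`."""
    lines = []
    for path in sorted(Path(directory).glob(pattern)):
        lines.extend(path.read_text(errors="replace").splitlines())
    return lines


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: find_missing.py <raw_log_dir> <file_roll_dir>")
    else:
        # Globs approximate the TAILDIR filegroups regex and the file_roll prefix (assumption).
        raw = read_lines(sys.argv[1], "smf-*prod-*_uploads.log")
        delivered = read_lines(sys.argv[2], "stats-contents-*")
        for event in missing_events(raw, delivered):
            print(event)
```

Printing the missing bodies (the saved paths) makes it easy to grep the raw logs for them and check whether the gaps cluster around particular files or times.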

BTW, we are using Flume 1.8, and I noticed that a missing-events issue in the TAILDIR source
was resolved in this release. I have also set idleTimeout to 600000 (as noted in the TAILDIR
JIRA issue).

Please find the Flume source, channel, and sink configuration below.

I appreciate your help and support in root-causing this issue. Please feel free to ask for
more information. Thanks in advance.

Thanks,
Ganesh

# Describe/configure the source
stats-agent.sources.r1.type = TAILDIR
stats-agent.sources.r1.positionFile = /tmp/stats_taildir_position.json
stats-agent.sources.r1.filegroups = f1
stats-agent.sources.r1.filegroups.f1 = /tmp/smf-.*prod-.*_uploads\.log
stats-agent.sources.r1.idleTimeout = 600000

stats-agent.sources.r1.interceptors = i1 i2
stats-agent.sources.r1.interceptors.i1.type = regex_filter
stats-agent.sources.r1.interceptors.i1.regex = contentType=stats-contents
stats-agent.sources.r1.interceptors.i1.excludeEvents = false

stats-agent.sources.r1.interceptors.i2.type = search_replace
stats-agent.sources.r1.interceptors.i2.searchPattern = ^.*savedPath=
stats-agent.sources.r1.interceptors.i2.replaceString =

# File based channel for custom sink.
stats-agent.channels.c1.type = file
stats-agent.channels.c1.checkpointDir=/tmp/checkpoint
stats-agent.channels.c1.dataDirs=/tmp/data

# Memory based channel to log events using file_roll sink.
stats-agent.channels.c2.type = memory
stats-agent.channels.c2.capacity = 1000
stats-agent.channels.c2.transactionCapacity = 100

# Event Logging sink
stats-agent.sinks.k3.type = file_roll
stats-agent.sinks.k3.sink.directory = /tmp/flume/stats-contents
stats-agent.sinks.k3.sink.rollInterval = 0
stats-agent.sinks.k3.sink.batchSize = 10
stats-agent.sinks.k3.sink.pathManager.extension = log
stats-agent.sinks.k3.sink.pathManager.prefix = stats-contents-
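The TAILDIR source records how far it has read in each file in the JSON file set by positionFile (/tmp/stats_taildir_position.json above). A further check is to compare those saved offsets against the files currently on disk. The sketch below assumes the position file is a JSON array of objects with "inode", "pos" and "file" keys, which is the layout I believe Flume 1.8 writes; please verify this against your own position file. It should be run on the agent host so that it sees the same view of the NFS mount as the source. The source only saves positions periodically (writePosInterval), so a small number of unread bytes on an actively written file is expected.

```python
import json
import os
import sys


def check_positions(entries):
    """Compare each saved TAILDIR position with the file currently on disk.

    Assumes each entry has "file", "inode" and "pos" keys (verify against your file).
    Returns (path, status) tuples only for files that look out of sync.
    """
    report = []
    for entry in entries:
        path, inode, pos = entry["file"], entry["inode"], entry["pos"]
        try:
            st = os.stat(path)
        except FileNotFoundError:
            report.append((path, "missing on disk"))
            continue
        if st.st_ino != inode:
            # The path now refers to a different inode than the one whose
            # position was saved (e.g. the file was replaced or rotated).
            report.append((path, f"inode changed: saved {inode}, now {st.st_ino}"))
        elif st.st_size < pos:
            report.append((path, f"file shrank: pos {pos} > size {st.st_size}"))
        elif st.st_size > pos:
            report.append((path, f"{st.st_size - pos} bytes not yet read"))
    return report


if __name__ == "__main__":
    position_file = sys.argv[1] if len(sys.argv) > 1 else "/tmp/stats_taildir_position.json"
    if not os.path.exists(position_file):
        print(f"position file not found: {position_file}")
    else:
        with open(position_file) as f:
            for path, status in check_positions(json.load(f)):
                print(f"{path}: {status}")
```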

