flume-user mailing list archives

From Kowshik Prakasam <kows...@gmail.com>
Subject Stamping metadata to Event(s) in multitail source
Date Sun, 05 Feb 2012 04:56:20 GMT
Hi,

I'm trying to use Flume to tail multiple log files on disk. Each log file
corresponds to a service in the system that emits events. The obvious choice
of Flume source was multitail(...), so my Flume agent configuration would be:

    multitail(logfile1, logfile2, logfile3, ..., logfileN) | rpcSink("xxx_host", xxx_port)

The receiver of these events is a Flume collector whose configuration is:

     collectorSource(xxx_port) | myCustomSink()

Here, myCustomSink() is a flume plugin that I've developed to process
events in a certain way.

Besides tailing events from all the log files, I would like to stamp some
custom metadata onto each event (for example, the name of the service that
produced it), so that at the receiving end (the collector) I can identify the
service that produced each event. Ideally, in myCustomSink(), I expect to be
able to do:

com.cloudera.flume.core.Event.get("KEY_1")
com.cloudera.flume.core.Event.get("KEY_2")
com.cloudera.flume.core.Event.get("KEY_3")
...
...
com.cloudera.flume.core.Event.get("KEY_M")

where KEY_1, KEY_2, ..., KEY_M are the custom metadata keys that I stamped at
the source.
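
To make the goal concrete, here is a minimal, self-contained Java sketch of
the behaviour I'm after. It does not use Flume's classes at all: StampingSketch,
SketchEvent, SERVICE_BY_FILE and the service names are made up for
illustration, and only the file names and KEY_1 come from the description
above. The point is just that the stamp has to be applied while the
originating file is still known.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class StampingSketch {

    // Hypothetical stand-in for an event: a body plus byte[]-valued attributes.
    // It only mimics key-based get/set; it is not Flume's Event class.
    static final class SketchEvent {
        private final String body;
        private final Map<String, byte[]> attrs = new HashMap<>();

        SketchEvent(String body) {
            this.body = body;
        }

        void set(String key, String value) {
            attrs.put(key, value.getBytes(StandardCharsets.UTF_8));
        }

        // Returns the attribute as a string, or null if it was never stamped.
        String get(String key) {
            byte[] v = attrs.get(key);
            return v == null ? null : new String(v, StandardCharsets.UTF_8);
        }

        String body() {
            return body;
        }
    }

    // Assumed mapping from a tailed file to the service that writes it.
    static final Map<String, String> SERVICE_BY_FILE = new HashMap<>();
    static {
        SERVICE_BY_FILE.put("logfile1", "serviceA");
        SERVICE_BY_FILE.put("logfile2", "serviceB");
    }

    // Source side: the only place where the originating file is still known.
    static SketchEvent tail(String file, String line) {
        SketchEvent e = new SketchEvent(line);
        String service = SERVICE_BY_FILE.get(file);
        e.set("KEY_1", service == null ? "unknown" : service);
        return e;
    }

    // Sink side: what myCustomSink() would like to do with a received event.
    static String describe(SketchEvent e) {
        return e.get("KEY_1") + ": " + e.body();
    }

    public static void main(String[] args) {
        System.out.println(describe(tail("logfile1", "user logged in")));
        System.out.println(describe(tail("logfile2", "payment accepted")));
    }
}
```

Here tail(...) plays the role of the source and describe(...) the role of
myCustomSink(); in the real setup the two would run on different machines,
with the attributes travelling along with the event.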

One solution is to write my own plugin that extends multitail(...) to stamp
the metadata. However, this doesn't seem straightforward from an
implementation standpoint.

Is there an easier alternative that I could follow instead? Note that I can't
use a decorator with the rpcSink to stamp the metadata, because some of the
metadata depends directly on the source service that produced the event, and
by the time the event reaches the decorator the identity of its source is
already lost.


Thanks!
-Kowshik
