I'm trying to use Flume to tail logs from multiple log files on the disk. Each log file
corresponds to a service in the system that emits events. The obvious choice of Flume source
was multitail(...), and my Flume agent configuration would be:
multitail(logfile1, logfile2, logfile3, ..., logfileN) | rpcSink("xxx_host", xxx_port)
The receiver of the above events is a Flume collector whose configuration is:
collectorSource(xxx_port) | myCustomSink()
Here, myCustomSink() is a Flume plugin that I've developed to process events in a certain way.
Besides tailing events from all the log files, I would like to stamp some custom metadata
on each event, for example the name of the service that produced the event, so that at the receiving end
(the collector) I can identify the service that produced each event. Ideally, in myCustomSink(),
I expect to read per-event attributes KEY_1, KEY_2, ..., KEY_M, which are the custom metadata
that I stamped at the source.
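To illustrate the kind of lookup I have in mind at the collector, here is a minimal sketch. It does not use the real Flume classes: SketchEvent and describe() are hypothetical stand-ins for an event with byte-valued attributes and for the logic inside myCustomSink(), and "billing-service" is a made-up value for KEY_1.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MetadataLookupSketch {

    // Hypothetical stand-in for an event: a body plus named byte[] attributes.
    // This is NOT the Flume Event class, only an illustration of the idea.
    static final class SketchEvent {
        private final byte[] body;
        private final Map<String, byte[]> attrs = new HashMap<>();

        SketchEvent(String body) {
            this.body = body.getBytes(StandardCharsets.UTF_8);
        }

        // What the source side would do: stamp a metadata value on the event.
        void set(String key, String value) {
            attrs.put(key, value.getBytes(StandardCharsets.UTF_8));
        }

        // What the sink side would do: read a metadata value (null if absent).
        byte[] get(String key) {
            return attrs.get(key);
        }

        String bodyAsString() {
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical sink-side logic: look up one metadata key on an event,
    // falling back to "unknown" when the source did not stamp it.
    static String describe(SketchEvent event, String key) {
        byte[] raw = event.get(key);
        String value = (raw == null) ? "unknown" : new String(raw, StandardCharsets.UTF_8);
        return key + "=" + value + " body=" + event.bodyAsString();
    }

    public static void main(String[] args) {
        SketchEvent event = new SketchEvent("user logged in");
        event.set("KEY_1", "billing-service"); // stamped at the source
        System.out.println(describe(event, "KEY_1"));
    }
}
```

The point is only that each key has to be attached to the event before it leaves the agent, so that the sink can read it back by name.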
One solution is to write my own plugin that extends multitail(...) to stamp the metadata; however,
this doesn't seem straightforward from an implementation standpoint.
Is there an easier alternative that I could follow instead? Note that I can't use a decorator along with
the rpcSink to stamp the metadata. This is because some of the metadata directly depends on the source
service that produced the event, so by the time the event reaches the decorator the identity of its source
is already lost.