flume-user mailing list archives

From Jeff Lord <jl...@cloudera.com>
Subject Re: Flume HDFS Sink: Dynamic Path format for IP Address
Date Tue, 04 Nov 2014 22:15:30 GMT
Hi Traiano,

The multiport syslog source should automatically populate a host header on
each event, using the hostname from the syslog message. From there you can
reference that header in your HDFS sink path with the %{host} escape sequence.

e.g.

agent.sinks.sink-1.hdfs.path = /user/flume/Syslog/%{host}/
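
Applied to the sink you posted, that might look roughly like the sketch
below. It assumes every event reaching sink2 actually carries a host header;
a message without a parseable hostname may not resolve to a per-host
directory, so it is worth checking a few events first.

```
# Sketch only: the original sink2 path with the host header added.
# Assumes the syslog source has set a "host" header on each event.
tier1.sinks.sink2.hdfs.path = hdfs:///tmp/remote-syslogs/%{host}/%y-%m-%d/%H%M/%S
```

Note that the header holds whatever hostname field the sender put in the
syslog message, which may be a name rather than an IP address.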

Hope this helps.

-Jeff


On Sat, Nov 1, 2014 at 12:30 PM, Traiano Welcome <traiano@gmail.com> wrote:
>
> Hi List
>
>
> I've configured flume to accept remote syslogs from rsyslog on a number
> of hosts, and am currently using an HDFS sink, specified as follows in the
> agent config:
>
> #sink for syslog udp collection
> tier1.sinks.sink2.type         = hdfs
> tier1.sinks.sink2.channel      = channel2
> tier1.sinks.sink2.hdfs.path         = hdfs:///tmp/remote-syslogs/%y-%m-%d/%H%M/%S
> tier1.sinks.sink2.hdfs.fileType     = DataStream
> tier1.sinks.sink2.hdfs.writeFormat  = Text
> tier1.sinks.sink2.hdfs.rollSize     = 0
> tier1.sinks.sink2.hdfs.rollCount    = 10000
> tier1.sinks.sink2.hdfs.rollInterval = 600
>
> Syslog entries are being collected and written to hdfs with the directory
> structure generated as specified above, i.e. ... tmp/remote-syslogs/%y-%m-%d/%H%M/%S.
> However, I'd like the generated dynamic path to include the IP address of
> the remote host sending the syslog to the source, as in something like:
>
> tier1.sinks.sink2.hdfs.path         = hdfs:///tmp/remote-syslogs/%HOSTNAME/%y-%m-%d/%H%M/%S
>
> If a parameter like %HOSTNAME is at all possible.
>
> My question: Is there a selector or other parameter supported by flume
> that I could use for this?
>
> I've looked in the user guide's section on hdfs sink specification, but
> it does not seem to address other possibilities for dynamic path format.
> The only similar feature I can see is the "interceptor", of which the "host
> interceptor" seems similar to what I have in mind:
>
> http://flume.apache.org/FlumeUserGuide.html#host-interceptor
>
> ... however, that seems to apply to the source's agent only.
>
> Could there be an interceptor configuration that would extract the
> sending address or hostname from an incoming syslog packet to the agent
> source?
>
> Many thanks in advance,
> Traiano
