flume-user mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject Need for UDP / Multicast Source
Date Mon, 14 Jan 2013 17:29:24 GMT
Hi all,

I'm a Systems Engineer at the Wikimedia Foundation, and we're investigating using Flume for
our web request log HDFS imports.  We've previously been using Kafka, but have had to change
our short-term architecture plans in order to get data into HDFS reliably and regularly soon.

Our current web request logs are available for consumption over a multicast UDP stream.  I
could hack something together to pipe this into Flume using the existing sources (SyslogUDPSource,
or maybe some combination of socat + NetcatSource), but I'd rather reduce the number of moving
parts.  I'd like a Flume source that consumes directly from the multicast UDP stream.

I coded up a proof of concept based on the SyslogUDPSource, mainly by stripping out the syslog
event header extraction and adding multicast datagram connection code.  I plan on cleaning
this up and turning it into a generic raw UDP source, with multicast as a configuration option.
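
To illustrate the idea of a raw UDP receiver with multicast as an option, here is a minimal
sketch using only java.net.  It is not the proof of concept itself and does not use any Flume
classes; the class name RawUdpReceiver and its constructor parameters are hypothetical.  A real
Flume source would wrap each returned payload in an event and hand it to a channel.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.util.Arrays;

/**
 * Hypothetical sketch: receives raw UDP datagrams and returns each payload
 * untouched (no syslog header parsing). Multicast is optional.
 */
public class RawUdpReceiver implements AutoCloseable {
    private final DatagramSocket socket;
    private final byte[] buffer;

    /**
     * @param port           UDP port to bind (0 picks a free port)
     * @param multicastGroup group address to join, or null for plain unicast UDP
     * @param maxPacketSize  size of the receive buffer in bytes
     */
    public RawUdpReceiver(int port, String multicastGroup, int maxPacketSize)
            throws IOException {
        if (multicastGroup != null) {
            MulticastSocket ms = new MulticastSocket(port);
            // A null interface lets the system pick the default one.
            ms.joinGroup(
                new InetSocketAddress(InetAddress.getByName(multicastGroup), port),
                null);
            socket = ms;
        } else {
            socket = new DatagramSocket(port);
        }
        buffer = new byte[maxPacketSize];
    }

    /** Port the socket is actually bound to. */
    public int getLocalPort() {
        return socket.getLocalPort();
    }

    /** Blocks until one datagram arrives and returns a copy of its payload. */
    public byte[] receive() throws IOException {
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);
        return Arrays.copyOfRange(
            packet.getData(),
            packet.getOffset(),
            packet.getOffset() + packet.getLength());
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Passing a group address such as 239.1.2.3 (a made-up example) would join that multicast group;
passing null falls back to an ordinary UDP listener on the given port.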

My question to you all: is this something the Flume community would find useful?  If so,
should I open a JIRA to track this?  I've got a fork of the Flume git repo over on GitHub
and will be doing my work there.  I'd love to share it upstream if it would be useful.

-Andrew Otto
 Systems Engineer
 Wikimedia Foundation
