flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Need for UDP / Multicast Source
Date Mon, 14 Jan 2013 17:37:06 GMT
Hi Andrew, 

Really happy to hear that the Wikimedia Foundation is considering Flume. I am fairly sure that
if you find such a source useful, others will find it useful too. I'd recommend filing a JIRA
and starting a discussion, and then submitting the patch. We would be happy to review and
commit it.


Hari Shreedharan

On Monday, January 14, 2013 at 9:29 AM, Andrew Otto wrote:

> Hi all,
> I'm a Systems Engineer at the Wikimedia Foundation, and we're investigating using Flume
> for our web request log HDFS imports. We've previously been using Kafka, but have had to change
> short-term architecture plans in order to get data into HDFS reliably and regularly soon.
>
> Our current web request logs are available for consumption over a multicast UDP stream.
> I could hack something together to try to pipe this into Flume using the existing sources
> (SyslogUDPSource, or maybe some combination of socat + NetcatSource), but I'd rather reduce
> the number of moving parts. I'd like to consume directly from the multicast UDP stream as
> a Flume source.
>
> I coded up a proof of concept based on the SyslogUDPSource, mainly just stripping out the
> syslog event header extraction and adding in multicast datagram connection code. I plan on
> cleaning this up and making this a generic raw UDP source, with multicast being a configuration
>
> My question to you guys is: is this something the Flume community would find useful?
> If so, should I open up a JIRA to track this? I've got a fork of the Flume git repo over on
> GitHub and will be doing my work there. I'd love to share it upstream if it would be useful.
>
> Thanks!
> -Andrew Otto
> Systems Engineer
> Wikimedia Foundation
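
For readers following the thread, the multicast receive loop Andrew describes could be
sketched roughly as below. This is not his proof of concept and it uses no Flume API; it
relies only on the JDK's java.net.MulticastSocket. The class name MulticastUdpSketch, the
helper toLines, and the idea that each datagram carries newline-separated log lines are
all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MulticastUdpSketch {

    // Split one datagram payload into records.
    // Assumption: a datagram holds one or more '\n'-separated log lines.
    static List<String> toLines(byte[] buf, int offset, int length) {
        String payload = new String(buf, offset, length, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : payload.split("\n")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Join the multicast group and print every record received.
    static void listen(String groupAddress, int port) throws IOException {
        InetAddress group = InetAddress.getByName(groupAddress);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            // Passing null defers to the socket's default network interface.
            socket.joinGroup(new InetSocketAddress(group, port), null);
            byte[] buf = new byte[65535]; // upper bound on a UDP payload
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (!Thread.currentThread().isInterrupted()) {
                packet.setLength(buf.length); // reset before each receive
                socket.receive(packet);       // blocks until a datagram arrives
                for (String line : toLines(packet.getData(), packet.getOffset(),
                                           packet.getLength())) {
                    // A real Flume source would wrap this in an event and hand
                    // it to a channel; printing stands in for that step here.
                    System.out.println(line);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MulticastUdpSketch <group-address> <port>");
            return;
        }
        listen(args[0], Integer.parseInt(args[1]));
    }
}
```

The group address and port are supplied on the command line rather than hard-coded, which
mirrors the idea of making multicast a configurable part of a generic raw UDP source.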
