flume-user mailing list archives

From Evan Chan ...@ooyala.com>
Subject Re: Roadmap / Partitioning by key
Date Mon, 27 Jun 2011 16:51:44 GMT

Thanks for the response, my replies are inlined.

On Sun, Jun 26, 2011 at 11:07 PM, Jonathan Hsieh <jon@cloudera.com> wrote:

> Evan,
> A basic ability to demultiplex (demux) events exists today but is only
> available for writing files to different dirs in HDFS.  The ability to do
> content-based routing for computational purposes is not currently on the
> road map.  While it would be architecturally possible to demux in Flume, Flume is
> currently focused on sending data from many places to a few.
> Can you describe your use case or what you would want to do if you had
> this capability?  This would help us frame this discussion.

Let's say we wanted to do some data aggregation according to a key such as
IP address or domain name.  Since the number of keys is large, aggregation
within each node would not be very efficient unless the number of keys per
node could be reduced, or there were some sort of really fast distributed
key cache that exists across all nodes.
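
To make the idea concrete, here is a minimal sketch of what I mean by
partitioning by key. It is plain Python, not Flume code: the names
node_for_key and route_and_count are made up for illustration, and CRC32 is
just one possible stable hash. The point is that every event with the same
key lands on the same node, so each node only aggregates its own subset of
keys.

```python
import zlib
from collections import Counter


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def route_and_count(events, num_nodes):
    """Route (key, payload) events to nodes and count events per key on each node.

    Each Counter stands in for the local aggregation state of one node.
    """
    per_node = [Counter() for _ in range(num_nodes)]
    for key, _payload in events:
        per_node[node_for_key(key, num_nodes)][key] += 1
    return per_node


# Hypothetical events keyed by IP address or domain name.
events = [("10.0.0.1", "a"), ("example.com", "b"), ("10.0.0.1", "c")]
per_node = route_and_count(events, num_nodes=4)
```

With this kind of routing, a key's running total lives on exactly one node,
so no shared key cache is needed. The trade-off is that changing num_nodes
remaps most keys unless something like consistent hashing is used.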

> If there are a small finite number of categories, demuxing could
> potentially be built as plugins for today's Flume.  For something more general
> or adaptive, a larger development effort would be required.

I might be interested in helping with that effort and learning what it
entails, but this may be a discussion more appropriate for the dev mailing list.

> Another approach that could be done today would be to send data from Flume
> to a system that does demux and custom routing (starting to go down the
> complex-event-processing path).
> 1) Flume could potentially connect to S4 and deliver it data.  Flume could
> have a path that delivers to hdfs, and have another copy sent to S4.

Ah yes, S4.  At this point though, it seems Flume is more mature than S4.

> 2) Flume could send data to FlumeBase (a system built on top of Flume)
> which may (or may not) provide this capability.

FlumeBase doesn't have much documentation, so from what I can tell it
wouldn't have this capability.

> 3) Flume could send data to an open-source system called Esper. (I don't
> know much about it currently)

Esper does sound interesting, but I believe it is single-node.


> Jon
> On Sat, Jun 25, 2011 at 6:31 PM, Evan Chan <ev@ooyala.com> wrote:
>> Hi Flume community,
>> I hope that the incubator list is being read....  hello to everyone, I'm
>> new to Flume.
>> Is there a roadmap for future development of Flume?
>> I'm interested in particular to see if the ability to have a sink that can
>> route events to different nodes based on a key (something that Yahoo S4 can
>> do) will be in the roadmap, and how hard it would be to develop a feature
>> like that.
>> thanks!
>> Evan
>> --
>> *Evan Chan*
>> Senior Software Engineer |
>> ev@ooyala.com | (650) 996-4600
>> www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala<http://www.twitter.com/ooyala>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com

*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
