flume-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: Analysis of Data
Date Fri, 08 Feb 2013 08:56:57 GMT
Good to hear more of your thoughts. Please see inline.

On Thu, Feb 7, 2013 at 8:55 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> I can understand the idea of having data processed inside Flume by
> streaming it to another Flume agent. But do we really need to re-engineer
> something inside Flume? That is what I am wondering. The core Flume dev team
> may have better ideas on this, but currently Storm is a huge candidate for
> streaming data processing.
> Flume does have an open JIRA on this integration: FLUME-1286 <https://issues.apache.org/jira/browse/FLUME-1286>

Yes, a Storm sink could be useful. But that wouldn't preclude us from
taking a hard look at what may be missing in Flume itself, right?

> It will be interesting to draw up performance comparisons if the
> data processing logic is added to Flume. We currently do see people
> doing a little bit of pre-processing of their data (they have their own
> custom channel types where they modify the data and sink it)

It sounds like you have some experience with Flume. Are you guys using it
at Rightster?

I work with a lot of folks to set up and deploy Flume, many of whom do
lookups / joins with other systems, transformations, etc. in real time
along their data ingest pipeline before writing the data to HDFS or HBase
for further processing and archival. I wouldn't say these are really heavy
number crunching implementations in Flume, but certainly I see a lot of
inline parsing, inspection, enrichment, routing, and the like going on. I
think Flume could do a lot more, given the right abstractions.
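
To make the "enrichment and routing" idea above concrete, here is a minimal
sketch in plain Java. It is only an illustration and deliberately does not use
Flume's actual API: SimpleEvent, enrich, route, the "userId,action" body
format, the in-memory lookup table, and the "hdfs-<region>" / "quarantine"
destination names are all hypothetical stand-ins. In a real deployment the
lookup would typically go to an external system.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class InlineEnrichmentSketch {

    // Hypothetical stand-in for an event: string headers plus an opaque body.
    static final class SimpleEvent {
        final Map<String, String> headers = new HashMap<>();
        final byte[] body;

        SimpleEvent(String body) {
            this.body = body.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Parse the body (assumed to be "userId,action"), then enrich the event
    // headers with a region looked up from a side table.
    static SimpleEvent enrich(SimpleEvent event, Map<String, String> regionByUser) {
        String[] fields = new String(event.body, StandardCharsets.UTF_8).split(",", 2);
        String userId = fields[0].trim();
        event.headers.put("userId", userId);
        event.headers.put("region", regionByUser.getOrDefault(userId, "unknown"));
        return event;
    }

    // Pick a destination name from the enriched header; events that could not
    // be enriched go to a separate destination for later inspection.
    static String route(SimpleEvent event) {
        String region = event.headers.get("region");
        if (region == null || "unknown".equals(region)) {
            return "quarantine";
        }
        return "hdfs-" + region;
    }

    public static void main(String[] args) {
        Map<String, String> regionByUser = new HashMap<>();
        regionByUser.put("u1", "eu");

        SimpleEvent known = enrich(new SimpleEvent("u1,click"), regionByUser);
        SimpleEvent unknown = enrich(new SimpleEvent("u9,view"), regionByUser);

        System.out.println(route(known));
        System.out.println(route(unknown));
    }
}
```

The point of the sketch is the shape of the work: parse, look up, annotate
headers, then route on a header value, all per event and before anything is
written to HDFS or HBase.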

