flume-user mailing list archives

From Arvind Prabhakar <arv...@apache.org>
Subject Re: Flume-NG Channels
Date Thu, 12 Jan 2012 17:34:13 GMT
Praveen,

While I agree with you that this should be a first class concept, I do feel
that it does not merit changing the event interface (to specify a category
for example).

On the other hand, the header namespace "flume.*" will be reserved for
flume internal handling and routing - so that could be considered a
standardization of sorts - making it as close to a first class concept as
possible without requiring an interface change.

Regarding your suggestion of prefix/suffix comparison - that could well be
a selector implementation. So out of the box we will have three selectors -
a replicating selector, a fixed string mapping selector, and a
prefix/suffix match selector.

In using the prefix/suffix match selector, I want to highlight Ralph's
earlier comment on the performance impact of doing any non-trivial
processing at the point of de-multiplexing. Any overhead will likely result
in performance impact that could create significant backup for the upstream
flow.  That said, I don't mind having an implementation as long as the
performance tradeoffs are well understood.
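
To make the tradeoff concrete, here is a minimal sketch of what a prefix match selector might look like. The class and method names (PrefixMatchSelector, addPrefix, select) are hypothetical and are not part of the Flume API; channels are represented by their names only. Note the loop over configured prefixes on every event, which is the per-event cost being discussed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a prefix match selector; not part of the Flume API. */
final class PrefixMatchSelector {
    // Insertion-ordered, so the first configured prefix that matches wins.
    private final Map<String, String> prefixToChannel = new LinkedHashMap<>();
    private final String defaultChannel;

    PrefixMatchSelector(String defaultChannel) {
        this.defaultChannel = defaultChannel;
    }

    /** Registers a prefix at configuration time, e.g. "instrument." -> "ticker channel". */
    void addPrefix(String prefix, String channelName) {
        prefixToChannel.put(prefix, channelName);
    }

    /** Scans the configured prefixes: cost grows with the number of prefixes. */
    String select(String headerValue) {
        if (headerValue != null) {
            for (Map.Entry<String, String> entry : prefixToChannel.entrySet()) {
                if (headerValue.startsWith(entry.getKey())) {
                    return entry.getValue();
                }
            }
        }
        return defaultChannel;
    }
}
```

With a handful of prefixes the scan is cheap, but unlike a fixed string mapping it is not constant time, so the number of configured prefixes matters at the point of de-multiplexing.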

I have created FLUME-930
<https://issues.apache.org/jira/browse/FLUME-930> to track this
requirement. Let's continue the discussion on this JIRA
henceforth.

Thanks,
Arvind

On Thu, Jan 12, 2012 at 3:49 AM, Praveen Ramachandra <
praveen_ramachandra@yahoo.com> wrote:

> Awesome that works
>
> Two things come to my mind
>
> 1. From a practical point-of-view we should have the ability to have
> prefix/suffix (full regex could be an overkill) specified to map to channels
> 1.1 e.g., "instrument.*" --> "ticker channel" will map "instrument.stock"
> and instrument.mutual_fund to "ticker channel"
> 2. From a product perspective this should be a first class "concept" like
> channel, source, sink etc. Some candidates that come to my mind: "flow",
> "category", "e2echannel" (yak :-), might be a source of confusion) etc.
> 2.1 as in event "flow", event "category" etc.
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>    ------------------------------
> *From:* "arvind@cloudera.com" <arvind@cloudera.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Thursday, January 12, 2012 11:34 AM
> *Subject:* Re: Flume-NG Channels
>
> Point taken Ralph - avoiding a for-loop within the implementation of a
> channel selector is important for performance. In this particular case
> (that Praveen describes), the channel selector will be making a mapping
> based decision. For example:
>
> header value  --> channel
> "stock" --> "ticker channel"
> "temp" --> "weather channel"
>
> All of this information will be statically configured for the agent and so
> the selector will be able to configure itself during initialization and
> create this mapping. Once setup, when an event arrives, the lookup will be
> constant time to figure out which channel must be used (hashtable/hashmap).
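
A minimal sketch of the mapping described above, assuming channels are identified by name and that unmapped values fall back to a default channel (the fallback is an assumption, not something stated in the thread). MappingSelector and its methods are hypothetical names, not the Flume API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a fixed string mapping selector; not part of the Flume API. */
final class MappingSelector {
    private final String headerName;
    private final Map<String, String> valueToChannel;
    private final String defaultChannel; // assumed fallback for unmapped values

    /** Builds the lookup table once, from the agent's static configuration. */
    MappingSelector(String headerName, Map<String, String> configured, String defaultChannel) {
        this.headerName = headerName;
        this.valueToChannel = new HashMap<>(configured);
        this.defaultChannel = defaultChannel;
    }

    /** One hash lookup per event; no loop over the configured channels. */
    String select(Map<String, String> eventHeaders) {
        String channel = valueToChannel.get(eventHeaders.get(headerName));
        return channel != null ? channel : defaultChannel;
    }
}
```

All the work happens in the constructor; the per-event path is a single map lookup regardless of how many header values are configured.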
>
> Do you see any issues with such an implementation?
>
> Thanks,
> Arvind
>
>
> On Wed, Jan 11, 2012 at 9:24 PM, Ralph Goers <ralph.goers@dslextreme.com> wrote:
>
> One thing I've learned from working on Log4j 2.0 is that for loops are
> actually a lot slower than you might think. In a configuration that desires
> a single channel there should be no for loop. Instead, it should go
> directly to the channel. In the case of multiple channels then the
> "channel" that is selected should be a multiplexing channel that is
> configured with other channels. The for loop (or while loop) is in the
> multiplexing channel.
>
> Thus, your ChannelSelector could (and should) in fact, be a Channel that
> can select any or all of its configured channels.
>
> FWIW, in Log4j 2 in the XML configuration you would specify
>
> <RollingFileAppender name="MainAppender" ...>
>   <MarkerFilter marker="MyMarker"/>
> </RollingFileAppender>
>
> or
>
> <RollingFileAppender name="MainAppender" ...>
>   <filters>
>     <MarkerFilter marker="MyMarker"/>
>     <ThresholdFilter level="DEBUG"/>
>   </filters>
> </RollingFileAppender>
>
> The filters element is actually a CompositeFilter that invokes each of its
> configured filters in turn.
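
The composite idea can be sketched in a few lines of Java. This is a simplified illustration, not Log4j 2's actual CompositeFilter: the EventFilter interface is hypothetical, and it assumes an event passes only if every configured filter accepts it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter contract, used only for this illustration. */
interface EventFilter {
    boolean accept(String marker, int level);
}

/** A filter that is itself built from filters; the loop lives here, not in the caller. */
final class CompositeEventFilter implements EventFilter {
    private final List<EventFilter> filters;

    CompositeEventFilter(List<EventFilter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    /** Invokes each configured filter in turn and stops at the first rejection. */
    @Override
    public boolean accept(String marker, int level) {
        for (EventFilter filter : filters) {
            if (!filter.accept(marker, level)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller configured with a single filter holds that filter directly and pays for no loop; only the composite case iterates, which is the same shape being proposed for a multiplexing channel.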
>
> Ralph
>
> On Jan 11, 2012, at 5:55 PM, Arvind Prabhakar wrote:
>
> Hi Praveen,
>
> Here is what I could muster up after some thought on this use-case:
>
>
>    - We modify the source interface to accept a "Channel Processor", a
>    new component that is responsible for putting the event into one or more
>    channels.
>    - A channel processor will delegate the selection of the channel on
>    which to place the event to another component called "Channel Selector",
>    which is responsible for selecting the appropriate channel from the list
>    of channels the source is configured with.
>    - The default implementation of channel selector in the channel
>    processor will be a "replicating channel selector" which will result in the
>    event being copied over to all configured channels.
>    - Another implementation of the channel selector will be "Mapping
>    Channel Selector" which will allow events to be mapped to different
>    channel(s) based on the value of a specified header.
>
>
> With this facility, you will be able to inject headers into events at the
> point of origination and then configure the mapping channel selector at
> each source in the pipeline to place the event on separate channels as
> desired based on the value of the header.
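
The proposal above could be sketched roughly as follows. All names here (SketchSelector, SketchReplicatingSelector, SketchChannelProcessor) are hypothetical and chosen to avoid implying any real Flume interface; channels are stand-ins that simply hold event bodies in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical selector contract: picks channel names for one event. */
interface SketchSelector {
    List<String> select(Map<String, String> headers, List<String> configuredChannels);
}

/** The proposed default: every configured channel receives a copy of the event. */
final class SketchReplicatingSelector implements SketchSelector {
    @Override
    public List<String> select(Map<String, String> headers, List<String> configuredChannels) {
        return configuredChannels;
    }
}

/** Hypothetical channel processor: the source hands it events instead of writing to channels. */
final class SketchChannelProcessor {
    // Channel name -> event bodies held by that channel (a stand-in for real channels).
    private final Map<String, List<String>> channels = new LinkedHashMap<>();
    private final SketchSelector selector;

    SketchChannelProcessor(List<String> channelNames, SketchSelector selector) {
        for (String name : channelNames) {
            channels.put(name, new ArrayList<>());
        }
        this.selector = selector;
    }

    /** Asks the selector where the event goes, then places it on each selected channel. */
    void process(Map<String, String> headers, String body) {
        List<String> configured = new ArrayList<>(channels.keySet());
        for (String name : selector.select(headers, configured)) {
            channels.get(name).add(body);
        }
    }

    List<String> contents(String channelName) {
        return channels.get(channelName);
    }
}
```

Swapping the replicating selector for a header-based one changes routing without touching the source or the processor, which is the point of splitting the two components.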
>
> Do you think this will adequately address your use case? If not, what do
> you think is missing here?
>
> Thanks,
> Arvind
>
>
> On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <
> praveen_ramachandra@yahoo.com> wrote:
>
> Hi
>
> Security is not the reason for isolation.
>
> Isolation could be used to realize quite a few quality attributes of the
> system, e.g., many aspects of QoS.
>
> Regardless, if we have specific event handling requirements that are
> different for each "kind" of data, the question is how does one realize it
> using flume-ng.
>
> As it stands currently, sources/sinks & channels are tied at the hip,
> which is fine. The only issue is having to allocate a dedicated host/port
> to achieve this.
>
>
> As I had mentioned in my first email, one could develop custom
> sources/sinks and the configuration that goes along with them to mux/demux
> events that are flowing through the system.
>
> The question to ask ourselves is: why is there a need to change the
> deployment to accommodate a new "flow" in the system?
>
>
>
> --
> Regards,
> Praveen Ramachandra
>
>   ------------------------------
> *From:* Ralph Goers <ralph.goers@dslextreme.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Tuesday, January 10, 2012 6:01 PM
> *Subject:* Re: Flume-NG Channels
>
> When you speak of flow isolation are you doing that for security, failure
> protection or for some other reason?  From a failure protection case you
> would need physically different Flume agents, not just channels. I'm not
> sure what the security gains are in isolation, if any.
>
> I guess to give you a proper response I would want to know what your
> actual requirements are and possibly why.
>
> For what it's worth, I also work in a multi-tenant environment and this has
> never been a requirement.
>
> Ralph
>
>
>
> On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
>
> Hi arvind,
>
> Thanks for responding.
>
> If we want to model separation not only in transit but also at rest, i.e.,
> when a channel has a filechannel/jdbcchannel/memorychannel backing, then
> separation is required while data resides in those channels before it is
> shipped to the next hop.
>
> On multi-tenancy, I was trying to look at it from an isolation perspective.
> Flow isolation is required from the collecting agent tier, to the aggregating
> agent tier, and to the tier that is going to deposit/deliver the events.
>
> "How do you propose the platform be modified in order to support this
> use-case?" you ask. Thinking out loud now :-).
> One option is to have a notion of a flow that is visible at the flume-ng
> level; applications will map channels to flows, and sources/sinks across
> agent tiers can mux/demux them appropriately.
>
> This will also decouple mapping across agent tiers i.e.,
>
> If you smell scribe in my above description, I wouldn't hold it against
> you :-). Honestly the simplicity of scribe let us prototype for our use
> case in a matter of an hour or two, compared to the many days that it took
> to get a similar thing prototyped with flume. We even struggle today to
> model the use cases seamlessly in flume (og or ng).
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
>
>   ------------------------------
> *From:* Arvind Prabhakar <arvind@apache.org>
> *To:* flume-user@incubator.apache.org; Praveen Ramachandra <
> praveen_ramachandra@yahoo.com>
> *Sent:* Monday, January 9, 2012 11:15 PM
> *Subject:* Re: Flume-NG Channels
>
> Hi Praveen,
>
> First to your question:
>
> > Did I get the modeling right with flume-ng
>
> More-or-less yes. The one distinction that I would like to point out
> is that having separate source-sink end points for individual channels
> stems more from your requirement than from the design of flume. A
> channel in the flume implementation does not care how many sources write
> to it or how many sinks read from it.
>
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each event belongs in the message, I
> > can effectively mux and demux the events at either end.
>
> The key issue here is the layering of a multi-tenant semantic on top
> of flows. Since fundamentally flume is not aware of the contents of
> the events in a flow, and does not expose any client auth/id model -
> there is no inherent support of doing this out of the box.
>
> Moreover, from your description it seems that the channels that
> logically separate out the flows will operate within the same agent.
> If that is the case, then it may be a better option to use a single
> channel and have a multiplexing terminal sink that can route the
> messages to the correct destination.
>
> >             2.2 Which means the default support for channel is also not of
> > much use
>
> How do you propose the platform be modified in order to support this
> use-case?
>
> Thanks,
> Arvind
>
>
>
> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
> <praveen_ramachandra@yahoo.com> wrote:
> > They are in the low 100s in the best case scenario, and could be in the
> > 1000s in the worst case scenario.
> >
> > I believe this aspect can be pretty much shielded from application if the
> > underlying platform has the right set of responsibilities.
> >
> >
> > --
> > Regards,
> > Praveen Ramachandra
> >
> >
> >
> > ________________________________
> > From: Ralph Goers <ralph.goers@dslextreme.com>
> > To: flume-user@incubator.apache.org
> > Sent: Monday, January 9, 2012 6:53 PM
> > Subject: Re: Flume-NG Channels
> >
> >
> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
> >
> > Hi,
> >
> > We were trying to design a multi-tenanted system using flume-ng, where each
> > logically independent data set is modelled through a channel going through
> > the system of collectors, aggregators and delivery agents (to end
> > destination). Each channel will carry data that logically belongs together.
> > The requirement is that we should be able to bring up and tear down a
> > channel with ease.
> >
> >
> > When we completed the exercise, it turned out that we have to run a separate
> > Source/Sink, at a designated host/port combination for each channel. The
> > issue with this is that it is an operational overhead: we have to work
> > with net-ops to punch holes in the firewall to let tcp traffic flow on
> > non-standard ports. I would imagine that it would be the case in many
> > organizations as well.
> >
> > Two questions.
> >
> > 1. Did I get the modeling right with flume-ng
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each event belongs in the message, I
> > can effectively mux and demux the events at either end.
> >             2.2 Which means the default support for channel is also not of
> > much use
> >
> >
> > What is your target destination(s) for the tenants?  Can they all flow
> > through a single channel in Flume and then be delivered to the correct
> > destination by a smarter sink at the end?
> >
> > Ralph
> >
> >
>
