flume-user mailing list archives

From Paul Chavez <pcha...@verticalsearchworks.com>
Subject RE: Get Flume 'bad' event out of channel.
Date Wed, 05 Jun 2013 16:34:51 GMT
Josh,

I'm assuming by 'bad' event you mean one that does not have the required headers for tokenized
paths. If that's the case there are two potential ways to solve this.

One way is to use a multiplexing channel selector; you can then set up a default channel that
handles any events missing the header(s). This gets unwieldy fast, though, if you are routing on
multiple headers. I used this method for a while but eventually abandoned it since I use 3 headers
to route events.
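
For illustration, a rough sketch of what the selector approach might look like in the agent
config. The agent, source, and channel names (a1, r1, c1, c2, cMissing), the header name
'logType', and the mapped values are all made up for the example; only the selector
properties themselves come from the Flume user guide.

```properties
# Hypothetical names: agent a1, source r1, channels c1/c2/cMissing, header logType.
a1.sources.r1.channels = c1 c2 cMissing

# Route on the value of a single header.
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = logType
a1.sources.r1.selector.mapping.web = c1
a1.sources.r1.selector.mapping.app = c2

# Events whose logType header is missing or unmapped go here instead.
a1.sources.r1.selector.default = cMissing
```

A selector only looks at one header, which is why this gets awkward once you route on several.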

The second way is to have a static interceptor on your first source that has 'preserveExisting'
set to true (which is the default behavior). In my case we use two 'type' fields and I just have
an interceptor set the value 'MissingLogType', etc. for each possible header. Since I bucket
by these header values I can quickly find corrupt events this way. I use a timestamp interceptor
in much the same way, except in that case it'll stamp the event with the time the source first
saw it. This can result in an event being bucketed in the wrong date/time partition but that's
better than it gumming up the whole data flow.
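
As a sketch only, the interceptor setup could look something like the following. The agent,
source, sink, and interceptor names (a1, r1, k1, i1, i2), the header name 'logType', and the
HDFS path are assumptions for the example, not my actual config; the interceptor and sink
properties are the ones documented in the Flume user guide.

```properties
# Hypothetical names: agent a1, source r1, sink k1, header logType.
a1.sources.r1.interceptors = i1 i2

# Static interceptor: adds logType=MissingLogType only if the header is absent.
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.preserveExisting = true
a1.sources.r1.interceptors.i1.key = logType
a1.sources.r1.interceptors.i1.value = MissingLogType

# Timestamp interceptor: keeps an existing timestamp header, otherwise stamps
# the event with the time the source processed it.
a1.sources.r1.interceptors.i2.type = timestamp
a1.sources.r1.interceptors.i2.preserveExisting = true

# Assumed HDFS sink with a tokenized path; every event now resolves to some
# bucket, and the incomplete ones land under MissingLogType.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%{logType}/%Y-%m-%d
```

Note that 'preserveExisting' defaults to false on the timestamp interceptor, so it has to be
set explicitly there if you want to keep timestamps that upstream already supplied.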

Hope that helps,
Paul Chavez

-----Original Message-----
From: Josh [mailto:josh.myers@mydrivesolutions.com] 
Sent: Wednesday, June 05, 2013 3:53 AM
To: user@flume.apache.org
Subject: Get Flume 'bad' event out of channel.

Hi Guys,

I know this was covered back in May (not so long ago) but was wondering if there has been
any movement on this?

We have written a custom serializer to take data from an http data source using the JSON handler.
The data source gets sent JSON from our pipeline, which checks that all needed headers are
present for serialization and raises exceptions if not, but we have seen a few events come
in that cannot be serialized due to missing parts of JSON or any number of other reasons.
Currently I can't see a way to get these out of the channel without:

a) chucking out the whole channel and everything in it.
b) attaching a custom sink/serializer to the channel that is less fussy about which events it passes.

Neither of these really seem like great options. We are using file channels and all data that
is written to disk looks to be in binary format. If needed, as a last resort, could we write
a tool to pull java objects out of a channel and write the rest back into the channel? Are
there any plans to implement anything of this kind already? 

As previously suggested, it would be nice to be able to:

a) Dump the event to a data file and throw a warning in the log messages?
b) Throw the event away
c) Move the event to an alternate channel where it can be handled differently 



Thanks,
Josh
--
www.mydrivesolutions.com

This email and any attachments is private and confidential. If you have received this message
in error please remove it from your systems and notify the author.
MyDrive Solutions Limited is registered in England and Wales, No 07330334. 
Registered office: Surrey Technology Centre, 40 Occam Road, Guildford GU2 7YG, UK
