flume-user mailing list archives

From Souvik Bose <souvik.b...@delgence.com>
Subject Re: Exception Handling with Flume
Date Tue, 09 Dec 2014 07:00:40 GMT
Hi Hari,
Thank you for replying to my question. You are absolutely right: I am 
using only one channel for both sinks, which is causing the problem. 
Thanks for pointing that out; one problem is solved.
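
For reference, a minimal sketch of the two-channel layout Hari described. 
The channel names (hdfsChannel, solrChannel) and sink names (hdfsSink, 
solrSink) are hypothetical placeholders, not names from my actual config, 
and the sink types and their settings are left out:

# Hypothetical names: hdfsChannel, solrChannel, hdfsSink, solrSink
dnAgent.channels = hdfsChannel solrChannel
dnAgent.sinks = hdfsSink solrSink

# The source copies every event into both channels
dnAgent.sources.gpslog.channels = hdfsChannel solrChannel
dnAgent.sources.gpslog.selector.type = replicating

dnAgent.channels.hdfsChannel.type = memory
dnAgent.channels.solrChannel.type = memory

# Each sink drains its own channel, so neither takes events from the other
dnAgent.sinks.hdfsSink.channel = hdfsChannel
dnAgent.sinks.solrSink.channel = solrChannel

With a single shared channel, each event is taken by whichever sink reads 
it first, which would explain events showing up in hdfs but not in solr 
and vice versa.
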
For the spooling directory source, I am processing the files directly 
using my own custom interceptor. Here is the config for the source:

dnAgent.sources.gpslog.type = spooldir
dnAgent.sources.gpslog.spoolDir = /home/ktspool
dnAgent.sources.gpslog.batchSize = 500
dnAgent.sources.gpslog.channels = MemChannel
dnAgent.sources.gpslog.fileHeader = true
dnAgent.sources.gpslog.deletePolicy = immediate
dnAgent.sources.gpslog.useStrictSpooledFilePolicies = false
dnAgent.sources.gpslog.interceptors = KTFlowProcessInterceptor

Generally this works well when everything is okay. The problem is that 
the GPS provider doesn't have full control over what comes in, so 
sometimes a blank file of 0 bytes arrives. This causes Flume to stop 
processing with an exception, and I have to restart Flume manually.

P.S.: I am using Flume 1.4.0 on CDH 4.4.0 on 4 data nodes in EC2.

Thanks & Regards,
On 12/8/2014 11:36 PM, Hari Shreedharan wrote:
> You are likely reading from the same channel for both sinks. That 
> means only one sink gets your data. You’d need to have 2 channels 
> connected to the same source, with each sink getting its own channel.
> About the Spool Dir not processing data, what format/serializer etc 
> are you using?
> Thanks,
> Hari
> On Mon, Dec 8, 2014 at 3:37 AM, Souvik Bose <souvik.bose@delgence.com> wrote:
>     Hello All,
>     I am stuck with a problem with Flume version 1.4.0. I am using the
>     spooling directory source with a custom interceptor to process
>     encoded GPS files and save them in HDFS and Solr (using the
>     morphline Solr sink). The main information is stored in the name
>     of each file that arrives in the spool directory, and the content
>     is irrelevant. So I am using the custom interceptor to extract and
>     transform the file header and store the extracted data in JSON
>     format as the output of the event.
>     My problem comes in:
>     1. When a 0-byte file comes in (files generally arrive with a
>     "!" symbol in the content), Flume stops and throws an exception.
>     We don't need the content of the file in any case, but we still
>     hit the exception because Flume cannot handle 0-byte files.
>     2. When the content contains unusual characters such as !ƒ!,
>     Flume stops with an exception.
>     3. Even when everything is running fine, I am losing some data/
>     events. On closer inspection I found that some are available in
>     HDFS but not in Solr, and vice versa. I am not using any sink
>     processors (sink groups) such as failover or load balancing. Is
>     it because of that?
>     I want a solution where I can handle any exception, so that the
>     file/data that causes it is discarded and Flume processes the
>     next file in the spool directory. The data comes in at high
>     velocity, 100 files every second. So manually deleting the file
>     and restarting Flume is what I regularly do to get everything
>     back on track. But I am sure there must be better ways to handle
>     this case. Can you please suggest some better alternatives to my
>     approach?
>     Thanks & Regards,
>     Souvik Bose

Met vriendelijke groeten / Mit freundlichen Grüßen / With kind regards,

Delgence | Delivering Intelligence
Delivering high quality IT solutions.

*Souvik Bose*

Development Office:
Rishi Tech Park Office No. E -3, Premises No. 02-360 Street No. 360 New 
Town Rajarhat
Kolkata-700156. India

Europe Office:
Liessentstraat 9a, 5405 AH  Uden
The Netherlands

T +91 9831607354 | T +31 616392268 | E Souvik.bose@delgence.com | W www.delgence.com

