flume-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: Spooling source
Date Mon, 23 Feb 2015 06:58:11 GMT
AFAIK, an interceptor is the best way to do this without touching the Flume code.

How bad does the performance become? You must have collected some
numbers; if possible, can you share them?

Another way is to touch the SpoolDirectory source code and save the folder
name in a variable. The source opens each file for processing; at that
point you can parse the file name once, set the folder name, and keep
adding it to a specific header on every event read from that file.
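
A minimal sketch of that idea, not actual Flume source code: the folder name
is parsed once when a file is opened and then copied into each event's
headers. FileScope, folderNameOf and stamp are hypothetical names, the
"foldername" header key is taken from the config quoted below, and the
parsing assumes the foldername-filename-timestamp.suffix layout from the
original question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-file state a modified source would keep.
final class FileScope {
    private final String folderName; // parsed once, when the file is opened

    FileScope(String filePath) {
        this.folderName = folderNameOf(filePath);
    }

    // Assumed layout: foldername-filename-timestamp.suffix
    // If the base name has no '-', the whole base name is returned.
    static String folderNameOf(String filePath) {
        int sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String base = filePath.substring(sep + 1);
        int dash = base.indexOf('-');
        return dash < 0 ? base : base.substring(0, dash);
    }

    // Called for every event read from this file: no parsing, just a map put.
    void stamp(Map<String, String> headers) {
        headers.put("foldername", folderName);
    }

    public static void main(String[] args) {
        FileScope scope = new FileScope("F:\\SpoolingDirectory\\LogFiles-Log1-1463238298.log");
        Map<String, String> headers = new HashMap<>();
        scope.stamp(headers);
        System.out.println(headers.get("foldername")); // prints LogFiles
    }
}
```

In a real patch the map would be the event's header map, and the scope
object would be recreated whenever the source moves on to the next file.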

A third way is to keep track of the folder in the interceptor and update
it only when the file name changes. You still have to check each event,
but if you can come up with an efficient way to detect a change in the
file name, the perf hit would go down.
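
One way this could look, as a rough sketch rather than a tested Flume
interceptor: remember the last file name seen and re-parse only when it
differs. CachedFolderLookup and its members are hypothetical names; the
input is assumed to be the value of the "file" header that the spooldir
source adds when fileHeader = true, and the class assumes it is called from
a single thread.

```java
// Hypothetical helper an interceptor could hold as a field.
final class CachedFolderLookup {
    private String lastFile;   // "file" header value of the previous event
    private String lastFolder; // folder name parsed from lastFile
    int parseCount = 0;        // for demonstration only

    // Returns the folder name, parsing only when the file name changes.
    String folderFor(String file) {
        if (!file.equals(lastFile)) {
            lastFile = file;
            lastFolder = parse(file);
            parseCount++;
        }
        return lastFolder;
    }

    // Assumed layout: foldername-filename-timestamp.suffix
    private static String parse(String filePath) {
        int sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String base = filePath.substring(sep + 1);
        int dash = base.indexOf('-');
        return dash < 0 ? base : base.substring(0, dash);
    }

    public static void main(String[] args) {
        CachedFolderLookup lookup = new CachedFolderLookup();
        for (int i = 0; i < 100000; i++) {
            lookup.folderFor("F:\\SpoolingDirectory\\LogFiles-Log1-1463238298.log");
        }
        System.out.println(lookup.parseCount); // prints 1
    }
}
```

Inside intercept(Event) the call would be roughly
headers.put("foldername", lookup.folderFor(headers.get("file"))). Each event
still costs one string comparison, but the parsing runs once per file.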

HTH !

On Wed, Feb 11, 2015 at 9:47 AM, mahendran m <mahendranec@hotmail.com> wrote:
> Hi ,
>
> I am moving logs from a local machine to an HDFS server using Flume with a
> spooling directory source. Each log contains lakhs of lines.
>
> My use case is below
>
> Log file names follow foldername-filename-timestamp.suffix; an example file
> name is LogFiles-Log1-1463238298.log
>
> my CONF is below
>
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
>
> #the source
>
> a1.sources.r1.type = spooldir
> a1.sources.r1.spoolDir  = F:\\SpoolingDirectory
> a1.sources.r1.deletePolicy=immediate
> a1.sources.r1.fileHeader = true
> a1.sources.r1.interceptors = i1
> a1.sources.r1.interceptors.i1.type = com.company.CustomInterceptor.CustomInterceptor$Builder
>
> #the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.hdfs.fileSuffix= .txt
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/spoolingdirectory/%{foldername}
>
> #Channel
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 10000
> a1.channels.c1.transactionCapacity = 1000
>
> #Flow
> a1.sources.r1.channels = c1
> a1.sinks.k1.channel = c1
>
>
> In the custom interceptor we process the file header, extract the folder
> name, and add it as the foldername header, which is used in the HDFS path.
> The problem we are facing is that for a single file with lakhs of lines,
> the interceptor extracts the same folder name lakhs of times, which leads
> to very high performance degradation.
>
> Is there any way to handle my case without parsing the same file header
> lakhs of times?
>
> thanks.
>
>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
