flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Exec source doesn't flush the last data
Date Wed, 13 Mar 2013 05:02:52 GMT
That is a known issue. I believe there was some feedback on the JIRA, but unfortunately it has
not been incorporated yet. I will get to it at some point and fix it :-)

-- 
Hari Shreedharan


On Tuesday, March 12, 2013 at 8:33 PM, larryzhang wrote:

> Hi,
>    I did a simple test of the exec source and found that it doesn't flush the last batch of data.
> Here are the steps:
>    a. Create the source file 1.test, containing the numbers 1 to 15, one per line:
>     ----------
>           1
>           2
>          ...
>           15
>    ----------
>    b. Create the configuration file flume_simple.conf:
> -------------------------
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> 
> a1.sources.r1.type = exec
> a1.sources.r1.command = tail -n +0 -F /opt/scripts/tvhadoop/flume/flume-1.3.0/source/1.test
> a1.sources.r1.channels = c1
> a1.sources.r1.batchSize = 10
> 
> a1.channels.c1.type = memory
> 
> a1.sinks.k1.type = file_roll
> a1.sinks.k1.channel = c1
> a1.sinks.k1.sink.directory = /opt/scripts/tvhadoop/flume/flume-1.3.0/sink
> ---------------------
>      c. Run Flume with the command:
>              bin/flume-ng agent --conf conf -f conf/flume_simple.conf -Dflume.root.logger=DEBUG,console -n a1
> 
>      After more than 1 minute (file roll interval), I checked the output directory. There
> are 2 files: one contains the numbers 1 to 10, and the other is empty.
>      I think this is because batchSize was set to 10, so the last 5 numbers never got
> flushed and were lost. Even after applying the patch in 'https://issues.apache.org/jira/browse/FLUME-1819',
> nothing changed. When I debugged the code, I found that the lines after the while loop
> (shown in red in my original email) never get executed:
>        ----------------
>           while ((line = reader.readLine()) != null) {
>             counterGroup.incrementAndGet("exec.lines.read");
>             eventList.add(EventBuilder.withBody(line.getBytes()));
>             if (eventList.size() >= bufferCount) {
>               channelProcessor.processEventBatch(eventList);
>               eventList.clear();
>             }
>           }
>           // Flush of the final partial batch (the lines shown in red):
>           // never reached while tail -F keeps the stream open.
>           if (!eventList.isEmpty()) {
>             channelProcessor.processEventBatch(eventList);
>           }
>        --------------
>     In my scenario, the source log files are split by hour, so I need to change the
> file name in the Flume configuration file. Because of the above bug, I can only set the batchSize
> of the exec source to 1, which significantly reduces throughput. I wonder how to solve
> this problem. Any suggestions are welcome.
> Best Regards,
> larry
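The five missing lines stay in eventList because the only flush of a partial batch sits after the while loop, and that loop exits only when readLine() returns null. With `tail -F` the stream never reaches end-of-file, so readLine() simply blocks and the final flush is unreachable. One common remedy is to also flush a partial batch once it has waited longer than some timeout. The following is a minimal, hypothetical sketch of that idea, not Flume's actual code: `TimedBatcher`, `add`, `flushIfStale`, and the 3000 ms timeout are illustrative names and values, the `Consumer` stands in for `channelProcessor.processEventBatch`, and time is passed in explicitly so the behaviour is deterministic. In a real source, a timer thread would call `flushIfStale` periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Flume's API): hand off a batch when it is full,
// or when its oldest line has waited at least timeoutMillis.
public class TimedBatcher {
    private final int batchSize;
    private final long timeoutMillis;
    private final Consumer<List<String>> sink;   // stands in for processEventBatch
    private final List<String> buffer = new ArrayList<>();
    private long firstBufferedAt = -1;           // -1 while the buffer is empty

    public TimedBatcher(int batchSize, long timeoutMillis, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.timeoutMillis = timeoutMillis;
        this.sink = sink;
    }

    // Called for every line read; flushes as soon as a full batch is available.
    public synchronized void add(String line, long nowMillis) {
        if (buffer.isEmpty()) {
            firstBufferedAt = nowMillis;
        }
        buffer.add(line);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Called periodically (e.g. by a timer thread); flushes a partial batch
    // whose oldest line has waited at least timeoutMillis.
    public synchronized void flushIfStale(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstBufferedAt >= timeoutMillis) {
            flush();
        }
    }

    private void flush() {
        sink.accept(new ArrayList<>(buffer));    // hand off a copy, then reset
        buffer.clear();
        firstBufferedAt = -1;
    }

    public static void main(String[] args) {
        List<List<String>> delivered = new ArrayList<>();
        TimedBatcher batcher = new TimedBatcher(10, 3000, delivered::add);

        // Lines 1..15 arrive at t=0 and the stream never ends (as with tail -F).
        for (int i = 1; i <= 15; i++) {
            batcher.add(Integer.toString(i), 0);
        }
        System.out.println("batches after reading: " + delivered.size()); // 1 (lines 1-10)

        batcher.flushIfStale(1000);  // too early: lines 11-15 stay buffered
        batcher.flushIfStale(3000);  // timeout reached: lines 11-15 are flushed
        System.out.println("batches after timeout: " + delivered.size()); // 2
        System.out.println("last batch: " + delivered.get(1));            // [11, 12, 13, 14, 15]
    }
}
```

The size-based flush keeps the throughput benefit of batchSize = 10, while the timeout bounds how long a trailing partial batch can sit unflushed. Because a timer thread and the reader thread would both touch the buffer, the methods are synchronized.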

