flume-user mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: flume data duplication
Date Tue, 06 Sep 2011 07:23:29 GMT
[sending to flume-user@incubator.apache.org, bcc flume-dev@cloudera.org]

Metamoi,

How are you sending data from tail to mongo?  My guess is that you have an
agent set up in E2E mode and a collector that doesn't have a collector
sink or a collector wrapping mongo.

agent:
source tail
sink agentSink(xxxx)

collector:
source: collectorSource
sink:  mongo

If this is the case, you need to wrap your mongo sink with a collector sink
so that acks get sent to tell the agent to stop resending data.

collector's sink should be: collector(30000) { mongoSink() }
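
To illustrate why missing acks would produce the 1 + 2 + 3 + 4 + 5 = 15 pattern, here is a toy Python model. It is only a sketch, not Flume's actual implementation: the function name, the record names, and the "resend everything unacked once per minute" timing are assumptions made for illustration (real E2E retries are driven by timeouts).

```python
def simulate(minutes, collector_acks):
    """Toy model: one new event per minute; the agent resends every
    event it has not seen acknowledged. Not Flume's real code."""
    pending = []  # events the agent still believes are undelivered
    stored = []   # what ends up in the (hypothetical) mongo collection
    for minute in range(1, minutes + 1):
        pending.append(f"record-{minute}")
        # The agent sends everything that is still unacknowledged.
        stored.extend(pending)
        if collector_acks:
            # An ack tells the agent the data is safe, so it stops resending.
            pending.clear()
    return stored


without_acks = simulate(5, collector_acks=False)
with_acks = simulate(5, collector_acks=True)
print(len(without_acks))  # 15
print(len(with_acks))     # 5
```

Under these assumptions, the run without acks stores 15 records and the run with acks stores 5, which matches the behavior described in the quoted message.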

Jon.
On Mon, Sep 5, 2011 at 11:02 PM, metamoi <metamoi@gmail.com> wrote:

> I use the following command:
> tail("/var/log/flume/test.log", startFromEnd="true")
>
> On Sep 6, 2:58 pm, metamoi <meta...@gmail.com> wrote:
> > I set up an agent that sends one new record per minute.
> > After five minutes, the agent had sent five records to a collector, which
> > stored them in mongodb.
> > I expected five records in the mongodb collection (the equivalent of a
> > table in mysql), but there are 15.
> > After the first minute there is only one record, as expected.
> > After two minutes, when the agent sends the second record, two records
> > are inserted: the new one plus the first one again.
> > So at that point there are three records in the collection.
> >
> > In the same way, at the fifth minute five records are inserted: the new
> > one plus the previous four.
> >
> > In sum, 1 + 2 + 3 + 4 + 5 = 15 records are stored in the db.
> >
> > Is this a bug in flume?
> > Has anyone else run into this kind of problem?
> >
> > Thanks in advance.
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
