flume-user mailing list archives

From Sharninder <sharnin...@gmail.com>
Subject Re: Batchsize in kafka sink
Date Mon, 28 Sep 2015 02:03:03 GMT
That does make sense. Thanks Gonzalo. We do use the async producer with the
default Kafka batch.num.messages. We don't care about a few messages being lost
in the event of a crash or something, so I think we'll continue using the
async producer, but picking up X messages in a single
transaction will surely help with reducing IO on the flume server.

Thanks a lot.

--
Sharninder


On Sun, Sep 27, 2015 at 3:16 PM, Gonzalo Herreros <gherreros@gmail.com>
wrote:

> There are subtle but significant differences.
>
> When you configure "batchSize" in the sink, you are specifying how many
> messages are taken as a transaction from the channel at once (like in any
> other sink).
> The Kafka property "batch.num.messages" (which in the flume config
> is specified as "kafka.batch.num.messages"), in contrast, specifies the
> batch size for sending messages to the broker from an asynchronous
> producer. By default the producer is synchronous, so that configuration
> property would do nothing.
>
> If you use the synchronous producer (which is the default), the messages
> taken from the channel as a batch (100 by default) will be sent together to
> the kafka broker.
> However, if you change the producer to async then it's more complicated:
> by default "kafka.batch.num.messages" is 200, which means that the sink
> will take 100 from the channel and commit that, but those messages will be
> kept in memory until another 100 are taken (so there is a risk of losing
> messages).
>
> I would stay away from the async producer in a Flume sink because you want
> the sink to control the pace (a file or memory channel will be faster) so
> it doesn't need to buffer in memory, risking message loss. An async producer
> is useful when the client is an online application you don't want to delay.
>
> Answering your question: if you don't specify any batching properties, by
> default it will deliver messages in batches of 100, which is probably good
> in most cases.
> Hope that makes sense.
>
> Regards,
> Gonzalo
>
>
> On 26 September 2015 at 05:19, Sharninder <sharninder@gmail.com> wrote:
>
>> Anyone ?
>>
>> > On 25-Sep-2015, at 3:51 PM, Sharninder <sharninder@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > We want to move to the built-in kafka sink from our own custom
>> implementation and I have a question about the batchsize config parameter.
>> >
>> > Looking at the code of the sink, I can tell that the batchsize is used
>> to construct the list of keyed messages fed to the producer.
>> >
>> > My question is what is the difference between this variable and the
>> kafka batch.num.messages parameter?
>> >
>> > Is the flume parameter necessary ?
>> >
>> > --
>> > Sharninder
>> >
>> >
>>
>
>
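
For reference, the two settings discussed in the thread might sit side by side
in a Flume agent configuration roughly as sketched here. This assumes the
Flume 1.6 Kafka sink and the "kafka." prefix for producer properties that
Gonzalo mentions; the agent, sink and channel names (a1, k1, c1), the topic
and the broker address are placeholders, not values from the thread.

```properties
# Placeholder names: a1 (agent), k1 (sink), c1 (channel)
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.topic = example-topic
a1.sinks.k1.brokerList = localhost:9092

# Flume side: events taken from the channel per transaction (default 100)
a1.sinks.k1.batchSize = 100

# Kafka side: only has an effect when the producer is asynchronous
a1.sinks.k1.kafka.producer.type = async
a1.sinks.k1.kafka.batch.num.messages = 200
```

Leaving out the last two lines keeps the synchronous default, in which case
each channel batch is sent to the broker as it is taken.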


-- 
--
Sharninder
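
The interaction Gonzalo describes (the sink commits 100 events while an async
producer waits for 200) can be illustrated with a toy model. This is only a
sketch: AsyncProducerModel and run_sink are hypothetical names, not Flume or
Kafka APIs, and the model ignores the real producer's time-based flush.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncProducerModel:
    """Toy stand-in for an async producer that buffers until a batch is full."""
    batch_num_messages: int = 200
    buffered: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def send(self, messages):
        self.buffered.extend(messages)
        # Flush only full batches; any remainder stays in memory.
        while len(self.buffered) >= self.batch_num_messages:
            batch = self.buffered[:self.batch_num_messages]
            self.buffered = self.buffered[self.batch_num_messages:]
            self.sent.extend(batch)


def run_sink(channel, producer, batch_size=100, transactions=1):
    """Take batch_size events per transaction and hand them to the producer.

    Returns the number of events committed (removed from the channel).
    """
    committed = 0
    for _ in range(transactions):
        taken = channel[:batch_size]
        del channel[:batch_size]
        producer.send(taken)
        committed += len(taken)  # the channel transaction commits here
    return committed


channel = [f"event-{i}" for i in range(300)]
producer = AsyncProducerModel(batch_num_messages=200)

# First transaction: 100 events leave the channel, none reach the broker yet.
committed = run_sink(channel, producer, batch_size=100)
at_risk = len(producer.buffered)
print(committed, len(producer.sent), at_risk)

# Second transaction: the producer's buffer reaches 200 and is flushed.
committed += run_sink(channel, producer, batch_size=100)
print(committed, len(producer.sent), len(producer.buffered))
```

In this model, the events counted in at_risk are the ones that would be lost
if the agent crashed between the two transactions, since the channel has
already committed them.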
