flume-user mailing list archives

From Gonzalo Herreros <gherre...@gmail.com>
Subject Re: Kafka Sink, bad distribution of data in the partitions.
Date Tue, 15 Dec 2015 08:14:19 GMT
Unless you are using a custom partitioner, the DefaultPartitioner assigns
messages to partitions randomly, so the content of the headers shouldn't
make any difference.
The only explanation I can see for what you are seeing is that the producer
somehow thinks there are only 2 partitions.
Are the messages going just to partitions 0 and 1, or to other partition
numbers? Can you try with another topic and see if that happens too?

How are you checking where the messages are going?
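
To make the two cases above concrete, here is a minimal sketch in Python. It
only models "pick a partition uniformly at random per message"; it is not the
actual DefaultPartitioner code, and the function name simulate and the event
count are made up for illustration. It compares a producer that sees all six
partitions with one that believes there are only two.

```python
import random
from collections import Counter


def simulate(num_events, visible_partitions, seed=0):
    """Count events per partition when each event is assigned to a
    uniformly random partition among those the producer can see."""
    rng = random.Random(seed)
    return Counter(rng.randrange(visible_partitions) for _ in range(num_events))


# Producer sees all six partitions: counts should be roughly equal.
all_six = simulate(60_000, 6)

# Producer thinks there are only two: everything lands on 0 and 1.
only_two = simulate(60_000, 2)

for partition in range(6):
    print(partition, all_six[partition], only_two[partition])
```

If the real counts look like the second column, the producer is probably
working from a wrong partition count; if they are skewed in some other way,
the cause is likely elsewhere.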


On 14 December 2015 at 22:52, Guillermo Ortiz <konstt2000@gmail.com> wrote:

> I'm using an architecture like this:
> Logs --> SpoolDir --> MemChannel --> AvroSink -->
> AvroSource --> MemChannel --> KafkaSink.
> I have a cluster with three Kafka nodes and have created a topic with six
> partitions and replication factor one as a POC.
> I have seen that 95% of the data goes to two partitions, and these two
> partitions are on the same Kafka node. I am not creating a "key" header on
> my events in Flume, so according to the documentation the key is generated
> randomly. The messages are logs from different sources. Is this behavior
> normal?
