flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Kafka Sink, bad distribution of data in the partitions.
Date Mon, 14 Dec 2015 22:52:25 GMT
I'm using the following architecture:
Logs --> SpoolDir --> MemChannel --> AvroSink -->
AvroSource --> MemChannel --> KafkaSink.

I have a cluster with three Kafka nodes and, as a POC, have created a topic
with six partitions and a replication factor of one.

I have seen that 95% of the data goes to two partitions, and both of these
partitions are on the same Kafka node. I am not setting a "key" header on
my events in Flume, so according to the documentation the key is generated
randomly. The messages are logs from different sources. Is it normal this
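
For comparison, here is a rough sketch of what an even spread would look like if every event carried its own random key and the partition were chosen as hash(key) mod 6. The function names and the CRC32 hash are assumptions made for illustration only; the real partitioner used by Kafka and by the Flume Kafka sink may behave differently.

```python
import uuid
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # the topic in this POC has six partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition as hash(key) mod N.

    CRC32 is an illustrative stand-in, not Kafka's actual hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def simulate(num_events: int = 60000) -> Counter:
    """Give every event its own random key and count events per partition."""
    counts = Counter()
    for _ in range(num_events):
        counts[partition_for(str(uuid.uuid4()))] += 1
    return counts


if __name__ == "__main__":
    # Print the number of simulated events that landed in each partition.
    for partition, count in sorted(simulate().items()):
        print(partition, count)
```

Under these assumptions each partition should receive roughly one sixth of the events, which is far from the 95% on two partitions described above.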
