flume-user mailing list archives

From Jason Williams <jasonjwwilli...@gmail.com>
Subject Re: Kafka Sink random partition assignment
Date Tue, 07 Jun 2016 08:43:18 GMT
Hey Chris,

Thanks for the help!

Is that a limitation of the Flume Kafka sink or of Kafka itself? When I use another Kafka
producer and publish without a key, it moves randomly among the partitions on every publish.
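(For reference, newer Kafka clients behave the way I'd expect: with a key they hash it to a partition, and without a key they pick a partition per send so messages spread out. A rough Python sketch of that scheme; the function name is made up, and the real clients use murmur2 rather than crc32:)

```python
import random
import zlib

def choose_partition(key, num_partitions):
    # With a key: deterministic hash-mod placement (real Kafka clients
    # use murmur2; crc32 stands in here purely for illustration).
    if key is not None:
        return zlib.crc32(key.encode()) % num_partitions
    # Without a key: a fresh partition for each send, so a batch of
    # keyless messages lands on many partitions.
    return random.randrange(num_partitions)

# Over 1000 keyless sends to a 20-partition topic, many partitions get hit.
hits = {choose_partition(None, 20) for _ in range(1000)}
```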


Sent via iPhone

> On Jun 7, 2016, at 00:08, Chris Horrocks <chris@hor.rocks> wrote:
> The producers bind to random partitions and move every 10 minutes. If you leave it long
> enough you should see it in the producer Flume agent logs, although there's nothing to stop
> it from "randomly" choosing the same partition twice. Annoyingly there's no concept of producer
> groups (yet) to ensure that producers apportion the available partitions between them, as this
> would create a synchronisation issue between what should be entirely independent processes.

> -- 
> Chris Horrocks
>> On 7 June 2016 at 00:32:29, Jason J. W. Williams (jasonjwwilliams@gmail.com) wrote:
>> Hi,
>> New to Flume, and I'm trying to relay log messages received over a netcat source to
>> a Kafka sink.
>> Everything seems fine, except that Flume is acting like it IS assigning a partition
>> key to the produced messages even though none is assigned. I'd like the messages to be
>> assigned to a random partition, so that consumers are load balanced.
>> * Flume 1.6.0
>> * Kafka
>> Flume config: https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create --topic activity.history
>> --partitions 20 --replication-factor 1
>> Python consumer program: https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>> Test program (publishes to Flume): https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>> The Flume agent listens on TCP port 3132 for connections and publishes the messages it
>> receives to the Kafka activity.history topic. I'm running two instances of the Python consumer.
>> What happens, however, is that all log messages get sent to a single Kafka consumer. If
>> I restart Flume (leaving the consumers running) and re-run the test, all messages get
>> published to the other consumer. So it feels like Flume is assigning a permanent partition
>> key even though one is not defined (and should therefore be random).
>> Any advice is greatly appreciated.
>> -J
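(The symptom above matches the sticky behaviour Chris describes for the old 0.8-era producer: with no key it picks a random partition and keeps using it until topic.metadata.refresh.interval.ms, 10 minutes by default, elapses. A rough Python sketch of that behaviour; the class and method names are hypothetical:)

```python
import random
import time

REFRESH_INTERVAL_S = 600  # topic.metadata.refresh.interval.ms default (10 min)

class StickyRandomPartitioner:
    """Sketch of the old producer's keyless partitioning: choose a
    random partition and reuse it until the refresh interval passes,
    at which point a new random partition (possibly the same one) is
    chosen."""
    def __init__(self, num_partitions, refresh_interval_s=REFRESH_INTERVAL_S):
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s
        self._current = None
        self._chosen_at = 0.0

    def partition(self, now=None):
        now = time.monotonic() if now is None else now
        if self._current is None or now - self._chosen_at >= self.refresh_interval_s:
            self._current = random.randrange(self.num_partitions)
            self._chosen_at = now
        return self._current

p = StickyRandomPartitioner(20)
# Every send inside one refresh window lands on the same partition,
# which is why all the messages reach a single consumer.
window = {p.partition(now=t) for t in range(0, 600)}
```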
