flume-user mailing list archives

From Ahmed Vila <av...@devlogic.eu>
Subject Re: Avro source and sink
Date Wed, 16 Sep 2015 07:59:43 GMT

The Kafka channel is not meant to be read by a Kafka source, as it carries
additional transaction-log information.

Remove the Kafka source from the second agent and point its Kafka channel
to the same topic.

Agent1: spool -> morphlines -> Kafka channel - no sink
Agent2: no source - Kafka channel -> HDFS sink
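
A minimal sketch of that two-agent layout, in standard Flume properties
format (agent names, broker/ZooKeeper hosts, directories, and the topic name
are placeholders; the morphlines interceptor config is omitted for brevity):

```properties
# Agent1: spooldir source -> Kafka channel on topic K1, no sink.
agent1.sources = spool
agent1.channels = kc
agent1.sources.spool.type = spooldir
agent1.sources.spool.spoolDir = /var/log/incoming
agent1.sources.spool.channels = kc
agent1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.kc.brokerList = broker1:9092
agent1.channels.kc.zookeeperConnect = zk1:2181
agent1.channels.kc.topic = K1

# Agent2: no source -- the Kafka channel reads the same topic -> HDFS sink.
agent2.channels = kc
agent2.sinks = h1
agent2.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
agent2.channels.kc.brokerList = broker1:9092
agent2.channels.kc.zookeeperConnect = zk1:2181
agent2.channels.kc.topic = K1
agent2.sinks.h1.type = hdfs
agent2.sinks.h1.channel = kc
agent2.sinks.h1.hdfs.path = /flume/events/%Y-%m-%d
```

Because both agents share one Kafka channel topic, events written by Agent1
are consumed directly by Agent2's sink, with no Kafka sink or source in
between.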
On Sep 15, 2015 22:11, "Buntu Dev" <buntudev@gmail.com> wrote:

> Thanks Gonzalo. You are correct about the topology in that I'm using Kafka
> channel as a source. Based on this thread
> <http://search-hadoop.com/m/z1pWR1tSM5S1qr5Bt>, I was under the
> impression that Kafka sink is redundant.
> Here's the topology:
> Agent#1: spooldir source -> morphlines (transforms to avro) -> kafka
> channel (topic 'K1')
> Agent#2: kafka source (topic 'K1') -> File channel -> HDFS sink
> Please let me know if Agent#1 should also be writing to a Kafka sink for
> Agent#2 to use as a source, and what the difference would be.
> Thanks!
> On Tue, Sep 15, 2015 at 11:47 AM, Gonzalo Herreros <gherreros@gmail.com>
> wrote:
>> I'm not sure I understand your topology, or what exactly you mean by
>> "used Kafka channel/sink"; it would help if you sent the configuration.
>> My best guess about the error is that you are pointing the Kafka source
>> to a topic that is used by a channel rather than by a Kafka sink.
>> Regards,
>> Gonzalo
>> On Sep 15, 2015 6:42 PM, "Buntu Dev" <buntudev@gmail.com> wrote:
>>> Currently I have a single flume agent that converts apache logs into
>>> Avro and writes to HDFS sink. I'm looking for ways to create tiered
>>> topology and want to have the Avro records available to other flume agents.
>>> I used Kafka channel/sink to write these Avro records but was running into
>>> this error when using the Kafka source to read the records:
>>>  Caused by: java.io.IOException: Not a data file.
>>>     at
>>> org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>>>     at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>>> For a tiered topology, should I be using an Avro sink that writes to a
>>> host/port for the other Flume agent to read via an Avro source? Or is
>>> there any other data format I should consider if I want to stick with
>>> Kafka as the channel/sink?
>>> Thanks!
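
For comparison, the Avro-RPC tiering asked about above would look roughly
like this (hostnames, port, and channel names are placeholders; each agent
still needs its own channel definition):

```properties
# Agent1: existing pipeline -> Avro sink pointing at Agent2's host/port.
agent1.sinks = a1
agent1.sinks.a1.type = avro
agent1.sinks.a1.hostname = agent2-host
agent1.sinks.a1.port = 4141
agent1.sinks.a1.channel = c1

# Agent2: Avro source listening on that port -> its own channel -> HDFS sink.
agent2.sources = a1
agent2.sources.a1.type = avro
agent2.sources.a1.bind = 0.0.0.0
agent2.sources.a1.port = 4141
agent2.sources.a1.channels = c1
```

This couples the two agents over RPC rather than through Kafka, so it trades
the durability of a Kafka-backed hop for a simpler direct connection.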

