Thanks Gonzalo. You are right about the topology: I'm using the Kafka channel's topic as the source. Based on this thread, I was under the impression that a Kafka sink is redundant.

Here's the topology:
Agent#1: spooldir source -> morphlines (transforms to Avro) -> Kafka channel (topic 'K1')
Agent#2: Kafka source (topic 'K1') -> File channel -> HDFS sink

Please let me know whether Agent#1 should also write to a Kafka sink, so that Agent#2 uses the sink's topic as its source, and what the difference is.

Thanks!


On Tue, Sep 15, 2015 at 11:47 AM, Gonzalo Herreros <gherreros@gmail.com> wrote:

I'm not sure I understand your topology or what exactly you mean by "used Kafka channel/sink"; it would help if you sent the configuration.

My best guess about the error is that you are pointing the Kafka source at a topic that is used by a Kafka channel rather than one written by a Kafka sink.
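To make the channel/sink distinction concrete: as far as I know, the Flume Kafka channel (with `parseAsFlumeEvent` at its default of `true`) stores each event in Kafka as an Avro-serialized Flume event, headers included, whereas the Kafka sink publishes only the event body, and the Kafka source turns each Kafka message into an event body as-is. The Java sketch below models that with a made-up framing (`sinkStyle`, `channelStyle` and the byte layout are hypothetical, not Flume's real encoding) to show that a source reading the channel's topic hands the downstream agent extra bytes in front of the original Avro payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

/**
 * Toy model of the two ways an event can land in a Kafka topic.
 * HYPOTHETICAL framing: the real Kafka channel writes an Avro-encoded
 * Flume event; the length-prefixed layout below only stands in for it.
 */
public class ChannelVsSink {

    // Sink-style: the Kafka message is exactly the event body.
    static byte[] sinkStyle(byte[] body) {
        return body.clone();
    }

    // Channel-style (simplified): the headers are serialized in front of the body.
    static byte[] channelStyle(Map<String, String> headers, byte[] body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(headers.size());
            for (Map.Entry<String, String> h : headers.entrySet()) {
                out.writeUTF(h.getKey());
                out.writeUTF(h.getValue());
            }
            out.writeInt(body.length);
            out.write(body);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "avro-bytes".getBytes(StandardCharsets.UTF_8);
        Map<String, String> headers = Map.of("file", "access.log");

        // A Kafka source passes the whole message on as the new event body,
        // so this is what the downstream agent would see in each case.
        byte[] fromSinkTopic = sinkStyle(body);
        byte[] fromChannelTopic = channelStyle(headers, body);

        System.out.println(Arrays.equals(fromSinkTopic, body));    // true
        System.out.println(Arrays.equals(fromChannelTopic, body)); // false
    }
}
```

In this model only the sink-style topic gives the downstream agent back the exact bytes the upstream agent produced.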

Regards,
Gonzalo


On Sep 15, 2015 6:42 PM, "Buntu Dev" <buntudev@gmail.com> wrote:
Currently I have a single Flume agent that converts Apache logs into Avro and writes them to an HDFS sink. I'm looking for ways to create a tiered topology and make the Avro records available to other Flume agents. I used a Kafka channel/sink to write these Avro records, but ran into this error when reading them with the Kafka source:

 Caused by: java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
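For reference, `DataFileStream` reads an Avro object container file, which per the Avro specification must start with the four magic bytes `O`, `b`, `j`, `1`; if they are missing it throws this `Not a data file.` IOException. A quick way to see whether a Kafka message could be read that way is to inspect its first bytes. The following is a minimal sketch using only the JDK (the class and method names are hypothetical, not part of Avro or Flume); a payload with extra bytes in front, such as a wrapper added by the channel, or bare binary datums written without a container header, would both fail this check.

```java
import java.util.Arrays;

/**
 * Checks whether a payload begins with the Avro object container magic
 * ('O', 'b', 'j', 1). Without it, DataFileStream rejects the input.
 */
public class AvroMagicCheck {

    // Magic bytes defined by the Avro object container file format.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static boolean hasContainerMagic(byte[] payload) {
        if (payload == null || payload.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(payload, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        // Starts like a container file (the rest of the file is omitted).
        byte[] containerLike = {'O', 'b', 'j', 1, 0, 0};

        // The same bytes with two extra bytes in front, e.g. a wrapper.
        byte[] wrapped = new byte[containerLike.length + 2];
        wrapped[0] = 2;
        wrapped[1] = 0;
        System.arraycopy(containerLike, 0, wrapped, 2, containerLike.length);

        System.out.println(hasContainerMagic(containerLike)); // true
        System.out.println(hasContainerMagic(wrapped));       // false
    }
}
```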


For a tiered topology, should I use an Avro sink that writes to a host/port so that the other Flume agent can read with an Avro source? Or is there another data format I should consider if I want to stick with Kafka as the channel/sink?

Thanks!