You can use a URL (on HDFS or HTTP) that points to the schema: https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java#L70

Use that URL to store your schema for the event, so you don't have to add the schema itself to every event.

The Avro schema is embedded only in the files, not in the event data, so we need to make sure we write each event to the correct file based on the event's own schema. avro_event works because it writes the events out with a fixed schema (not the event's own schema).
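For example, something like this on the sink side (a minimal sketch; the agent/source/sink names a1/r1/k1 and the schema path are placeholders). The static interceptor attaches the flume.avro.schema.url header that AvroEventSerializer looks for:

  # Point every event at the schema stored on HDFS
  a1.sources.r1.interceptors = i1
  a1.sources.r1.interceptors.i1.type = static
  a1.sources.r1.interceptors.i1.key = flume.avro.schema.url
  a1.sources.r1.interceptors.i1.value = hdfs://namenode/schemas/event.avsc

  # Write Avro container files using the schema named in the header
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = /flume/avro-events
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder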


Thanks,
Hari

On Tue, Mar 8, 2016 at 1:05 PM, Justin Ryan <juryan@ziprealty.com> wrote:
Hiya folks, I’m still struggling with this. Is anyone on the list familiar with AvroEventSerializer$Builder?

While I have gotten past my outright failure, I’ve only done so by adopting a fairly inflexible schema, which seems counter to the goal of using Avro.  Particularly frustrating is that Flume simply needs to pass the existing message along, though I understand it likely needs to parse the data in order to separate individual messages.  I can’t even find Kafka consumer code that is capable of being schema-aware.

From: Justin Ryan <juryan@ziprealty.com>
Reply-To: <user@flume.apache.org>
Date: Thursday, March 3, 2016 at 2:08 PM
To: <user@flume.apache.org>
Subject: Re: Avro source: could not find schema for event

Update:

So, I changed my serializer from org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and this started working.  Well, working-ish: the data is a little funky, but it’s arriving, being delivered to HDFS, and I can pull a file and examine it manually.
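For anyone curious, this is roughly how I’m examining the pulled files (a sketch using the python avro library; the local filename is a placeholder):

  from avro.datafile import DataFileReader
  from avro.io import DatumReader

  # Open a file pulled down from HDFS
  reader = DataFileReader(open("FlumeData.avro", "rb"), DatumReader())
  for record in reader:
      # avro_event wraps each event in Flume's fixed headers/body schema,
      # so the original message shows up as raw bytes in record["body"]
      print(record["headers"], record["body"])
  reader.close()

That wrapping is presumably why the data looks funky: the body is just bytes under the fixed schema, not a record in my own schema.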

I seem to remember that I had the former based on some things I read about not having to specify a schema, since the schema is embedded in the Avro data.

So I’m confused: it seems that my previous configuration should have worked without any special attention to the schema, yet I got complaints that the schema couldn’t be found.

If anyone could shed a bit of light here, it would be much appreciated.

From: Justin Ryan <juryan@ziprealty.com>
Reply-To: <user@flume.apache.org>
Date: Monday, February 29, 2016 at 2:52 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Avro source: could not find schema for event

Hiya,

I’ve got a fairly simple Flume agent pulling events from Kafka and landing them in HDFS.  For plain text messages, this works fine.

I created a topic specifically for testing sending Avro messages through Kafka to land in HDFS, and that’s where I’m having trouble.

I noted from https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/ the example of Flume’s default Avro schema[0], which will do for my testing, and set up my python-avro producer to send messages with this schema.  Unfortunately, I still have Flume looping this message in its log:

  org.apache.flume.FlumeException: Could not find schema for event

I’m running out of assumptions to rethink/verify here and would appreciate any guidance on what I may be missing.

Thanks in advance,

Justin

[0] {
  "type": "record",
  "name": "Event",
  "fields": [{
    "name": "headers",
    "type": {
      "type": "map",
      "values": "string"
    }
  }, {
    "name": "body",
    "type": "bytes"
  }]
}
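
For reference, here’s a minimal sketch of the producer side I described above (assuming the python avro library and kafka-python; the broker address, topic name, and filenames are placeholders):

  import io
  import avro.schema
  from avro.io import DatumWriter, BinaryEncoder
  from kafka import KafkaProducer

  # Parse the Event schema from [0], saved locally as event.avsc
  # (avro.schema.parse is the Python 2 API; newer releases spell it Parse)
  schema = avro.schema.parse(open("event.avsc").read())

  def encode_event(headers, body):
      # Binary-encode one record against the Event schema
      buf = io.BytesIO()
      DatumWriter(schema).write({"headers": headers, "body": body},
                                BinaryEncoder(buf))
      return buf.getvalue()

  producer = KafkaProducer(bootstrap_servers="localhost:9092")
  producer.send("avro-test", encode_event({"host": "test01"}, b"hello avro"))
  producer.flush()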