flume-user mailing list archives

From Jim Langston <jlangs...@resolutebi.com>
Subject Avro configuration
Date Tue, 27 Dec 2016 19:47:33 GMT
Hi all,

I'm looking for some guidance. I have been trying to get a flow working that involves the following:

Source Avro --> mem channel --> file_roll

File Roll config

agent.sinks.persistence-sink.type = file_roll
agent.sinks.persistence-sink.sink.directory = /home/flume/persistence
agent.sinks.persistence-sink.sink.serializer = avro_event
agent.sinks.persistence-sink.batchSize = 1000
agent.sinks.persistence-sink.sink.rollInterval = 300
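
For context, here is a minimal sketch of how the rest of this first agent could be wired. The component names (avro-in, mem-channel), the bind address, the port, and the channel sizes are assumptions for illustration, not my exact values:

```properties
# Hypothetical names, bind address, and port; adjust to the actual setup.
agent.sources = avro-in
agent.channels = mem-channel
agent.sinks = persistence-sink

# Avro source that receives the incoming events
agent.sources.avro-in.type = avro
agent.sources.avro-in.bind = 0.0.0.0
agent.sources.avro-in.port = 4141
agent.sources.avro-in.channels = mem-channel

# Memory channel; transactionCapacity is sized to cover the sink batch of 1000
agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000
agent.channels.mem-channel.transactionCapacity = 1000

# Attach the file_roll sink configured above to the channel
agent.sinks.persistence-sink.channel = mem-channel
```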

Once the data is on local disk, I want to send it on to another Flume server:

Source spooldir --> mem channel --> Avro Sink (to another Flume server)

agent.sources.persistence-dev-source.type = spooldir
agent.sources.persistence-dev-source.spoolDir = /home/flume/ready
agent.sources.persistence-dev-source.deserializer = avro
agent.sources.persistence-dev-source.deserializer.schemaType = LITERAL
agent.sources.persistence-dev-source.batchSize = 1000
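
Again for context, a rough sketch of the remaining wiring for this second flow. The channel and sink names (forward-channel, avro-out), the hostname, and the port are placeholders I am assuming here. If both flows run in the same agent, the agent.sources, agent.channels, and agent.sinks lists would need to be merged:

```properties
# Hypothetical names, hostname, and port; adjust to the actual setup.
agent.sources = persistence-dev-source
agent.channels = forward-channel
agent.sinks = avro-out

# Attach the spooldir source configured above to the channel
agent.sources.persistence-dev-source.channels = forward-channel

# Memory channel; transactionCapacity is sized to cover the source batch of 1000
agent.channels.forward-channel.type = memory
agent.channels.forward-channel.capacity = 10000
agent.channels.forward-channel.transactionCapacity = 1000

# Avro sink that forwards events to the downstream Flume server
agent.sinks.avro-out.type = avro
agent.sinks.avro-out.hostname = flume-server.example.com
agent.sinks.avro-out.port = 4545
agent.sinks.avro-out.channel = forward-channel
```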

The problem is that file_roll wraps the incoming Avro data in an Avro container before
storing it on the local file system. When the spooldir source then picks up the data and
sends it to the Flume server, the data carries the file_roll headers when it is read by
the interceptor.

Is there a recommended way to save the incoming Avro data so that it keeps its integrity
when it is sent on to another Flume server, which is waiting on Avro data to multiplex and
send to its channels?

I have tried many different variations. With the configurations above, the Avro data that
was received does reach the other server, but the applications see the container headers
from file_roll and not the headers from the records in the initial Avro data.



Schema that file_roll sets when it writes to disk:

{
  "type" : "record",
  "name" : "Event",
  "fields" : [ {
    "name" : "headers",
    "type" : {
      "type" : "map",
      "values" : "string"
    }
  }, {
    "name" : "body",
    "type" : "bytes"
  } ]
}
