flume-user mailing list archives

From: Matt Wise <m...@nextdoor.com>
Subject: Flume 1.3.0 + HDFS Sink + S3N + avro_event + Hive…?
Date: Wed, 08 May 2013 17:42:28 GMT
We're still working on getting our POC of Flume up and running. Right now we have log events
that pass through our Flume nodes via a syslog source and are happily sent off to Elasticsearch
for indexing. We're also sending these events to S3, but we're finding that they seem to be
unreadable with the standard Avro tools.
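
For context, the relevant topology is roughly the following. This is a sketch from memory:
everything other than the s3 sink and the fc1 channel (the source details, the port, the fc2
channel, and the es sink name) is hypothetical, and the Elasticsearch sink config is omitted:

> # Rough topology sketch (names other than s3/fc1 are hypothetical)
> agent.sources = syslog
> agent.channels = fc1 fc2
> agent.sinks = es s3
> agent.sources.syslog.type = syslogtcp
> agent.sources.syslog.port = 5140
> agent.sources.syslog.channels = fc1 fc2
> # (Elasticsearch sink configuration omitted)
> agent.sinks.es.channel = fc2
> agent.sinks.s3.channel = fc1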

> # S3 Output Sink
> agent.sinks.s3.type = hdfs
> agent.sinks.s3.channel = fc1
> agent.sinks.s3.hdfs.path = s3n://XXX:XXX@our_bucket/flume/events/%y-%m-%d/%H
> agent.sinks.s3.hdfs.rollInterval = 600
> agent.sinks.s3.hdfs.rollSize = 0
> agent.sinks.s3.hdfs.rollCount = 10000
> agent.sinks.s3.hdfs.batchSize = 10000
> agent.sinks.s3.hdfs.serializer = avro_event
> agent.sinks.s3.hdfs.fileType = SequenceFile
> agent.sinks.s3.hdfs.timeZone = UTC
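
Our working theory (unconfirmed) is that the fileType is the culprit: with hdfs.fileType =
SequenceFile the sink writes a Hadoop SequenceFile container, which avro-tools can't parse, and
from the Flume docs it looks like the event serializer only applies to the stream file types and
is configured without the hdfs. prefix. If that's right, an untested variant like this should
produce a plain Avro container file instead:

> # Untested variant: write events as a plain Avro container file
> agent.sinks.s3.hdfs.fileType = DataStream
> # Note: per the docs, "serializer" takes no "hdfs." prefix
> agent.sinks.s3.serializer = avro_event
> # hdfs.fileSuffix is assumed available here to mark the files as Avro
> agent.sinks.s3.hdfs.fileSuffix = .avro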


When we try to inspect the Avro-serialized files, we get this error:

> [localhost avro]$ java -jar avro-tools-1.7.4.jar getschema FlumeData.1367857371493
> Exception in thread "main" java.io.IOException: Not a data file.
>         at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>         at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:89)
>         at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:48)
>         at org.apache.avro.tool.Main.run(Main.java:80)
>         at org.apache.avro.tool.Main.main(Main.java:69)
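
A quick sanity check (assuming a local copy of the file) is to look at the magic bytes: Avro
container files begin with "Obj", while Hadoop SequenceFiles begin with "SEQ":

> # Avro container files start with "Obj"; SequenceFiles start with "SEQ"
> head -c 4 FlumeData.1367857371493 | od -c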

At this point we're a bit unclear: how are we supposed to use these FlumeData files with the
normal Avro tools?

--Matt