flume-user mailing list archives

From Eran Kutner <e...@gigya.com>
Subject Re: Avro files are empty with snappy compression enabled
Date Tue, 18 Sep 2012 14:39:58 GMT
Does anyone know why this is happening?

-eran



On Tue, Sep 11, 2012 at 2:26 AM, Eran Kutner <eran@gigya.com> wrote:

> Hi,
> I'm trying to compress Avro files written with the HDFS sink. Everything
> appears to work, but the files themselves are mostly empty. It appears that
> instead of writing the actual data, only some kind of header is written
> for every data row in the file. This is a hex dump of such a file:
> 0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
> 0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
> 0000020 0001 0000 6400 0000 0900 0064 fe0a 0001
> 0000030 018a 0000 0000 0064 0000 6409 0a00 01fe
> 0000040 8a00 0001 0000 6400 0000 0900 0064 fe0a
> 0000050 0001 018a 0000 0000 0064 0000 6409 0a00
> 0000060 01fe 8a00 0001 0000 6400 0000 0900 0064
> 0000070 fe0a 0001 018a 0000 0000 0064 0000 6409
> 0000080 0a00 01fe 8a00 0001 0000 6400 0000 0900
> 0000090 0064 fe0a 0001 018a 0000 0000 0064 0000
> 00000a0 6409 0a00 01fe 8a00 0001 0000 6400 0000
> 00000b0 0900 0064 fe0a 0001 018a 0000 0000 0064
> 00000c0 0000 6409 0a00 01fe 8a00 0001 0000 6400
> 00000d0 0000 0900 0064 fe0a 0001 018a 0000 0000
> 00000e0 0064 0000 6409 0a00 01fe 8a00 0001 0000
> 00000f0 6400 0000 0900 0064 fe0a 0001 018a 0000
> 0000100 0000 0064 0000 6409 0a00 01fe 8a00 0001
> 0000110 0000 6400 0000 0900 0064 fe0a 0001 018a
> 0000120 0000 0000 0064 0000 6409 0a00 01fe 8a00
> 0000130 0001 0000 6400 0000 0900 0064 fe0a 0001
> 0000140 018a 0000 0000 0064 0000 6409 0a00 01fe
> 0000150 8a00 0001 0000 6400 0000 0900 0064 fe0a
> 0000160 0001 018a 0000 0000 0064 0000 6409 0a00
> 0000170 01fe 8a00 0001 0000 6400 0000 0900 0064
> 0000180 fe0a 0001 018a 0000 0000 0064 0000 6409
> 0000190 0a00 01fe 8a00 0001 0000 6400 0000 0900
> 00001a0 0064 fe0a 0001 018a 0000 0000 0064 0000
> 00001b0 6409 0a00 01fe 8a00 0001 0000 6400 0000
> 00001c0 0900 0064 fe0a 0001 018a 0000 0000 0064
> 00001d0 0000 6409 0a00 01fe 8a00 0001 0000 6400
>
> Notice the repeating pattern within the data; it looks like empty headers
> with no data.
>
> This is my sink config:
> agent.sinks.hdfsSink2.type = hdfs
> agent.sinks.hdfsSink2.channel = memoryChannel2
> agent.sinks.hdfsSink2.hdfs.path=hdfs://hadoop2-m1:8020/raw-events/%Y-%m-%d
> agent.sinks.hdfsSink2.hdfs.filePrefix=load-events.%{hostname}.avro
> agent.sinks.hdfsSink2.hdfs.rollInterval=60
> agent.sinks.hdfsSink2.hdfs.rollCount=0
> agent.sinks.hdfsSink2.hdfs.rollSize=0
> agent.sinks.hdfsSink2.hdfs.fileType=CompressedStream
> agent.sinks.hdfsSink2.hdfs.codeC=snappy
> agent.sinks.hdfsSink2.hdfs.writeFormat=Text
> agent.sinks.hdfsSink2.hdfs.batchSize=1000
> agent.sinks.hdfsSink2.serializer = avro_event
>
>
> Any help would be appreciated.
>
> Thanks.
>
> -eran
>
>
