Does anyone know why this is happening?

-eran



On Tue, Sep 11, 2012 at 2:26 AM, Eran Kutner <eran@gigya.com> wrote:
Hi,
I'm trying to compress Avro files written with the HDFS sink. Everything appears to work, but the files themselves are mostly empty. It appears that instead of the actual data, only some kind of header is written for every data row in the file. This is a hex dump of such a file:
0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
0000020 0001 0000 6400 0000 0900 0064 fe0a 0001
0000030 018a 0000 0000 0064 0000 6409 0a00 01fe
0000040 8a00 0001 0000 6400 0000 0900 0064 fe0a
0000050 0001 018a 0000 0000 0064 0000 6409 0a00
0000060 01fe 8a00 0001 0000 6400 0000 0900 0064
0000070 fe0a 0001 018a 0000 0000 0064 0000 6409
0000080 0a00 01fe 8a00 0001 0000 6400 0000 0900
0000090 0064 fe0a 0001 018a 0000 0000 0064 0000
00000a0 6409 0a00 01fe 8a00 0001 0000 6400 0000
00000b0 0900 0064 fe0a 0001 018a 0000 0000 0064
00000c0 0000 6409 0a00 01fe 8a00 0001 0000 6400
00000d0 0000 0900 0064 fe0a 0001 018a 0000 0000
00000e0 0064 0000 6409 0a00 01fe 8a00 0001 0000
00000f0 6400 0000 0900 0064 fe0a 0001 018a 0000
0000100 0000 0064 0000 6409 0a00 01fe 8a00 0001
0000110 0000 6400 0000 0900 0064 fe0a 0001 018a
0000120 0000 0000 0064 0000 6409 0a00 01fe 8a00
0000130 0001 0000 6400 0000 0900 0064 fe0a 0001
0000140 018a 0000 0000 0064 0000 6409 0a00 01fe
0000150 8a00 0001 0000 6400 0000 0900 0064 fe0a
0000160 0001 018a 0000 0000 0064 0000 6409 0a00
0000170 01fe 8a00 0001 0000 6400 0000 0900 0064
0000180 fe0a 0001 018a 0000 0000 0064 0000 6409
0000190 0a00 01fe 8a00 0001 0000 6400 0000 0900
00001a0 0064 fe0a 0001 018a 0000 0000 0064 0000
00001b0 6409 0a00 01fe 8a00 0001 0000 6400 0000
00001c0 0900 0064 fe0a 0001 018a 0000 0000 0064
00001d0 0000 6409 0a00 01fe 8a00 0001 0000 6400

Notice the repeating pattern within the data; it looks like empty headers with no data.
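
One way to check what those repeating bytes actually hold is to decode the first record by hand. The Python sketch below is only a rough aid and rests on assumptions that are not confirmed anywhere in this message: that the dump was produced by something like `od -x` on a little-endian machine (so each 16-bit word is byte-swapped), and that the file uses Hadoop's block-compressed Snappy framing (a 4-byte big-endian uncompressed length, a 4-byte big-endian compressed length, then the Snappy payload). The helper names are made up for illustration, and the decoder handles only short literals and 2-byte-offset copies, which is all the first record needs.

```python
# First two lines of the hex dump above, copied verbatim.
DUMP = """\
0000000 0000 6100 0000 0900 0061 fe0a 0001 017e
0000010 0000 0000 0064 0000 6409 0a00 01fe 8a00
"""


def od_words_to_bytes(dump: str) -> bytes:
    """Turn offset + 16-bit-word lines back into raw bytes.

    Assumption: little-endian words, so the two bytes of each printed
    word are swapped relative to their order in the file.
    """
    out = bytearray()
    for line in dump.strip().splitlines():
        for word in line.split()[1:]:  # skip the offset column
            out += bytes.fromhex(word)[::-1]
    return bytes(out)


def snappy_decode(block: bytes) -> bytes:
    """Minimal raw-Snappy decoder (short literals and 2-byte-offset copies)."""
    # Preamble: uncompressed length stored as a little-endian varint.
    pos = 0
    expected = 0
    shift = 0
    while True:
        byte = block[pos]
        pos += 1
        expected |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(block):
        tag = block[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0 and (tag >> 2) < 60:
            # Literal: the upper six bits hold (length - 1).
            length = (tag >> 2) + 1
            out += block[pos:pos + length]
            pos += length
        elif kind == 2:
            # Copy with a 2-byte little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(block[pos:pos + 2], "little")
            pos += 2
            for _ in range(length):
                out.append(out[-offset])  # byte-wise, so overlapping copies work
        else:
            raise ValueError(f"tag 0x{tag:02x} is not handled by this sketch")

    if len(out) != expected:
        raise ValueError("decoded length does not match the preamble")
    return bytes(out)


data = od_words_to_bytes(DUMP)
raw_len = int.from_bytes(data[0:4], "big")   # assumed Hadoop framing field
comp_len = int.from_bytes(data[4:8], "big")  # assumed Hadoop framing field
first = snappy_decode(data[8:8 + comp_len])

print(raw_len, comp_len)  # 97 9
print(first == b"\n" * raw_len)  # True
```

If those assumptions hold, the first block is 9 compressed bytes that expand to 97 newline characters and nothing else, which would mean the repeating pattern is compressed runs of empty lines rather than Avro headers. That only describes what is in the file; it does not by itself explain why the sink wrote it.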

This is my sink config:
agent.sinks.hdfsSink2.type = hdfs
agent.sinks.hdfsSink2.channel = memoryChannel2
agent.sinks.hdfsSink2.hdfs.path=hdfs://hadoop2-m1:8020/raw-events/%Y-%m-%d
agent.sinks.hdfsSink2.hdfs.filePrefix=load-events.%{hostname}.avro
agent.sinks.hdfsSink2.hdfs.rollInterval=60
agent.sinks.hdfsSink2.hdfs.rollCount=0
agent.sinks.hdfsSink2.hdfs.rollSize=0
agent.sinks.hdfsSink2.hdfs.fileType=CompressedStream
agent.sinks.hdfsSink2.hdfs.codeC=snappy
agent.sinks.hdfsSink2.hdfs.writeFormat=Text
agent.sinks.hdfsSink2.hdfs.batchSize=1000
agent.sinks.hdfsSink2.serializer = avro_event


Any help would be appreciated.

Thanks.

-eran