flume-user mailing list archives

From Alan Miller <Alan.Mil...@synopsys.com>
Subject streaming Avro to HDFS
Date Wed, 06 Feb 2013 17:58:48 GMT
Hi, I'm just getting started with Flume and am trying to understand the flow of things.

I have Avro binary data files being generated on remote nodes, and I want to use
Flume (1.2.0) to stream them to my HDFS cluster at a central location. I can
stream the data, but the resulting files on HDFS seem corrupt. Here's what I did:

For my "master" (on the NameNode of my Hadoop cluster) I started the agent like this:
flume-ng agent -f agent.conf  -Dflume.root.logger=DEBUG,console -n agent
With this config:
agent.channels = memory-channel
agent.sources = avro-source
agent.sinks = hdfs-sink

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 1000
agent.channels.memory-channel.transactionCapacity = 100

agent.sources.avro-source.channels = memory-channel
agent.sources.avro-source.type = avro
agent.sources.avro-source.bind =
agent.sources.avro-source.port = 41414

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode1:9000/flume

On a remote node I streamed a test file like this:
flume-ng avro-client -H -p 41414 -F /tmp/test.avro

I can see that the master is writing to HDFS:
  13/02/06 09:37:55 INFO hdfs.BucketWriter: Creating hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  13/02/06 09:38:25 INFO hdfs.BucketWriter: Renaming hdfs://namenode1:9000/flume/FlumeData.1360172273684.tmp
  to hdfs://namenode1:9000/flume/FlumeData.1360172273684

But the data doesn't seem right. The original file is 4551 bytes, but the file written to
HDFS is only 219 bytes:
  [localhost] $ ls -l FlumeData.1360172273684 /tmp/test.avro
  -rwxr-xr-x 1 amiller amiller  219 Feb  6 18:51 FlumeData.1360172273684
  -rwxr-xr-x 1 amiller amiller 4551 Feb  6 12:00 /tmp/test.avro

  [localhost] $ avro cat /tmp/test.avro
  {"system_model": null, "nfsv4": null, "ip": null, "site": null, "nfsv3": null, "export":
null, "ifnet": [{"send_bps": 1234, "recv_bps": 5678, "name": "eth0"}, {"send_bps": 100, "recv_bps":
200, "name": "eth1"}, {"send_bps": 0, "recv_bps": 0, "name": "eth2"}], "disk": null, "hostname":
"localhost", "total_mem": null, "ontapi_version": null, "serial_number": null, "cifs": null,
"cpu_model": null, "volume": null, "time_stamp": 1357639723, "aggregate": null, "num_cpu":
null, "cpu_speed_mhz": null, "hostid": null, "kernel_version": null, "qtree": null, "processor":

  [localhost] $ hadoop fs -copyToLocal /flume/FlumeData.1360172273684 .
  [localhost] $ avro cat FlumeData.1360172273684
  panic: ord() expected a character, but string of length 0 found
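
One quick way to narrow this down might be to look at the first few bytes of each file. The sketch below assumes only the documented magic numbers: an Avro object container file starts with the bytes "Obj" followed by 0x01, and a Hadoop SequenceFile starts with "SEQ". The helper names (sniff_format, sniff_file) are made up for illustration and are not part of Flume or Avro.

```python
# Minimal sketch: guess a file's container format from its leading bytes.
# sniff_format / sniff_file are hypothetical helper names, not a library API.

AVRO_MAGIC = b"Obj\x01"    # header of an Avro object container file
SEQFILE_MAGIC = b"SEQ"     # header of a Hadoop SequenceFile


def sniff_format(header: bytes) -> str:
    """Return 'avro', 'sequencefile', or 'unknown' for the given leading bytes."""
    if header.startswith(AVRO_MAGIC):
        return "avro"
    if header.startswith(SEQFILE_MAGIC):
        return "sequencefile"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a local file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(4))
```

For example, sniff_file("/tmp/test.avro") should report "avro" for a valid container file. If the copy pulled from HDFS reports "sequencefile" or "unknown", that would suggest the bytes were re-wrapped or altered somewhere between the client and the sink rather than written through unchanged.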

