This appears to happen because, for metrics that cannot be converted to float (strings, for example), we still send the type as float, which causes Ganglia to log those messages. It should be fairly easy to write a patch to fix this. I filed FLUME-1870 to track it.
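
To illustrate the idea, here is a minimal sketch of how a reporter could pick the declared type per value instead of always declaring float. This is not the FLUME-1870 patch and not Flume's actual code; the class name GangliaTypeSketch and the method gangliaTypeFor are hypothetical, and the only assumption is that Ganglia accepts "float" and "string" as metric type names.

```java
// Hypothetical sketch, not Flume's real Ganglia reporter.
final class GangliaTypeSketch {
    private GangliaTypeSketch() {}

    /**
     * Returns the Ganglia type name to declare for a metric value:
     * "float" if the text parses as a float, otherwise "string".
     */
    static String gangliaTypeFor(String value) {
        if (value == null) {
            return "string";
        }
        try {
            Float.parseFloat(value.trim());
            return "float";
        } catch (NumberFormatException e) {
            // Values such as "SINK" or "CHANNEL" end up here.
            return "string";
        }
    }

    public static void main(String[] args) {
        String[] samples = {"SINK", "CHANNEL", "1234", "0.75"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + gangliaTypeFor(sample));
        }
    }
}
```

With a check like this, a metric such as Type (whose value is SINK, CHANNEL or SOURCE) would be declared as a string, so gmetad would have no reason to attempt a float conversion on it.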


Hari Shreedharan

On Wednesday, January 23, 2013 at 1:23 PM, Mike Percy wrote:

Not sure when or how it broke, as I know of people using it in production. There is a way to configure it for different versions of Ganglia, like 3.0, 3.1. Might be worth trying both values to see if it's a problem with one or the other: http://flume.apache.org/FlumeUserGuide.html#ganglia-reporting

On Wed, Jan 23, 2013 at 1:12 PM, Connor Woodson <cwoodson.dev@gmail.com> wrote:
I have not had success with Ganglia, due, I think, to the same issue you've encountered.

- Connor

On Wed, Jan 23, 2013 at 6:04 AM, Christian Schroer <cschroer@autoscout24.com> wrote:

I have the same problem here, using Flume-NG 1.2 from CDH4.1.2.
I deleted all related RRDs; gmetad recreates them and I see those errors again.

Disk space, inodes, ... are fine. All RRDs not related to Flume-NG are fine, too.

These errors result in a gmetad crash after a while (more metrics => earlier crash). If I disable Ganglia support, gmetad runs without any problem.


-----Original Message-----
From: Alexander Alten-Lorenz [mailto:wget.null@gmail.com]
Sent: Wednesday, December 12, 2012 3:43 PM
To: user@flume.apache.org
Subject: Re: Setting up flume to use ganglia results in a lot of error messages in /var/log/messages

Looks like the RRDs are damaged; maybe the hard disk is full?

- Alex

On Dec 12, 2012, at 11:38 AM, Juhani Connolly <juhani_connolly@cyberagent.co.jp> wrote:

> We just noticed that Ganglia's gmetad is spamming messages like the following:
> Dec  9 03:13:20 om-pat-obs01 /usr/sbin/gmetad[17407]: RRD_update (/var/lib/ganglia/rrds/Flume KDDI/blog-wap02/flume.SINK.avro2.Type.rrd): /var/lib/ganglia/rrds/Flume KDDI/blog-wap02/flume.SINK.avro2.Type.rrd: conversion of 'SINK' to float not complete: tail 'SINK'
> Dec  9 03:13:20 om-pat-obs01 /usr/sbin/gmetad[17407]: RRD_update (/var/lib/ganglia/rrds/Flume KDDI/blog-wap02/flume.CHANNEL.ch1.Type.rrd): /var/lib/ganglia/rrds/Flume KDDI/blog-wap02/flume.CHANNEL.ch1.Type.rrd: conversion of 'CHANNEL' to float not complete: tail 'CHANNEL'
> Dec  9 03:13:20 om-pat-obs01 /usr/sbin/gmetad[17407]: RRD_update (/var/lib/ganglia/rrds/Flume KDDI/blog-wap11/flume.SINK.avro1.Type.rrd): /var/lib/ganglia/rrds/Flume KDDI/blog-wap11/flume.SINK.avro1.Type.rrd: conversion of 'SINK' to float not complete: tail 'SINK'
> Dec  9 03:13:20 om-pat-obs01 /usr/sbin/gmetad[17407]: RRD_update (/var/lib/ganglia/rrds/Flume KDDI/blog-wap11/flume.SOURCE.scribe.Type.rrd): /var/lib/ganglia/rrds/Flume KDDI/blog-wap11/flume.SOURCE.scribe.Type.rrd: conversion of 'SOURCE' to float not complete: tail 'SOURCE'
> The counters are tracked fine on the web interface, but we've had issues (gmetad crashing or not starting up; I'm still trying to get specifics from the responsible people). I can't say for sure whether this is a Ganglia problem or a problem with Flume's Ganglia support. Is anyone else feeding their counters to Ganglia seeing something similar? Perhaps the Ganglia mailing list would be more appropriate; I'm not sure on this one.

Alexander Alten-Lorenz
German Hadoop LinkedIn Group: http://goo.gl/N8pCF