Thanks, Hari, for your help with this. Appreciate it.
We will work towards upgrading to CDH 4.2.1 soon, and hopefully this issue will be resolved.
From: Hari Shreedharan <firstname.lastname@example.org>
To: "email@example.com" <firstname.lastname@example.org>
Sent: Monday, May 13, 2013 7:58 PM
Subject: Re: IOException with HDFS-Sink:flushOrSync
The patch also made it to Hadoop 2.0.3.
On Monday, May 13, 2013, Hari Shreedharan wrote:
On Monday, May 13, 2013 at 7:23 PM, Rahul Ravindran wrote:
We are using CDH 4.1.2 (Hadoop 2.0.0). It looks like CDH 4.2.1 also uses the same Hadoop version. Any suggestions for mitigation?
Sent from my phone. Excuse the terseness.
On Monday, May 13, 2013 at 6:50 PM, Matt Wise wrote:
So we've just had this happen twice, on two different Flume machines. We're using the HDFS sink as well, but ours writes to an s3n:// URL. Both times the sink stopped working and the file channel clogged up immediately, causing serious problems. A restart of Flume worked, but the file channel was so backed up at that point that it took a good long while to get Flume started up again properly.
Anyone else seeing this behavior?
(oh, and we're running flume 1.3.0)
We have noticed this a few times now: we get an IOException from HDFS, and the sink stops draining the channel until the Flume process is restarted. Below are the logs. namenode-v01-00b is the active NameNode (namenode-v01-00a is standby). We are using Quorum Journal Manager for NameNode HA, but no NameNode failover was initiated. If this is an expected error, should Flume handle it and retry gracefully (thereby not requiring a restart)?
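For reference, a minimal sketch of the kind of agent configuration discussed in this thread, pairing an HDFS sink with a file channel. The agent name, channel/sink names, and paths are all hypothetical, and this is only a sketch: tuning `hdfs.callTimeout` may reduce spurious flush/sync timeouts on slow calls, but it does not by itself make the sink recover from a hard IOException like the one described above.

```properties
# Hypothetical agent "a1": an HDFS sink fed by a durable file channel
a1.channels = c1
a1.sinks = k1

# File channel: events survive agent restarts, but a stuck sink
# will back the channel up until it is full
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

# HDFS sink writing via the HA nameservice (name is an assumption)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://nameservice1/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Timeout (ms) for HDFS open/write/flush/close operations; raising it
# papers over slow calls but does not recover a failed output stream
a1.sinks.k1.hdfs.callTimeout = 30000
```

With a file channel the events are not lost while the sink is wedged, but as noted above the channel keeps filling until the agent is restarted; per Hari's reply, the underlying fix is on the Hadoop side and shipped in Hadoop 2.0.3.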