Hi Borja,

I would need more information to confirm this, but at first glance it sounds like a destination-related issue (I suspect the sink cannot commit its transaction because the write fails). Are you sure that HDFS would otherwise be able to accept data from your Flume sink?

What does your sink/source configuration look like? Do you have an excerpt of the log showing the skipped events?
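
For comparison, a minimal HDFS sink definition wired to your file channel usually looks something like the lines below (the sink name "sn" and the hdfs.path are just illustrative placeholders, since I don't know your actual setup):

# note: the sink name "sn" and hdfs.path below are placeholders, not your real values
agent-hdfssink.sinks.sn.type = hdfs
agent-hdfssink.sinks.sn.channel = cn
agent-hdfssink.sinks.sn.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent-hdfssink.sinks.sn.hdfs.fileType = DataStream
agent-hdfssink.sinks.sn.hdfs.batchSize = 100
agent-hdfssink.sinks.sn.hdfs.rollInterval = 300

In particular, hdfs.batchSize should not exceed the channel's transactionCapacity, and the hdfs.path has to be writable by the user running the agent.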

Cheers,
Attila


Attila Simon
Software Engineer

Cloudera Inc.

On Tue, May 24, 2016 at 5:54 AM, Borja Garrido <borja.garrido.bear@cern.ch> wrote:
Hi all,

I've been experiencing some really weird behavior with Flume. Basically, my sinks weren't working, so data started accumulating in the file channels, which caused them to grow in the number of log files.

When I detected that, I stopped the agent and the source, then tried to start the agent again so I could drain the channel, but I saw the log replay skipping the events.

After some reading I moved the checkpoint folder aside (with the agent stopped) so it would be empty on the next start. The replay then started taking the old log files in the channel into account, but it ended up creating a new one and not doing anything with the rest, so right now I have around 20 log files in the channels that weigh 1.6 GB each, and Flume apparently isn't taking care of them.

Of course, for the replay to work I needed to increase the transactionCapacity of the channel:

agent-hdfssink.channels.cn.type = file
agent-hdfssink.channels.cn.checkpointDir = /var/spool/flume/n/checkpoint
agent-hdfssink.channels.cn.dataDirs = /var/spool/flume/ln/data
agent-hdfssink.channels.cn.transactionCapacity = 1000
agent-hdfssink.channels.cn.capacity = 6000000

The kind of sink I'm using is HDFS. My question is whether this is normal behavior and whether there is any way to make Flume send this data, as it seems it doesn't take care of the older log files.

I've also tried moving everything out of the channel and leaving only a file with its metadata there (same result), and no errors in any case :S.

Thanks in advance for any help.
Cheers,
Borja