flume-user mailing list archives

From Alexander Alten-Lorenz <wget.n...@gmail.com>
Subject Re: Why used space of file channel buffer directory increase?
Date Wed, 20 Mar 2013 07:11:04 GMT
Hi,

I suspect that tail -F and nc are what is filling up the directory. What is inside a file
that keeps growing without any events arriving?

My assumption:
nc opens a single stream and delivers all incoming events over it. Flume cannot tell that
no events are coming in, because the stream never closes. Could you use syslog(-ng) for
event delivery instead?
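
For illustration, a minimal sketch of what a syslog TCP source in place of the netcat source
might look like. The agent, source and channel names (a1, r1, c2) follow the configuration
quoted further down; the host and port values are assumptions, not values from this thread:

```properties
# Sketch only: replace the netcat source with a syslog TCP source.
# a1, r1 and c2 come from the configuration quoted below.
a1.sources = r1
a1.sources.r1.type = syslogtcp
# Assumed bind address and port; adjust to whatever syslog(-ng) forwards to.
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c2
```

syslog(-ng) would then need a matching TCP destination pointing at that host and port.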

Cheers,
 Alex



On Mar 20, 2013, at 2:30 AM, Zhiwen Sun <pensz01@gmail.com> wrote:

> Thanks all for your reply.
> 
> @Kenison 
> I stopped my tail -F | nc program and no new event files appeared in HDFS, so I think
> no events are arriving. To make sure, I will test again with JMX enabled.
> 
> @Alex
> 
> The latest log follows. I can't see any exceptions or warnings.
> 
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
> 
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
> 
> @Hari
> Hmm, 12 hours have passed and the size of the file channel directory has not decreased.
> 
> Files in file channel directory:
> 
> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12 ./file-channel/data/log-6.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28 ./file-channel/data/log-7.meta
> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15 ./file-channel/data/in_use.lock
> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6
> 
> 
> 
> 
> 
> Zhiwen Sun 
> 
> 
> 
> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> It is possible for the directory size to increase even if no writes are going in to the
> channel. If the channel size is non-zero and the sink is still writing events to HDFS, the
> takes get written to disk as well (so we know what events in the files were removed when the
> channel/agent restarts). Eventually the channel will clean up the files which have all events
> taken (though it will keep at least 2 files per data directory, just to be safe).
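> 
> As a rough sketch of where these files live and how often they roll, the file channel
> settings could be spelled out as below. The channel name c2 and the paths match the
> configuration and log output elsewhere in this thread; the maxFileSize value is an
> assumed example, not a recommendation:
> 
> ```properties
> # Sketch only: file channel c2 with its directories made explicit.
> a1.channels.c2.type = file
> # Paths as they appear in the checkpoint log lines in this thread.
> a1.channels.c2.checkpointDir = /home/zhiwensun/.flume/file-channel/checkpoint
> a1.channels.c2.dataDirs = /home/zhiwensun/.flume/file-channel/data
> # Assumed example value in bytes (100 MB). A smaller limit should make data
> # files roll sooner, so fully taken files become eligible for cleanup earlier.
> a1.channels.c2.maxFileSize = 104857600
> ```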
> 
> -- 
> Hari Shreedharan
> 
> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
> 
>> Hey,
>> 
>> What does the debug log say? Can you gather the logs and attach them?
>> 
>> - Alex
>> 
>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Matt.Kenison@disney.com> wrote:
>> 
>>> Check the JMX counter first, to make sure you really are not sending new events.
>>> If not, is it your checkpoint directory or data directory that is increasing in size?
>>> 
>>> 
>>> From: Zhiwen Sun <pensz01@gmail.com>
>>> Reply-To: "user@flume.apache.org" <user@flume.apache.org>
>>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>>> To: "user@flume.apache.org" <user@flume.apache.org>
>>> Subject: Why used space of file channel buffer directory increase?
>>> 
>>> Hi all,
>>> 
>>> I am testing flume-ng on my local machine. The data flow is:
>>> 
>>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>> 
>>> My configuration file is:
>>> 
>>>> a1.sources = r1
>>>> a1.channels = c2
>>>> 
>>>> a1.sources.r1.type = netcat
>>>> a1.sources.r1.bind = 192.168.201.197
>>>> a1.sources.r1.port = 44444
>>>> a1.sources.r1.max-line-length = 1000000
>>>> 
>>>> a1.sinks.k1.type = logger
>>>> 
>>>> a1.channels.c1.type = memory
>>>> a1.channels.c1.capacity = 10000
>>>> a1.channels.c1.transactionCapacity = 10000
>>>> 
>>>> a1.channels.c2.type = file
>>>> a1.sources.r1.channels = c2
>>>> 
>>>> a1.sources.r1.interceptors = i1
>>>> a1.sources.r1.interceptors.i1.type = timestamp
>>>> 
>>>> a1.sinks = k2
>>>> a1.sinks.k2.type = hdfs
>>>> a1.sinks.k2.channel = c2
>>>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>>>> a1.sinks.k2.hdfs.writeFormat = Text
>>>> a1.sinks.k2.hdfs.rollInterval = 10
>>>> a1.sinks.k2.hdfs.rollSize = 10000000
>>>> a1.sinks.k2.hdfs.rollCount = 0
>>>> 
>>>> a1.sinks.k2.hdfs.filePrefix = app
>>>> a1.sinks.k2.hdfs.fileType = DataStream
>>> 
>>> 
>>> 
>>> It seems that events are collected correctly.
>>> 
>>> But one problem is bothering me: the space used by the file channel (~/.flume) keeps
>>> increasing, even when there are no new events.
>>> 
>>> Is my configuration wrong, or is there some other problem?
>>> 
>>> thanks.
>>> 
>>> 
>>> Best regards.
>>> 
>>> Zhiwen Sun
>> 
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> 
> 

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF

