flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Why does the used space of the file channel buffer directory increase?
Date Wed, 20 Mar 2013 08:06:35 GMT
If you reduce the capacity, the channel will be able to buffer fewer events.
If you want to reduce the space used when there are only a few events
remaining, set the config param "maxFileSize" to something lower (this is in
bytes). I don't advise setting this lower than a few hundred megabytes
(in fact, the default value works pretty well - do you really need to save
3 GB of space?) - otherwise you will end up with a huge number of small files
if there are many events waiting to be taken from the channel.
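
For reference, a minimal sketch of what that tuning might look like in the
agent configuration (this assumes the agent/channel names a1 and c2 from the
config quoted later in this thread; the value shown is only illustrative):

a1.channels.c2.type = file
# maxFileSize is in bytes; roughly 500 MB here instead of the default (about 2 GB)
a1.channels.c2.maxFileSize = 524288000

As noted above, the default usually works well; only lower it if reclaiming
the disk space really matters.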


Hari


On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <pensz01@gmail.com> wrote:

> Hi Hari:
>
> Does that mean I can reduce the capacity of the file channel to cut down the
> maximum disk space used by the file channel?
>
>
> Zhiwen Sun
>
>
>
> On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>>  Hi,
>>
>> Like I mentioned earlier, we will always keep 2 data files in each data
>> directory (the ".meta" files are metadata associated with the actual data).
>> Once log-8 is created (when log-7 gets rotated after it hits its maximum size)
>> and all of the events in log-6 have been taken, log-6 will get deleted, but
>> you will still see log-7 and log-8. So what you are seeing is not
>> unexpected.
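>>
>> (A quick way to watch this, as a sketch using the data directory from the
>> listing later in this thread:
>>
>>   watch -n 60 'ls -lh ~/.flume/file-channel/data'
>>
>> You should see log-6 disappear once log-8 exists and everything in log-6 has
>> been taken.)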
>>
>>
>> Hari
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:
>>
>> Thanks all for your reply.
>>
>> @Kenison
>> I stopped my tail -F | nc program and there is no new event file in HDFS, so
>> I think no new events are arriving. To make sure, I will test again with JMX
>> enabled.
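>>
>> (One way to enable JMX on the agent JVM, as a sketch - the port is arbitrary
>> and this assumes the stock flume-ng startup script picks up conf/flume-env.sh:
>>
>> # conf/flume-env.sh
>> export JAVA_OPTS="-Dcom.sun.management.jmxremote \
>>   -Dcom.sun.management.jmxremote.port=5445 \
>>   -Dcom.sun.management.jmxremote.authenticate=false \
>>   -Dcom.sun.management.jmxremote.ssl=false"
>>
>> Then connect with jconsole and watch the channel MBean attributes such as
>> ChannelSize and EventPutSuccessCount to confirm nothing new is arriving.)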
>>
>> @Alex
>>
>> The latest log follows. I can't see any exception or warning.
>>
>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint
>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>> sync = 3
>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating
>> checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0,
>> queueHead: 362981
>> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta
>> currentPosition = 216278208, logWriteOrderID = 1363659953997
>> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file:
>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208
>> logWriteOrderID: 1363659953997
>> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
>> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>>
>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint
>> for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to
>> sync = 2
>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating
>> checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0,
>> queueHead: 362981
>> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta
>> currentPosition = 216288815, logWriteOrderID = 1363659954200
>> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file:
>> /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815
>> logWriteOrderID: 1363659954200
>> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://
>> 127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>>
>>
>> @Hari
>> Hmm, 12 hours have passed. The size of the file channel directory has not decreased.
>>
>> Files in file channel directory:
>>
>> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28
>> ./file-channel/data/log-7
>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12
>> ./file-channel/data/log-6.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28
>> ./file-channel/data/log-7.meta
>> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15
>> ./file-channel/data/in_use.lock
>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11
>> ./file-channel/data/log-6
>>
>>
>>
>>
>>
>>
>> Zhiwen Sun
>>
>>
>>
>> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>  It is possible for the directory size to increase even if no writes are
>> going into the channel. If the channel size is non-zero and the sink is
>> still writing events to HDFS, the takes get written to disk as well (so we
>> know which events in the files were removed when the channel/agent
>> restarts). Eventually the channel will clean up the files from which all
>> events have been taken (though it will keep at least 2 files per data
>> directory, just to be safe).
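>>
>> (For context, the checkpoint and data directories can be set explicitly in
>> the channel config - a sketch with hypothetical paths; the "at least 2 files"
>> retention applies to each directory listed in dataDirs:
>>
>> a1.channels.c2.checkpointDir = /var/flume/file-channel/checkpoint
>> a1.channels.c2.dataDirs = /var/flume/file-channel/data
>> )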
>>
>> --
>> Hari Shreedharan
>>
>> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>>
>> Hey,
>>
>> What does debug say? Can you gather logs and attach them?
>>
>> - Alex
>>
>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Matt.Kenison@disney.com>
>> wrote:
>>
>> Check the JMX counter first, to make sure you really are not sending new
>> events. If not, is it your checkpoint directory or data directory that is
>> increasing in size?
>>
>>
>> From: Zhiwen Sun <pensz01@gmail.com>
>> Reply-To: "user@flume.apache.org" <user@flume.apache.org>
>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>> To: "user@flume.apache.org" <user@flume.apache.org>
>> Subject: Why used space of flie channel buffer directory increase?
>>
>> hi all:
>>
>> I am testing flume-ng on my local machine. The data flow is:
>>
>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>
>> My configuration file is here:
>>
>> a1.sources = r1
>> a1.channels = c2
>>
>> a1.sources.r1.type = netcat
>> a1.sources.r1.bind = 192.168.201.197
>> a1.sources.r1.port = 44444
>> a1.sources.r1.max-line-length = 1000000
>>
>> a1.sinks.k1.type = logger
>>
>> a1.channels.c1.type = memory
>> a1.channels.c1.capacity = 10000
>> a1.channels.c1.transactionCapacity = 10000
>>
>> a1.channels.c2.type = file
>> a1.sources.r1.channels = c2
>>
>> a1.sources.r1.interceptors = i1
>> a1.sources.r1.interceptors.i1.type = timestamp
>>
>> a1.sinks = k2
>> a1.sinks.k2.type = hdfs
>> a1.sinks.k2.channel = c2
>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>> a1.sinks.k2.hdfs.writeFormat = Text
>> a1.sinks.k2.hdfs.rollInterval = 10
>> a1.sinks.k2.hdfs.rollSize = 10000000
>> a1.sinks.k2.hdfs.rollCount = 0
>>
>> a1.sinks.k2.hdfs.filePrefix = app
>> a1.sinks.k2.hdfs.fileType = DataStream
>>
>>
>>
>>
>> It seems that events were collected correctly.
>>
>> But there is a problem bothering me: the used space of the file channel
>> (~/.flume) keeps increasing, even when there are no new events.
>>
>> Is my configuration wrong, or is there some other problem?
>>
>> thanks.
>>
>>
>> Best regards.
>>
>> Zhiwen Sun
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>
>>
>>
>>
>>
>
