flume-user mailing list archives

From Ritesh Adval <riteshad...@gaikai.com>
Subject Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It
Date Tue, 26 Nov 2013 01:30:15 GMT
What I meant was that we are processing metrics and events, with one agent for
metrics and one for events. So in the rack VM we have 1 event agent and 1 metric
agent, for a total of 2 agents per VM, and the same goes for cluster and zone.
That makes 6 agents in total across 3 VMs.

Thanks for all the suggestions, I am going to try those out.

Ritesh




On Mon, Nov 25, 2013 at 5:04 PM, Jeff Lord <jlord@cloudera.com> wrote:

> Not sure what you mean by one agent each for event.
> It's possible that you may be able to use one agent for your needs, and this
> could alleviate disk contention between the agents as far as the file channel
> is concerned.
> Either way, it sounds like you need to either decrease the size of the file
> channel or increase the size of your disk.
>
>
> On Mon, Nov 25, 2013 at 3:38 PM, Ritesh Adval <riteshadval@gaikai.com> wrote:
>
>> We have one agent each for events and metrics, and we have 3 hops that
>> these go through (rack, cluster and zone), so we run these 2 agents
>> together on each hop (6 agents in total, 2 in each VM).
>>
>> Is running a single agent per VM recommended?
>>
>> -Ritesh
>>
>>
>>
>> On Nov 25, 2013, at 3:23 PM, Jeff Lord <jlord@cloudera.com> wrote:
>>
>> It's fine to run in a VM.
>> Out of curiosity, why are you running two agents on the machine though?
>>
>>
>>
>> On Mon, Nov 25, 2013 at 1:54 PM, Brock Noland <brock@cloudera.com> wrote:
>>
>>> If the channel is full your clients will get a rejection notice.
>>>
>>> Capacity planning on the FC is a mix between event size, channel size,
>>> and disk size. If flume is holding on to the logs, it's because it
>>> needs them.  If you are constantly running out of space, then yes,
>>> it's quite likely decreasing channel capacity is a logical course of
>>> action.
>>>
>>> Brock
>>>
>>> On Mon, Nov 25, 2013 at 3:30 PM, Ritesh Adval <riteshadval@gaikai.com>
>>> wrote:
>>> > Thanks, but if it keeps every tx log that has events in the channel, then it
>>> > seems it would run out of disk space, since our clients will keep sending
>>> > events to it and it will keep creating those tx logs as long as it has
>>> > disk space? Or am I missing something here?
>>> >
>>> > What we need is for the client to start getting message rejections once the
>>> > Flume agent's file channel has reached its limit in terms of pending messages
>>> > in tx logs or capacity. Do you think we should reduce the channel capacity?
>>> > It is currently set to 1M.
>>> >
>>> >
>>> > Ritesh
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Nov 25, 2013 at 1:00 PM, Brock Noland <brock@cloudera.com> wrote:
>>> >>
>>> >> It will keep any tx log that has a corresponding event in the channel
>>> >> + 2 per data directory.
>>> >>
>>> >> On Mon, Nov 25, 2013 at 2:55 PM, Ritesh Adval <riteshadval@gaikai.com>
>>> >> wrote:
>>> >> > Thanks, but we do not know how many transaction log files it will create,
>>> >> > so it may run out of disk space even if we set a lower maxFileSize. Do we
>>> >> > know the maximum number of log files it will keep in Flume 1.4?
>>> >> >
>>> >> > Ritesh
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Nov 25, 2013 at 12:50 PM, Brock Noland <brock@cloudera.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Lower the maxFileSize.
>>> >> >>
>>> >> >> On Mon, Nov 25, 2013 at 2:41 PM, Ritesh Adval <riteshadval@gaikai.com>
>>> >> >> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > We are running two Flume 1.4 agents, each with 2 file channels, on a VM
>>> >> >> > of size 15GB.
>>> >> >> >
>>> >> >> > Is a VM recommended for running Flume, or do we need bare metal boxes?
>>> >> >> >
>>> >> >> >
>>> >> >> > Every week or so we run into a situation where, because the sinks on
>>> >> >> > these agents are unable to send messages to upstream agents, the Flume
>>> >> >> > file channels fill up with large transaction logs.
>>> >> >> >
>>> >> >> > Here is what we see on the 4 channels:
>>> >> >> >
>>> >> >> > $ du -h /srv/flume/
>>> >> >> > 4.9G    /srv/flume/metricChannel1-Cluster/data
>>> >> >> > 7.7M    /srv/flume/metricChannel1-Cluster/checkpoint
>>> >> >> > 4.9G    /srv/flume/metricChannel1-Cluster
>>> >> >> > 4.9G    /srv/flume/metricChannel2-Cluster/data
>>> >> >> > 7.7M    /srv/flume/metricChannel2-Cluster/checkpoint
>>> >> >> > 4.9G    /srv/flume/metricChannel2-Cluster
>>> >> >> > 214M    /srv/flume/eventChannel2-Cluster/data
>>> >> >> > 7.7M    /srv/flume/eventChannel2-Cluster/checkpoint
>>> >> >> > 222M    /srv/flume/eventChannel2-Cluster
>>> >> >> > 215M    /srv/flume/eventChannel1-Cluster/data
>>> >> >> > 7.7M    /srv/flume/eventChannel1-Cluster/checkpoint
>>> >> >> > 223M    /srv/flume/eventChannel1-Cluster
>>> >> >> > 11G     /srv/flume/
>>> >> >> >
>>> >> >> >
>>> >> >> > Here is an example of the tx logs on metricChannel1; we are seeing 5 log
>>> >> >> > files. Is there a way to restrict the number of log files kept? I think
>>> >> >> > in older versions of Flume the maximum was 2 log files, but we are seeing
>>> >> >> > more than 2, as shown below:
>>> >> >> >
>>> >> >> >
>>> >> >> >  $ ls -l /srv/flume/metricChannel1-Cluster/data/
>>> >> >> > total 4.5G
>>> >> >> > -rw-r--r-- 1 flume flume    0 Nov 23 00:39 in_use.lock
>>> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 23 11:11 log-1
>>> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-1.meta
>>> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 23 21:18 log-2
>>> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-2.meta
>>> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 24 07:13 log-3
>>> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-3.meta
>>> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 24 17:08 log-4
>>> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-4.meta
>>> >> >> > -rw-r--r-- 1 flume flume 425M Nov 24 21:15 log-5
>>> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-5.meta
>>> >> >> >
>>> >> >> >
>>> >> >> > We have set maxFileSize to 1GB, and it looks like each tx log is within
>>> >> >> > that limit. We have set the capacity of the file channel to 1M messages:
>>> >> >> >
>>> >> >> > agent.channels.metricChannel2.transactionCapacity=1000
>>> >> >> > agent.channels.metricChannel2.capacity=1000000
>>> >> >> > agent.channels.metricChannel2.maxFileSize=1073741824
>>> >> >> >
>>> >> >> >
>>> >> >> > What we want to avoid is the transaction logs filling up the disk. Is
>>> >> >> > there a way to achieve this?
>>> >> >> > We are OK with discarding messages.
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> > Ritesh
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>>
>>
>>
>
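
To make the capacity-planning advice in this thread concrete (disk use depends on event size, channel capacity and maxFileSize, and the channel keeps every log that still holds a live event plus 2 per data directory), here is a rough Python sketch. It is only an estimate under stated assumptions: the function name, its parameters and the 4096-byte average event size are hypothetical and do not come from Flume or from this thread, and per-record overhead is ignored. Because a log file that holds even one live event is kept in full, real usage can be higher than this figure.

```python
import math


def estimate_file_channel_disk_bytes(capacity, avg_event_bytes, max_file_size,
                                     data_dirs=1, extra_logs_per_dir=2):
    """Rough estimate of data-directory disk use for a full file channel.

    Assumes live events are packed densely into log files of max_file_size
    bytes and that the channel also retains extra_logs_per_dir additional
    log files per data directory (the "+ 2 per data directory" in the thread).
    """
    # Bytes occupied by events still waiting in the channel (assumed average size).
    live_bytes = capacity * avg_event_bytes
    # Number of log files needed to hold those events if packed densely.
    logs_with_events = math.ceil(live_bytes / max_file_size)
    # Add the extra logs retained per data directory.
    total_logs = logs_with_events + extra_logs_per_dir * data_dirs
    return total_logs * max_file_size


if __name__ == "__main__":
    one_gib = 1073741824  # the maxFileSize value quoted in the thread
    estimate = estimate_file_channel_disk_bytes(
        capacity=1000000,        # the capacity value quoted in the thread
        avg_event_bytes=4096,    # assumed; measure your own events
        max_file_size=one_gib,
    )
    print(f"estimated data dir size: {estimate / one_gib:.1f} GiB")
```

Running the same calculation with a lower capacity or a smaller maxFileSize shows how each setting changes the estimate, which can then be compared against the disk space available to all channels on the VM.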
