flume-user mailing list archives

From Umesh Telang <Umesh.Tel...@bbc.co.uk>
Subject RE: checkpoint lifecycle
Date Thu, 30 Jan 2014 14:16:35 GMT
Hi Brock,

Our heap size is 2GB.

Thanks for the advice on data directories. Could you please explain the heuristic behind that?
(e.g. 1 data directory per N-sized channel, where N is...)

Thanks also for suggesting backup checkpoints. Do these automatically improve the integrity
of Flume's execution, or do they aid in some form of manual recovery?

Re: FLUME-2155, I've scanned through it and will read it in more detail. I'm not sure about
the unit of measurement for some of the metrics (milliseconds?), but is there any guidance
on the order of magnitude of channel size (10^4, 10^6 or 10^8?) at which the replay issue
becomes apparent?

Thank you,
Umesh

________________________________
From: Brock Noland [brock@cloudera.com]
Sent: 30 January 2014 13:27
To: user@flume.apache.org
Subject: RE: checkpoint lifecycle


How large is your heap?

You will likely want two data directories per disk. Also, with a channel that large, I strongly
recommend using backup checkpoints.

Additionally https://issues.apache.org/jira/browse/FLUME-2155 will be very useful to you as
well.
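Brock's two suggestions (two data directories per disk, plus backup checkpoints) can be expressed
as File Channel properties. The Java sketch below generates them for the `a1` agent and
`s3-file-channel` channel from this thread. The mount points `/mnt/disk1` and `/mnt/disk2`, the
`dataN` and `checkpoint-backup` directory names, and both helper methods are hypothetical.
`dataDirs` takes a comma-separated list, and `useDualCheckpoints` / `backupCheckpointDir` are
believed to be File Channel properties from Flume 1.4.0 onward; check the File Channel section of
the Flume User Guide for your version before relying on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class FileChannelConfigSketch {

    // Hypothetical helper: builds a comma-separated dataDirs value with
    // dirsPerDisk directories under each mount point.
    static String dataDirs(List<String> mountPoints, String channel, int dirsPerDisk) {
        List<String> dirs = new ArrayList<>();
        for (String mount : mountPoints) {
            for (int i = 1; i <= dirsPerDisk; i++) {
                dirs.add(mount + "/" + channel + "/data" + i);
            }
        }
        return String.join(",", dirs);
    }

    // Hypothetical helper: file channel properties following the advice above.
    static Properties channelProperties(String agent, String channel, List<String> mountPoints) {
        String prefix = agent + ".channels." + channel + ".";
        String firstDisk = mountPoints.get(0);
        // With more than one mount point, the backup lands on a different disk than the primary.
        String lastDisk = mountPoints.get(mountPoints.size() - 1);
        Properties p = new Properties();
        p.setProperty(prefix + "type", "file");
        p.setProperty(prefix + "checkpointDir", firstDisk + "/" + channel + "/checkpoint");
        p.setProperty(prefix + "useDualCheckpoints", "true");
        p.setProperty(prefix + "backupCheckpointDir", lastDisk + "/" + channel + "/checkpoint-backup");
        p.setProperty(prefix + "dataDirs", dataDirs(mountPoints, channel, 2)); // two per disk
        p.setProperty(prefix + "transactionCapacity", "20000");
        p.setProperty(prefix + "capacity", "150000000");
        return p;
    }

    public static void main(String[] args) {
        List<String> mounts = List.of("/mnt/disk1", "/mnt/disk2"); // hypothetical disks
        Properties p = channelProperties("a1", "s3-file-channel", mounts);
        // Print sorted "key = value" lines, in the same style as a Flume config file.
        p.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + " = " + p.getProperty(k)));
    }
}
```

Placing the backup checkpoint on a different disk from the primary is a design choice in this
sketch, so that losing one disk is less likely to lose both checkpoint copies.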

On Jan 30, 2014 4:21 AM, "Umesh Telang" <Umesh.Telang@bbc.co.uk> wrote:

Hi Hari,

The capacity of the channel is 150,000,000. The other properties of the file channel are as
below:
a1.channels.s3-file-channel.type = file
a1.channels.s3-file-channel.checkpointDir = /mnt/flume-file-channels/s3-file-channel/checkpoint
a1.channels.s3-file-channel.dataDirs = /mnt/flume-file-channels/s3-file-channel/data
a1.channels.s3-file-channel.transactionCapacity = 20000
a1.channels.s3-file-channel.capacity = 150000000

We've been experimenting with the configuration. We haven't specifically noticed an increase
in the checkpoint size. It's just that, since the size we've observed is on the order of
gigabytes, we wanted to understand how the checkpoint size would vary, if at all.

Based on what you've said, it looks like the checkpoint size is a direct function of the channel
capacity. So, for a given channel capacity, as long as enough disk space is provisioned up
front, that should be sufficient for that Flume agent.
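The claim that checkpoint size depends only on capacity can be sanity-checked with a little
arithmetic. This sketch assumes the checkpoint stores one 8-byte `long` per channel slot plus a
small fixed header; the header length of 1029 longs is an assumption about Flume's internal
`EventQueueBackingStoreFile` layout, not something stated in this thread, and is negligible at
this capacity either way.

```java
public class CheckpointSizeEstimate {

    // ASSUMPTION: the checkpoint holds one 8-byte long per channel slot.
    static final long BYTES_PER_SLOT = 8L;
    // ASSUMPTION: fixed header of 1029 longs (believed to match Flume's
    // EventQueueBackingStoreFile; tiny next to a large capacity either way).
    static final long HEADER_SLOTS = 1029L;

    // Estimated checkpoint file size in bytes. It depends only on capacity,
    // not on how many events are currently in the channel.
    static long estimateBytes(long capacity) {
        return (capacity + HEADER_SLOTS) * BYTES_PER_SLOT;
    }

    public static void main(String[] args) {
        long capacity = 150_000_000L; // a1.channels.s3-file-channel.capacity from the thread
        long bytes = estimateBytes(capacity);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("capacity=%d -> ~%d bytes (%.2f GiB)%n", capacity, bytes, gib);
    }
}
```

For a capacity of 150,000,000 this gives 1,200,008,232 bytes, about 1.12 GiB. GNU `ls -lh`
rounds sizes up, so this is consistent with the 1.2G checkpoint shown in the directory listing
in this thread.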

Thanks again for clarifying!

Umesh


________________________________
From: Hari Shreedharan [hshreedharan@cloudera.com]
Sent: 29 January 2014 18:55
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle

What is the capacity of your channel? I would assume that the checkpoint size will remain
the same throughout.


Thanks,
Hari


On Wednesday, January 29, 2014 at 9:37 AM, Umesh Telang wrote:

Thanks for the quick response, Hari!

We are using version 1.4.0 of Flume.

The contents and sizes of the checkpoint directory are as below:
$ ls -lh
total 1.2G
-rw-r--r-- 1 flume flume 1.2G Jan 29 17:34 checkpoint
-rw-r--r-- 1 flume flume   25 Jan 29 17:34 checkpoint.meta
-rw-r--r-- 1 flume flume    0 Jan 28 07:56 in_use.lock
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflightputs
-rw-r--r-- 1 flume flume   32 Jan 29 17:34 inflighttakes

Thanks,
Umesh


________________________________
From: Hari Shreedharan [hshreedharan@cloudera.com]
Sent: 29 January 2014 17:22
To: user@flume.apache.org
Subject: Re: checkpoint lifecycle

The checkpoint file itself should be a fixed size, though other files in that directory may vary
in size. What version of Flume are you using? Newer versions should have more files in that
directory.
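One way to confirm that `checkpoint` stays a fixed size while other files in the directory may
change is to sample the directory twice and compare. This is a hypothetical monitoring helper,
not part of Flume; the default path is the `checkpointDir` from the configuration in this
thread, and the one-minute interval is arbitrary.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class CheckpointDirSizes {

    // Returns file name -> size in bytes for regular files directly inside dir.
    static Map<String, Long> fileSizes(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Pass your checkpointDir as the first argument to override the default.
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/mnt/flume-file-channels/s3-file-channel/checkpoint");
        Map<String, Long> before = fileSizes(dir);
        Thread.sleep(60_000); // arbitrary sampling interval
        Map<String, Long> after = fileSizes(dir);
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            String change = (old == null) ? "new"
                    : (old.equals(e.getValue()) ? "unchanged" : "changed");
            System.out.printf("%-16s %,15d bytes  %s%n", e.getKey(), e.getValue(), change);
        }
    }
}
```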

On Wednesday, January 29, 2014, Umesh Telang <Umesh.Telang@bbc.co.uk> wrote:

Hello,

Under a file channel's checkpoint directory, I see the following files:
checkpoint
checkpoint.meta

I wanted to know whether the size of the checkpoint file should reach a steady state if the
amount and rate of input to the file channel remain the same.

My understanding is that the checkpoint file is associated with the write-ahead log. Is this
something that continues to grow indefinitely?

Or is there some lifecycle management that cleans out very old entries from the write-ahead
log?

If not, is there some strategy that we should employ to manage the size of the checkpoint
file? (In our case, it's currently over 1 GB after 2 days of operation.)

Thanks for any advice on this.

Kind regards,
Umesh




----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are
not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify
the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

---------------------


