flume-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: multiple flume clients and memory
Date Thu, 26 Mar 2015 03:18:15 GMT
Is the memory usage of all these clients in the same range? If yes, then
taking a heap dump would reveal what is consuming the memory.

As Hari said, the batch is kept in memory, so the Event size matters.
Here is what I would do to debug this:

1. Check the memory usage of all the clients.
2. If they are all in the same range, use VisualVM to take a heap dump
of any one of the processes; otherwise take heap dumps of a few
processes (the ones with the max and min usage, etc.).
3. Use Eclipse MAT or another tool to see what is consuming the memory.
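
If attaching VisualVM to every process is inconvenient, steps 1 and 2 can
also be done from inside the client JVM. The sketch below is only an
illustration, not something taken from Flume: the class name HeapProbe and
the idea of passing the dump path on the command line are my own
assumptions, and it relies on the HotSpot-specific
com.sun.management.HotSpotDiagnosticMXBean, so it will not work on every
JVM. It reports Java heap only, so memory held outside the heap will not
show up here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Flume): reports heap usage and can
// write a heap dump that Eclipse MAT or VisualVM can open.
public class HeapProbe {

    // Bytes of Java heap currently in use by this JVM.
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Writes a heap dump of live objects to the given path.
    // The file must not already exist; recent JDKs also expect a
    // ".hprof" suffix. Assumes a HotSpot-based JVM.
    static void dumpHeap(String path) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, true);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Heap used (MB): " + usedHeapBytes() / (1024 * 1024));
        if (args.length > 0) {
            dumpHeap(args[0]); // e.g. /tmp/flume-client.hprof (example path)
        }
    }
}
```

Calling usedHeapBytes() in each client process gives the numbers to
compare for step 1; dumpHeap() produces the file to open in MAT for
step 3.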

You can also try tweaking the batch size to see if it makes any
difference in memory usage.
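
To make the batch size point concrete, here is a rough sketch. The
arithmetic is only a lower-bound estimate of the payload held in batches
(it ignores headers, object overhead and threads), and the class and
method names are made up for illustration. The property keys
("client.type", "hosts", "hosts.h1", "batch-size") are the ones I recall
from the Flume client SDK documentation, so please verify them against
your Flume version. The code itself uses only java.util.Properties; the
Flume call is left as a comment because it needs flume-ng-sdk on the
classpath.

```java
import java.util.Properties;

// Hypothetical sketch (names are illustrative, not from Flume).
public class BatchSizeSketch {

    // Rough payload held in memory if every client holds one full batch.
    static long estimateBatchBytes(int clients, int batchSize, long avgEventBytes) {
        return (long) clients * batchSize * avgEventBytes;
    }

    // Builds client properties with an explicit batch size.
    // Keys are assumed from the Flume client SDK docs; verify before use.
    static Properties clientProperties(String hostPort, int batchSize) {
        Properties props = new Properties();
        props.setProperty("client.type", "default");
        props.setProperty("hosts", "h1");
        props.setProperty("hosts.h1", hostPort);
        props.setProperty("batch-size", Integer.toString(batchSize));
        return props;
    }

    public static void main(String[] args) {
        // Example numbers only: 40 clients, 100 events per batch, 1 KB events.
        long bytes = estimateBatchBytes(40, 100, 1024);
        System.out.println("Estimated batch payload (bytes): " + bytes);

        Properties props = clientProperties("localhost:41414", 50);
        System.out.println("Client properties: " + props);
        // With flume-ng-sdk available, these would be passed along the lines of:
        // RpcClient client = RpcClientFactory.getInstance(props);
    }
}
```

If the estimate is far below what the processes actually use, the batch is
probably not the main consumer and the heap dump becomes more important.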

On Thu, Mar 26, 2015 at 8:33 AM, Matt Fair <matt.fair@gmail.com> wrote:
> I have seen it on both my machine with 16 GB of memory and one with 60 GB,
> when running about 40 clients and ~4k clients respectively, using up
> 100% of memory.  If I run without the flume client I have no memory
> problems, but when I instantiate a flume RPCClient, then I run into memory
> problems.
> Thanks,
> Matt
> On Wed, Mar 25, 2015 at 6:42 PM, Hari Shreedharan
> <hshreedharan@cloudera.com> wrote:
>> How much memory are you talking about? The RPC client will hold on to the
>> batch of events you sent, plus some additional threading overhead. Under the
>> hood, it uses a Netty client which should not really have a big memory
>> footprint.
>> Thanks,
>> Hari
>> On Wed, Mar 25, 2015 at 3:27 PM, Matt Fair <matt.fair@gmail.com> wrote:
>>> I have an application that launches a bunch of processes (40+) on the
>>> same machine, each one connects to flume using the default flume RPCClient.
>>> However, I have noticed that each RPCClient takes up a decent amount of
>>> memory, and when you create as many clients as I do, it adds up to a lot
>>> of memory.  One thought I had to avoid having to create all of the
>>> clients was to create only a single RPCClient and then have my other
>>> processes connect to it via a socket, but that seems a little redundant
>>> since that is what the RPCClient is supposed to do anyway.  Have others
>>> found themselves in this same situation?  Is there a way to handle memory
>>> more efficiently or is there another RPCClient implementation that doesn't
>>> take up as much memory?
>>> Thanks,
>>> Matt


Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
