Bug 473961

Summary: clvmd memory leak
Product: Red Hat Enterprise Linux 5
Reporter: Chris <caronc>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Docs Contact:
Priority: high
Version: 5.2
CC: agk, bmr, bturner, ccaulfie, cfeist, cluster-maint, cward, dejohnso, dwysocha, edamato, heinzm, jbrassow, matt, mbroz, michael.hagmann, nvarney, prockai, rlerch, tao
Target Milestone: rc
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version: cman-2.0.100-1.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 11:07:06 UTC
Type: ---

Description Chris 2008-12-01 16:32:42 UTC
Description of problem:

Frequent calls to 'lvdisplay' and 'vgdisplay' cause clvmd to allocate more memory, which is never released afterwards...

Version-Release number of selected component (if applicable):
Currently we are using lvm2-cluster-2.02.32-4.el5 

How reproducible:
Easily

Steps to Reproduce:
1. Set up a simple cluster with a logical volume (clustered).
2. Open one window with 'top' to monitor the '%MEM' column.
3. In another terminal window, run this simple loop and watch the memory grow:

while [ 1 -eq 1 ]; do vgdisplay ; done

or

while [ 1 -eq 1 ]; do lvdisplay ; done
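
As an alternative to watching top in step 2, the growth can be logged with a small script. This is only a sketch: it assumes a Linux /proc filesystem and that pidof is installed, and the function names rss_kb_from_status and log_clvmd_rss are made up for illustration.

```shell
#!/bin/sh
# Sketch: log clvmd's resident memory over time instead of watching top.
# Assumes Linux /proc and an installed pidof; both helper names are
# hypothetical and exist only for this example.

# Print the VmRSS value (in kB) found in /proc/<pid>/status-style text on stdin.
rss_kb_from_status() {
    awk '/^VmRSS:/ { print $2 }'
}

# Sample the RSS of every running clvmd process once a minute.
# Defined but not started here; call log_clvmd_rss to begin sampling.
log_clvmd_rss() {
    while true; do
        for pid in $(pidof clvmd); do
            rss=$(rss_kb_from_status < "/proc/$pid/status")
            echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$pid rss_kb=$rss"
        done
        sleep 60
    done
}
```

Running log_clvmd_rss in one terminal while one of the loops above runs in another should give a timestamped record that is easier to attach to a report than a top screenshot.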

  
Actual results:


Expected results:


Additional info:
Currently we are using Nagios, so it's the frequent NRPE calls to vgdisplay and lvdisplay that are doing it for us...

clvmd grows to over 1 GB of memory after 8 days or so...

Comment 1 Christine Caulfield 2008-12-12 10:37:26 UTC
I have profiled clvmd quite extensively and can't find a memory leak in that code.

However, I did find a very small, occasional leak in libcman, the library that clvmd uses to communicate with the cluster manager.

It only occurs when messages for the client (clvmd) are queued up. Running vgdisplay in a tight loop, as you suggest, leaked one or two 288-byte blocks over a few minutes, so it is conceivable that they could add up to a gigabyte over several days, given that clvmd will probably be doing other things too.
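
For a sense of scale, the figures in this report can be put through a quick calculation. The sketch below uses only the numbers quoted here (288-byte blocks, about 1 GB after 8 days); reading "1 GIG" as 2^30 bytes is an assumption, and the function names are made up for the example.

```shell
#!/bin/sh
# Back-of-the-envelope check of the leak rate using integer shell arithmetic.
# Inputs come from this report; treating 1 GB as 2^30 bytes is an assumption.

BLOCK_BYTES=288
TOTAL_BYTES=$((1024 * 1024 * 1024))
SECONDS_ELAPSED=$((8 * 24 * 3600))

# Average growth in bytes per second needed to reach the total in 8 days.
leak_bytes_per_sec() {
    echo $((TOTAL_BYTES / SECONDS_ELAPSED))
}

# Number of whole 288-byte blocks that fit in the total.
blocks_needed() {
    echo $((TOTAL_BYTES / BLOCK_BYTES))
}

echo "bytes/sec: $(leak_bytes_per_sec)"
echo "blocks:    $(blocks_needed)"
```

That works out to roughly 1.5 KB per second, or about five 288-byte blocks every second on average, which is far more than one or two blocks over a few minutes. So the queued-message path would have to be hit much more often in production than in the test loop for this leak alone to account for the reported growth.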

The patch to fix this is in the git master and STABLE2 branches, but we have just missed RHEL 5.3, so I'll add this to the RHEL 5.4 update.

Comment 2 Christine Caulfield 2008-12-16 08:53:32 UTC
commit 0f32c58025a75bcaf05f62b3ecd7e05e389c37eb
Author: Christine Caulfield <ccaulfie>
Date:   Fri Dec 12 10:30:12 2008 +0000

    cman: fix memory leak

Comment 6 Chris 2009-01-02 14:06:56 UTC
I'm not sure what access you have on your end; however, ticket #1877308 describes a support representative being unable to confirm that this memory leak is resolved in the (new) stable RHEL branch.

The representative also claims that this issue does not currently appear in RHEL 5.3, which is partially good news. If so, the issue was resolved by some other means... Our company cannot upgrade to 5.3 until it leaves its beta stage... Even then we would need a long period to plan the upgrade.

I have not verified these claims myself; I'm just trusting the comments made by your support team.

Comment 7 Nicolas Varney 2009-01-12 10:33:01 UTC
We are affected by this bug too.

On our clusters, some scripts check logical volume status and cause the memory usage of clvmd to grow...

The memory usage is growing on both nodes, not only on the node that runs the lvdisplay commands.

The resulting cluster is not stable...

This is a blocking point for us.

Comment 14 Debbie Johnson 2009-04-23 12:43:43 UTC
Good news!

Here are the new values, dated today, the 22nd:

----8<----
        VIRT   RES   SHR
psp341  114m   88m   56m
psp342  114m   88m   56m
psu339  188m   98m   56m
psu340  178m   88m   56m
psi225  177m   87m   56m
psi227  113m   87m   56m
---->8----

psi225 and psi227 are new.

The values did not increase... the leak seems to have disappeared.

Comment 17 Chris Ward 2009-07-03 18:14:32 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 19 errata-xmlrpc 2009-09-02 11:07:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html

Comment 21 Christine Caulfield 2014-06-30 11:52:20 UTC
Clear needinfo