Bug 178367 - kernel memory leak while reading from /proc/cluster/[nodes|services]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-01-19 20:39 UTC by Lon Hohberger
Modified: 2009-04-16 20:28 UTC (History)

Fixed In Version: RHBA-2006-0559
Clone Of:
Environment:
Last Closed: 2006-08-10 21:32:30 UTC
Embargoed:


Attachments
Proposed patch for /proc/cluster/services memory leak (17.91 KB, patch)
2006-01-19 23:13 UTC, Robert Peterson


Links
System: Red Hat Product Errata
ID: RHBA-2006:0559
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: cman-kernel bug fix update
Last Updated: 2006-08-10 04:00:00 UTC

Description Lon Hohberger 2006-01-19 20:39:32 UTC
Description of problem:

Reading from /proc/cluster/services or /proc/cluster/nodes appears to leak
memory in CMAN. The growth is visible in /proc/slabinfo, specifically in the
size-32 cache.


Version-Release number of selected component (if applicable): 15-Jan-2006 CVS /
RHEL4

How reproducible: 100%

Steps to Reproduce:
1. Form a one-node cluster.  Don't start fenced or rgmanager.  Do not load the
DLM module or mount GFS file systems.

2. In one terminal window, run:
     while : ; do cat /proc/cluster/services ; done &> /dev/null
3. In another terminal window, run:
     watch --interval=1 "grep 'size-32 ' /proc/slabinfo"
  
Actual results: The size-32 slab cache grows without bound, suggesting a memory
leak when reading from /proc/cluster/services.

Expected results: ?

Additional info:

I originally suspected a memory leak in the DLM, but that turned out to be
incorrect.
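
To put numbers on the growth watched in step 3, a small helper can pull the
active-object count for a single cache out of /proc/slabinfo. The following is
a minimal C sketch and not part of the original report. The function names
slab_active_objs and read_slab_active are hypothetical, and the code assumes
the 2.x slabinfo layout, where each line starts with the cache name followed by
the active and total object counts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one slabinfo line. If its cache name is exactly `cache`, store the
 * active-object count in *active and return 1; otherwise return 0.
 * The exact name match keeps "size-32" apart from "size-32(DMA)" and
 * "size-32768", which is what the trailing space in the grep pattern does.
 */
int slab_active_objs(const char *line, const char *cache,
                     unsigned long *active)
{
    char name[64];
    unsigned long act, total;

    assert(line != NULL && cache != NULL && active != NULL);

    /* Assumed layout: <name> <active_objs> <num_objs> ... */
    if (sscanf(line, "%63s %lu %lu", name, &act, &total) != 3)
        return 0;   /* header line or unexpected format */
    if (strcmp(name, cache) != 0)
        return 0;   /* some other cache */

    *active = act;
    return 1;
}

/* Scan a slabinfo-formatted stream for `cache`; returns 1 if it was found. */
int read_slab_active(FILE *fp, const char *cache, unsigned long *active)
{
    char line[512];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (slab_active_objs(line, cache, active))
            return 1;
    }
    return 0;
}
```

Calling read_slab_active on a freshly opened /proc/slabinfo before and after a
batch of reads from /proc/cluster/services, then comparing the two counts,
gives a rough measure of how many size-32 objects each batch leaves behind.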

Comment 1 Robert Peterson 2006-01-19 23:13:44 UTC
Created attachment 123464
Proposed patch for /proc/cluster/services memory leak

This patch contains three things:

1. Patrick Caulfield's code fix for bz 177163 as of 19 January 2006.
2. Bob Peterson's patch for adding /proc/cluster/smsg_history and
/proc/cluster/msg_history for cman-kernel debugging purposes.
3. Bob Peterson's proposed patch to fix the memory leak when using
/proc/cluster/services and /proc/cluster/nodes

Regarding #2:
That patch is mainly for debugging purposes.  As it stands, it allocates a
large amount of memory for the module that normal customer situations should
not need.  I'm planning to revise it so that it can be turned on and off on
demand, allocating and freeing the memory as needed, either by doing an insmod
of a companion module or possibly by echoing something to another file in
/proc/cluster/.  That way, if a customer has a problem, we can ask them to turn
the debugging on and capture information from the failure, without affecting
other customers.

Comment 2 Robert Peterson 2006-01-26 21:49:54 UTC
Additional notes:

I researched the proper way to use seq_files for /proc to make sure this fix
would work properly.  For example, see:

http://www.kernelnewbies.org/documents/seq_file_howto.txt

Excerpt from that document:
"struct seq_file contains a "void *private" that can be used by the
struct seq_operations functions to hold any private data that needs
to be available to all of these related methods.  For example, the
.start method might allocate some memory and save its address in
seq_file.private so that the .next and .show methods can use it,
then the .stop method would free that memory."

Also, I unit-tested this fix with printk messages to verify it was working as
planned and all kmallocs were paired with their proper kfrees.
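
The pairing described in the excerpt (allocate in .start, free in .stop) can be
modelled in userspace. The sketch below is neither the attached patch nor the
kernel seq_file API: struct model_seq_file, model_start, model_next,
model_stop, simulate_reads and the live_allocs counter are hypothetical names,
with malloc/free standing in for kmalloc/kfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct seq_file; only "private" is modelled. */
struct model_seq_file {
    void *private;
};

/* Hypothetical per-read iterator state. */
struct iter_state {
    int pos;    /* index of the current entry */
    int count;  /* number of entries to walk */
};

/* Outstanding allocations: incremented on malloc, decremented on free. */
long live_allocs;

/* .start: allocate the state and save its address in m->private. */
void *model_start(struct model_seq_file *m, int count)
{
    struct iter_state *st = malloc(sizeof(*st));

    if (!st)
        return NULL;
    live_allocs++;
    st->pos = 0;
    st->count = count;
    m->private = st;
    /* An empty sequence ends at once, but the state stays allocated. */
    return count > 0 ? st : NULL;
}

/* .next: advance by one entry; NULL signals the end of the sequence. */
void *model_next(struct model_seq_file *m)
{
    struct iter_state *st = m->private;

    assert(st != NULL);
    st->pos++;
    return st->pos < st->count ? st : NULL;
}

/* .stop: free whatever .start saved, even if iteration ended with NULL. */
void model_stop(struct model_seq_file *m)
{
    if (m->private) {
        free(m->private);
        m->private = NULL;
        live_allocs--;
    }
}

/* Simulate `reads` complete reads; returns allocations still outstanding. */
long simulate_reads(int reads, int entries)
{
    struct model_seq_file m = { NULL };
    int i;

    live_allocs = 0;
    for (i = 0; i < reads; i++) {
        void *v = model_start(&m, entries);

        while (v)
            v = model_next(&m);
        model_stop(&m);
    }
    return live_allocs;
}
```

Because model_stop frees through m->private rather than through the iterator
pointer, the state is released even when the walk has already ended with NULL.
In this model, dropping the free from model_stop would leave live_allocs
growing by one per simulated read, which is the shape of the growth reported
for size-32.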


Comment 5 Red Hat Bugzilla 2006-08-10 21:32:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0559.html


