Red Hat Bugzilla – Bug 212634
rgmanager times out when using clustat
Last modified: 2009-04-16 16:21:25 EDT
Description of problem:
rgmanager times out when attempting to get the service list via clustat. Locks
are also in an odd state.
Also, 'cat /proc/cluster/dlm_locks' reports "Cannot allocate memory", and node03
has dlm_recvd using about 50-95% of the CPU.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Will attach /proc/cluster/dlm_debug and /proc/cluster/dlm_locks (Magma) info.
Created attachment 139610 [details]
dlm debug and lock info from /proc
Lenny, when it can't allocate memory, is it userspace? E.g., is there any
process obviously soaking up all memory on the system?
Memory usage was normal. Not sure whether it is 'cat' itself complaining about
memory or the /proc fs.
Can you get /proc/slabinfo from the nodes and, if possible, 'ps -auwwx'?
We do not have a way of reproducing this, but if it comes up again I will get
Lon, I am seeing this now on several clusters. On all of them,
/proc/cluster/dlm_debug shows complaints from clvmd.
I will attach some logs...
Created attachment 142198 [details]
Created attachment 142199 [details]
Created attachment 142200 [details]
Created attachment 142201 [details]
Lenny, I am pretty sure this is a bug in rgmanager which is produced by the
I'll have a build ready soon.
Since the clu_lock_verbose() function does nothing useful, I'm removing it from
RHCS4 (it's already been removed in RHCS5).
Created attachment 143442 [details]
Fixes subtle dlm lock leak created by rgmanager
Created attachment 143443 [details]
Source RPM with this patch + patch for 213312
Binary RPMs (*will* be removed when RHCS 4.5 becomes available):
Fixes in CVS.
Ok, I am running with this now. Let me get some bake time on it before declaring
this the fix.
Same fix(es), based on the 1.9.54 errata (exactly the same as .53, except it
includes an NFS fix).
*** Bug 230830 has been marked as a duplicate of this bug. ***
Alternatively, we will be calling it 'beta' pretty soon.
Can you specify what counts as a "bad" value or increment for dlm_lkb in
/proc/slabinfo?
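This report gives no threshold for dlm_lkb, so the following is only a sketch of how the count could be tracked over time rather than a definition of "bad". It assumes the slabinfo 2.x column layout (cache name, then active_objs, then num_objs); the function names are hypothetical and not part of rgmanager or the dlm. Steady growth on an otherwise quiet cluster would be consistent with the lock leak discussed above, but that reading is an assumption.

```python
def parse_slabinfo(text, cache_name="dlm_lkb"):
    """Return (active_objs, num_objs) for cache_name, or None if absent.

    Assumes the slabinfo 2.x layout: name, active_objs, num_objs, ...
    """
    for line in text.splitlines():
        fields = line.split()
        # Skips the version banner, the '# name ...' header, and other caches.
        if len(fields) < 3 or fields[0] != cache_name:
            continue
        return int(fields[1]), int(fields[2])
    return None


def growth(samples):
    """Given successive active_objs counts, return the per-interval deltas."""
    return [b - a for a, b in zip(samples, samples[1:])]


def monotonically_growing(samples):
    """True if there are at least two samples and every delta is positive."""
    deltas = growth(samples)
    return bool(deltas) and all(d > 0 for d in deltas)
```

To use it, read /proc/slabinfo on each node at a fixed interval (reading it may require root), pass the text to parse_slabinfo(), and collect the active_objs values; growth() then shows how the count changes between samples.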
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.