Bug 212634 - rgmanager times out when using clustat
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 230830
Blocks: 218112
Reported: 2006-10-27 15:50 EDT by Lenny Maiorani
Modified: 2009-04-16 16:21 EDT
Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Last Closed: 2007-05-10 17:19:27 EDT

Attachments
dlm debug and lock info from /proc (1.21 KB, application/octet-stream)
2006-10-27 15:50 EDT, Lenny Maiorani
/proc/cluster/dlm_debug (1.00 KB, text/plain)
2006-11-27 13:32 EST, Lenny Maiorani
/proc/meminfo (646 bytes, text/plain)
2006-11-27 13:32 EST, Lenny Maiorani
ps -auwwx (13.34 KB, text/plain)
2006-11-27 13:33 EST, Lenny Maiorani
/proc/slabinfo (14.22 KB, text/plain)
2006-11-27 13:33 EST, Lenny Maiorani
Fixes subtle dlm lock leak created by rgmanager (4.29 KB, patch)
2006-12-12 15:40 EST, Lon Hohberger
Source RPM with this patch + patch for 213312 (186.60 KB, application/x-rpm)
2006-12-12 15:45 EST, Lon Hohberger

Description Lenny Maiorani 2006-10-27 15:50:49 EDT
Description of problem:
rgmanager times out when attempting to get the service list via clustat. Locks
are also in an odd state.

Also, 'cat /proc/cluster/dlm_locks' reports "Cannot allocate memory", and node03
has dlm_recvd using about 50-95% of the CPU.

Version-Release number of selected component (if applicable):
RHEL4U4

How reproducible:
unknown

Steps to Reproduce:
1. unknown
  
Additional info:

Will attach /proc/cluster/dlm_debug and /proc/cluster/dlm_locks (Magma) info.
Comment 1 Lenny Maiorani 2006-10-27 15:50:49 EDT
Created attachment 139610
dlm debug and lock info from /proc
Comment 2 Lon Hohberger 2006-11-03 11:14:32 EST
Lenny, when it can't allocate memory, is it userspace? E.g., is there any
process obviously soaking up all the memory on the system?

Comment 3 Lenny Maiorani 2006-11-03 11:36:04 EST
Memory usage was normal. Not sure if it is the 'cat' complaining about memory or
the /proc fs.
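
One way to narrow this down, offered as a sketch and not as something taken from this report: read the file with raw open()/read() calls and look at which call fails and with which errno. If the call itself fails with ENOMEM, the error was returned by the kernel for that call and was not generated inside the reading program. The helper name read_proc_file and its return shape are illustrative assumptions.

```python
import errno
import os


def read_proc_file(path, chunk_size=4096):
    """Read `path` with raw syscalls and report which step failed.

    Returns (data, None) on success, or (None, (step, errno_name)) on
    failure, where step is "open" or "read". Hypothetical helper.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return None, ("open", errno.errorcode.get(exc.errno, str(exc.errno)))
    try:
        chunks = []
        while True:
            try:
                chunk = os.read(fd, chunk_size)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                return None, ("read", name)
            if not chunk:  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks), None
    finally:
        os.close(fd)


if __name__ == "__main__":
    data, failure = read_proc_file("/proc/cluster/dlm_locks")
    if failure is None:
        print(len(data), "bytes read")
    else:
        print("%s() failed with %s" % failure)
```

A result of ("read", "ENOMEM") would point at the /proc handler failing an allocation while producing the file contents; it would not by itself say why.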
Comment 4 Lon Hohberger 2006-11-03 11:41:56 EST
Can you get /proc/slabinfo from the nodes and, if possible, 'ps -auwwx'?
Comment 5 Lenny Maiorani 2006-11-03 11:51:19 EST
We do not have a way of reproducing this, but if it comes up again I will get
this info.
Comment 6 Lenny Maiorani 2006-11-27 13:31:56 EST
Lon, I am seeing this now on several clusters. All of them show complaints
from clvmd in /proc/cluster/dlm_debug.

I will attach some logs...
Comment 7 Lenny Maiorani 2006-11-27 13:32:28 EST
Created attachment 142198
/proc/cluster/dlm_debug
Comment 8 Lenny Maiorani 2006-11-27 13:32:59 EST
Created attachment 142199
/proc/meminfo
Comment 9 Lenny Maiorani 2006-11-27 13:33:31 EST
Created attachment 142200
ps -auwwx
Comment 10 Lenny Maiorani 2006-11-27 13:33:58 EST
Created attachment 142201
/proc/slabinfo
Comment 11 Lon Hohberger 2006-12-11 15:02:38 EST
Lenny, I am pretty sure this is a bug in rgmanager caused by the
clu_lock_verbose() function.

I'll have a build ready soon.  
Comment 12 Lon Hohberger 2006-12-11 15:08:17 EST
Since the clu_lock_verbose() function does nothing useful, I'm removing it from
RHCS4 (it's already been removed in RHCS5).
Comment 13 Lon Hohberger 2006-12-12 15:40:56 EST
Created attachment 143442
Fixes subtle dlm lock leak created by rgmanager
Comment 14 Lon Hohberger 2006-12-12 15:45:51 EST
Created attachment 143443
Source RPM with this patch + patch for 213312
Comment 16 Lon Hohberger 2006-12-13 13:21:00 EST
Fixes in CVS.
Comment 17 Lenny Maiorani 2007-01-03 12:27:02 EST
Ok, I am running with this now. Let me get some bake time on it before declaring
this the fix.
Comment 18 Lon Hohberger 2007-01-09 16:14:10 EST
Same fix(es), based on the 1.9.54 errata (exactly the same as .53, except it
includes an NFS fix)

http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.src.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.x86_64.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.i386.rpm

Comment 23 Lon Hohberger 2007-03-05 11:40:32 EST
*** Bug 230830 has been marked as a duplicate of this bug. ***
Comment 27 Lon Hohberger 2007-03-21 14:43:30 EDT
Alternatively, we will be calling it 'beta' pretty soon.
Comment 34 Katriel Traum 2007-04-17 02:07:59 EDT
Can you specify what counts as "bad" values or increments for dlm_lkb in /proc/slabinfo?
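
This report does not state a threshold for dlm_lkb, so the following is only a sketch of how the count could be watched over time, not an answer to what is "bad". It assumes the slabinfo 2.x column layout (cache name, then active_objs, then num_objs); the function names, the sampling interval, and the number of samples are illustrative assumptions.

```python
import time


def slab_counts(text, cache_name):
    """Return (active_objs, num_objs) for `cache_name`, or None if absent.

    Assumes slabinfo 2.x layout: name, active_objs, num_objs, ...
    """
    for line in text.splitlines():
        # Skip the version banner and the column-header comment line.
        if line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0] == cache_name:
            return int(fields[1]), int(fields[2])
    return None


def sample(cache_name="dlm_lkb", path="/proc/slabinfo",
           interval=60.0, samples=5):
    """Print the active-object count and its change between samples."""
    previous = None
    for i in range(samples):
        with open(path) as handle:
            counts = slab_counts(handle.read(), cache_name)
        if counts is None:
            print("cache %r not found in %s" % (cache_name, path))
            return
        active, total = counts
        delta = "n/a" if previous is None else "%+d" % (active - previous)
        print("active=%d total=%d delta=%s" % (active, total, delta))
        previous = active
        if i < samples - 1:
            time.sleep(interval)


if __name__ == "__main__":
    sample()
```

A count that keeps rising between samples while the cluster workload is unchanged would be consistent with the lock leak described for attachment 143442, but how much growth is abnormal is not given anywhere in this report.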
Comment 39 Red Hat Bugzilla 2007-05-10 17:19:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html
