Bug 212634

Summary: rgmanager times out when using clustat
Product: [Retired] Red Hat Cluster Suite
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Lenny Maiorani <lenny>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: aberoham, cluster-maint, jplans, pdemauro, rkenna, tjaszowski
Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Last Closed: 2007-05-10 21:19:27 UTC
Bug Blocks: 218112
Attachments:
Description                                        Flags
dlm debug and lock info from /proc                 none
/proc/cluster/dlm_debug                            none
/proc/meminfo                                      none
ps -auwwx                                          none
/proc/slabinfo                                     none
Fixes subtle dlm lock leak created by rgmanager    none
Source RPM with this patch + patch for 213312      none

Description Lenny Maiorani 2006-10-27 19:50:49 UTC
Description of problem:
rgmanager times out when attempting to get the service list via clustat. The
locks are also in an odd state.

Also, 'cat /proc/cluster/dlm_locks' reports "Cannot allocate memory", and on
node03 dlm_recvd is using about 50-95% of the CPU.

Version-Release number of selected component (if applicable):
RHEL4U4

How reproducible:
unknown

Steps to Reproduce:
1. unknown
  
Additional info:

Will attach /proc/cluster/dlm_debug and /proc/cluster/dlm_locks (Magma) info.

Comment 1 Lenny Maiorani 2006-10-27 19:50:49 UTC
Created attachment 139610 [details]
dlm debug and lock info from /proc

Comment 2 Lon Hohberger 2006-11-03 16:14:32 UTC
Lenny, when it can't allocate memory, is it userspace? E.g., is there any
process obviously soaking up all the memory on the system?



Comment 3 Lenny Maiorani 2006-11-03 16:36:04 UTC
Memory usage was normal. Not sure if it is the 'cat' complaining about memory or
the /proc fs.


Comment 4 Lon Hohberger 2006-11-03 16:41:56 UTC
Can you get /proc/slabinfo from the nodes and, if possible, 'ps -auwwx'?

Comment 5 Lenny Maiorani 2006-11-03 16:51:19 UTC
We do not have a way of reproducing this, but if it comes up again I will get
this info.

Comment 6 Lenny Maiorani 2006-11-27 18:31:56 UTC
Lon, I am seeing this now on several clusters. On all of them,
/proc/cluster/dlm_debug shows complaints from clvmd.

I will attach some logs...

Comment 7 Lenny Maiorani 2006-11-27 18:32:28 UTC
Created attachment 142198 [details]
/proc/cluster/dlm_debug

Comment 8 Lenny Maiorani 2006-11-27 18:32:59 UTC
Created attachment 142199 [details]
/proc/meminfo

Comment 9 Lenny Maiorani 2006-11-27 18:33:31 UTC
Created attachment 142200 [details]
ps -auwwx

Comment 10 Lenny Maiorani 2006-11-27 18:33:58 UTC
Created attachment 142201 [details]
/proc/slabinfo
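
The diagnostics attached in comments 7 through 10 could be gathered in one pass
with something like the sketch below. It is only an illustration, not a tool
from rgmanager or the cluster suite: the function name snapshot(), the output
file names, and the "dlm-diag-" directory prefix are all made up here. Read
errors are recorded rather than raised, since the report notes that reading
/proc/cluster/dlm_locks itself failed with "Cannot allocate memory".

```python
import os
import subprocess
import tempfile

# Paths named in this report; the /proc/cluster entries may not exist elsewhere.
PROC_FILES = [
    "/proc/cluster/dlm_debug",
    "/proc/cluster/dlm_locks",
    "/proc/meminfo",
    "/proc/slabinfo",
]


def snapshot(dest=None, sources=PROC_FILES, ps_cmd=("ps", "-auwwx")):
    """Copy each source file into dest; return (dest, per-source status)."""
    if dest is None:
        # Hypothetical default: a fresh temporary directory.
        dest = tempfile.mkdtemp(prefix="dlm-diag-")
    results = {}
    for path in sources:
        try:
            with open(path, "rb") as src:
                data = src.read()
        except OSError as exc:
            # Keep the error text: a failed read is itself diagnostic.
            results[path] = "error: %s" % exc
            continue
        out_name = path.strip("/").replace("/", "_")
        with open(os.path.join(dest, out_name), "wb") as out:
            out.write(data)
        results[path] = "ok"
    if ps_cmd:
        try:
            proc = subprocess.run(list(ps_cmd), stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT)
            with open(os.path.join(dest, "ps.txt"), "wb") as out:
                out.write(proc.stdout)
            results["ps"] = "ok"
        except OSError as exc:
            results["ps"] = "error: %s" % exc
    return dest, results
```

Calling snapshot() with no arguments on an affected node would write the copies
to a new temporary directory and return its path along with the status of each
source.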

Comment 11 Lon Hohberger 2006-12-11 20:02:38 UTC
Lenny, I am pretty sure this is a bug in rgmanager, caused by the
clu_lock_verbose() function.

I'll have a build ready soon.  

Comment 12 Lon Hohberger 2006-12-11 20:08:17 UTC
Since the clu_lock_verbose() function does nothing useful, I'm removing it from
RHCS4 (it's already been removed in RHCS5).

Comment 13 Lon Hohberger 2006-12-12 20:40:56 UTC
Created attachment 143442 [details]
Fixes subtle dlm lock leak created by rgmanager

Comment 14 Lon Hohberger 2006-12-12 20:45:51 UTC
Created attachment 143443 [details]
Source RPM with this patch + patch for 213312

Comment 16 Lon Hohberger 2006-12-13 18:21:00 UTC
Fixes in CVS.

Comment 17 Lenny Maiorani 2007-01-03 17:27:02 UTC
Ok, I am running with this now. Let me get some bake time on it before declaring
this the fix.

Comment 18 Lon Hohberger 2007-01-09 21:14:10 UTC
Same fix(es), based on the 1.9.54 errata (exactly the same as .53, except it
includes an NFS fix)

http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.src.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.x86_64.rpm
http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.i386.rpm



Comment 23 Lon Hohberger 2007-03-05 16:40:32 UTC
*** Bug 230830 has been marked as a duplicate of this bug. ***

Comment 27 Lon Hohberger 2007-03-21 18:43:30 UTC
Alternatively, we will be calling it 'beta' pretty soon.

Comment 34 Katriel Traum 2007-04-17 06:07:59 UTC
Can you specify what counts as "bad" values or increments for dlm_lkb in /proc/slabinfo?
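
The visible comments do not give a threshold, so the following is only a sketch
of how the dlm_lkb count could be tracked over time. It assumes the slabinfo 2.x
line layout (cache name, then active_objs, then num_objs), and it assumes that a
count which never drops and keeps rising is worth investigating; that heuristic
is not taken from this report. The helper names slab_counts(), is_growing(), and
watch() are hypothetical.

```python
import time


def slab_counts(text, cache="dlm_lkb"):
    """Return name/active_objs/num_objs for one cache in slabinfo text, or None."""
    for line in text.splitlines():
        fields = line.split()
        # Skip blank, header ("slabinfo - version ..."), and comment lines.
        if len(fields) < 3 or fields[0] != cache:
            continue
        try:
            return {"name": fields[0],
                    "active_objs": int(fields[1]),
                    "num_objs": int(fields[2])}
        except ValueError:
            return None
    return None


def is_growing(samples):
    """True if the counts never decrease and end higher than they started."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


def watch(samples=5, interval=60.0, path="/proc/slabinfo", cache="dlm_lkb"):
    """Sample the active object count several times; return (history, growing)."""
    history = []
    for i in range(samples):
        with open(path) as f:
            entry = slab_counts(f.read(), cache)
        # Assumption: a missing cache line is treated as zero objects.
        history.append(entry["active_objs"] if entry else 0)
        if i + 1 < samples:
            time.sleep(interval)
    return history, is_growing(history)
```

Running watch() on a node (reading /proc/slabinfo may require root) returns the
sampled counts, which can then be compared before and after applying the fix.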

Comment 39 Red Hat Bugzilla 2007-05-10 21:19:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0149.html