Bug 200841

Summary: rgmanager on cluster hung with "stuck with lock" errors produced for 2+ hours until a reboot
Product: [Retired] Red Hat Cluster Suite Reporter: Scott Cannata <scott.cannata>
Component: rgmanager Assignee: Lon Hohberger <lhh>
Status: CLOSED DUPLICATE QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 4 CC: cluster-maint, lenny, teigland
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-10-05 20:06:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Attachments:
Description                   Flags
messages from node2           none
messages file from node3      none
rgmanager we are using        none
magma we are using            none
magma plugins we are using    none

Description Scott Cannata 2006-07-31 21:27:33 UTC
Description of problem:

Users were not able to run clustat or bring services up. For 2+ hours,
/var/log/messages repeatedly logged:

    Jul 30 16:07:45 flsrv02 clurgmgrd[18827]: <warning> NodeID:0000000000000003 stuck with lock usrm::vf

Version-Release number of selected component (if applicable):


How reproducible:

Sometimes. We attempted to stop/restart the rgmanager apps on all nodes;
the system got into the same state and one node was fenced. Then another
reboot of all nodes fixed it.


Steps to Reproduce:
1.
2.
3.
  
Actual results:

rgmanager is in a bad state, producing the above messages over and over.

Expected results:

rgmanager should come up and not get into this state.


Additional info:

Comment 1 Scott Cannata 2006-07-31 21:27:34 UTC
Created attachment 133359 [details]
messages from node2

Comment 2 Scott Cannata 2006-07-31 21:31:12 UTC
Created attachment 133360 [details]
messages file from node3

Comment 3 Lon Hohberger 2006-08-01 14:15:33 UTC
What version of rgmanager?

Comment 4 Lenny Maiorani 2006-08-02 16:50:15 UTC
U4pre1 that you provided with the magma changes as well.

Comment 5 Lenny Maiorani 2006-08-02 21:08:25 UTC
Created attachment 133519 [details]
rgmanager we are using

Comment 6 Lenny Maiorani 2006-08-02 21:09:21 UTC
Created attachment 133520 [details]
magma we are using

Comment 7 Lenny Maiorani 2006-08-02 21:10:08 UTC
Created attachment 133521 [details]
magma plugins we are using

Comment 8 Lenny Maiorani 2006-08-02 21:13:11 UTC
These patches came from bz #193128

Comment 9 Scott Cannata 2006-08-02 22:41:10 UTC
The "stuck lock" messages started *after* the rgmanager daemons were sent
signal 9 (kill -9).

We noticed that the stop script uses SIGTERM so the daemon can exit gracefully
and clean up, and that the stop script also cleans up some lockfiles and
pidfiles in the filesystem.

Lon, could this ungraceful way of stopping rgmanager (and then restarting it)
cause the issue? My guess is yes, since it mimics a coredump/bug type scenario
where the app just abruptly exits with no cleanup.

If this is the case, then we induced it here and this is an error in the usage
model.
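
The difference between SIGTERM and -9 described above can be sketched with a
stand-in process. This is not rgmanager code: the worker, the pidfile paths and
the run_case helper are all hypothetical, and the sketch only covers files left
in the filesystem. It says nothing about what happens to DLM locks.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon that removes its pidfile on SIGTERM.
# Names and paths are invented for illustration only.
workdir=$(mktemp -d)

run_case() {
    # $1 = signal name to send, $2 = pidfile path
    sig=$1
    pidfile=$2
    (
        # Install the cleanup handler before creating the pidfile.
        trap 'rm -f "$pidfile"; exit 0' TERM
        echo started > "$pidfile"
        while :; do sleep 1; done
    ) &
    worker=$!
    # Wait until the worker is up (pidfile present, handler installed).
    while [ ! -f "$pidfile" ]; do sleep 1; done
    kill -s "$sig" "$worker"
    wait "$worker" 2>/dev/null
    if [ -f "$pidfile" ]; then echo "left behind"; else echo "cleaned up"; fi
}

term_result=$(run_case TERM "$workdir/term.pid")
kill_result=$(run_case KILL "$workdir/kill.pid")
echo "SIGTERM: pidfile $term_result"
echo "SIGKILL: pidfile $kill_result"
rm -rf "$workdir"
```

SIGKILL cannot be caught, so the handler never runs and the file stays until
something else (a stop script, or a reboot) removes it.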

Comment 11 Lon Hohberger 2006-10-03 14:36:05 UTC
The DLM should free up the locks after you kill rgmanager with -9, I should
think... but I could be mistaken on that.

Comment 12 David Teigland 2006-10-03 16:54:00 UTC
All the locks should be freed if the program is killed.
A dlm lock dump might help to see if anything is left:
echo "lockspace name" >> /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > foo.txt
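
For collecting the dump on several nodes, the two commands above could be
wrapped roughly as follows. This is only a sketch: the dump_dlm_locks function
and its optional third argument (an alternate path, handy for trying it out
away from a cluster node) are made up here, and the lockspace name still has to
be supplied by whoever runs it.

```shell
# dump_dlm_locks LOCKSPACE OUTFILE [PROCFILE]
# Hypothetical helper: selects a lockspace, then saves the lock dump.
dump_dlm_locks() {
    lockspace=$1
    outfile=$2
    # Default path is the one given in the comment above.
    procfile=${3:-/proc/cluster/dlm_locks}
    if [ ! -w "$procfile" ]; then
        echo "cannot write to $procfile" >&2
        return 1
    fi
    # Same two steps as the manual commands: name the lockspace, then read.
    echo "$lockspace" >> "$procfile" || return 1
    cat "$procfile" > "$outfile"
}
```

Usage would look like: dump_dlm_locks "lockspace name" foo.txt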


Comment 13 Lon Hohberger 2006-10-05 19:59:39 UTC
This could be related to #208968, actually

Comment 14 Lon Hohberger 2006-10-05 20:06:36 UTC

*** This bug has been marked as a duplicate of 208968 ***