Bug 200841 - rgmanager on cluster hung with "stuck with lock" errors produced for 2+ hours until a reboot
Status: CLOSED DUPLICATE of bug 208968
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Lon Hohberger
Cluster QE
Reported: 2006-07-31 17:27 EDT by Scott Cannata
Modified: 2009-04-16 16:20 EDT
Doc Type: Bug Fix
Last Closed: 2006-10-05 16:06:36 EDT

Attachments
messages from node2 (3.84 MB, application/octet-stream)
2006-07-31 17:27 EDT, Scott Cannata
messages file from node3 (3.18 MB, application/octet-stream)
2006-07-31 17:31 EDT, Scott Cannata
rgmanager we are using (194.73 KB, application/octet-stream)
2006-08-02 17:08 EDT, Lenny Maiorani
magma we are using (247.79 KB, application/octet-stream)
2006-08-02 17:09 EDT, Lenny Maiorani
magma plugins we are using (36.76 KB, application/octet-stream)
2006-08-02 17:10 EDT, Lenny Maiorani

Description Scott Cannata 2006-07-31 17:27:33 EDT
Description of problem:

Users are not able to run clustat or bring services up. For 2+ hours,
/var/log/messages repeatedly produced:

    Jul 30 16:07:45 flsrv02 clurgmgrd[18827]: <warning> NodeID:0000000000000003 stuck with lock usrm::vf
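
To see how widespread the warning is in a large messages file, the lines can be tallied per node and lock. The following is a minimal sketch, assuming the message always has the shape of the single line quoted above; the function name `count_stuck_locks` is hypothetical, and other rgmanager versions may format the message differently.

```python
import re
from collections import Counter

# Pattern derived only from the one log line quoted in this report.
STUCK_RE = re.compile(
    r"clurgmgrd\[\d+\]: <warning> "
    r"NodeID:(?P<node>[0-9A-Fa-f]+)\s+stuck with lock (?P<lock>\S+)"
)


def count_stuck_locks(lines):
    """Return a Counter keyed by (node_id, lock_name).

    `lines` is any iterable of log lines, for example an open file.
    Lines that do not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = STUCK_RE.search(line)
        if match:
            counts[(match.group("node"), match.group("lock"))] += 1
    return counts
```

Passing an open handle on /var/log/messages (or on one of the attached messages files) to `count_stuck_locks` and printing `most_common()` would show which node IDs and lock names dominate.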

Version-Release number of selected component (if applicable):

How reproducible:

Sometimes. We attempted to stop/restart the rgmanager apps on all nodes,
but the system got into the same state and one node was fenced. Then another
reboot of all nodes fixed it.

Steps to Reproduce:
Actual results:

rgmanager is in a bad state, producing the above messages over and over.

Expected results:

rgmanager should come up and not get into this state.

Additional info:
Comment 1 Scott Cannata 2006-07-31 17:27:34 EDT
Created attachment 133359
messages from node2
Comment 2 Scott Cannata 2006-07-31 17:31:12 EDT
Created attachment 133360
messages file from node3
Comment 3 Lon Hohberger 2006-08-01 10:15:33 EDT
What version of rgmanager?
Comment 4 Lenny Maiorani 2006-08-02 12:50:15 EDT
U4pre1 that you provided with the magma changes as well.
Comment 5 Lenny Maiorani 2006-08-02 17:08:25 EDT
Created attachment 133519
rgmanager we are using
Comment 6 Lenny Maiorani 2006-08-02 17:09:21 EDT
Created attachment 133520
magma we are using
Comment 7 Lenny Maiorani 2006-08-02 17:10:08 EDT
Created attachment 133521
magma plugins we are using
Comment 8 Lenny Maiorani 2006-08-02 17:13:11 EDT
These patches came from bz #193128
Comment 9 Scott Cannata 2006-08-02 18:41:10 EDT
The "stuck lock" message started *after* the rgmanagers were sent a -9 signal.

We noticed that the stop script uses SIGTERM so the daemon can exit gracefully
and clean up, and that the stop script also cleans up some lockfiles and
pidfiles in the filesystem.

Lon, could this ungraceful way of stopping rgmanager (and then restarting it)
cause the issue? My guess is yes, as it mimics a coredump/bug type scenario
where the app just abruptly exits with no cleanup.

If this is the case, then we induced it here and this is an error in the use model.
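
The difference between the two signals can be illustrated with a minimal sketch. It is not rgmanager's code: the `PidFile` class and both helper functions are hypothetical, and it only shows why a process stopped with SIGTERM can remove its pidfile while one stopped with -9 cannot. It says nothing about whether DLM locks are released.

```python
import os
import signal


class PidFile:
    """Hypothetical pidfile helper; not rgmanager's real implementation."""

    def __init__(self, path):
        self.path = path

    def create(self):
        with open(self.path, "w") as handle:
            handle.write(f"{os.getpid()}\n")

    def remove(self):
        try:
            os.unlink(self.path)
        except FileNotFoundError:
            pass  # already gone; nothing to clean up


def make_term_handler(pidfile):
    """Build a SIGTERM handler that removes the pidfile and then exits."""

    def on_term(signum, frame):
        pidfile.remove()
        raise SystemExit(0)

    return on_term


def install_term_handler(pidfile):
    # Must be called from the main thread. SIGKILL (-9) cannot be caught,
    # so no equivalent cleanup handler can be installed for it.
    signal.signal(signal.SIGTERM, make_term_handler(pidfile))
```

With the handler installed, `kill -TERM` lets the process remove its pidfile before exiting; `kill -9` ends it immediately and leaves the file behind for the next start to deal with.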
Comment 11 Lon Hohberger 2006-10-03 10:36:05 EDT
The DLM should free up the locks after you kill rgmanager with -9, I should
think... but I could be mistaken on that.
Comment 12 David Teigland 2006-10-03 12:54:00 EDT
All the locks should be freed if the program is killed.
A dlm lock dump might help to see if anything is left:

    echo "lockspace name" >> /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks > foo.txt
Comment 13 Lon Hohberger 2006-10-05 15:59:39 EDT
This could be related to #208968, actually
Comment 14 Lon Hohberger 2006-10-05 16:06:36 EDT

*** This bug has been marked as a duplicate of 208968 ***