Bug 200841

Summary:

rgmanager on cluster hung with "stuck with lock" errors produced for 2+ hours until a reboot

Product:

[Retired] Red Hat Cluster Suite

Reporter:

Scott Cannata <scott.cannata>

Component:

rgmanager

Assignee:

Lon Hohberger <lhh>

Status:

CLOSED DUPLICATE

QA Contact:

Cluster QE <mspqa-list>

Severity:

medium

Priority:

medium

CC:

cluster-maint, lenny, teigland

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2006-10-05 20:06:36 UTC

Attachments:

Description	Flags
messages from node2	none
messages file from node3	none
rgmanager we are using	none
magma we are using	none
magma plugins we are using	none

Description Scott Cannata 2006-07-31 21:27:33 UTC

Description of problem:

Users were not able to run clustat or bring services up. For 2+ hours,
/var/log/messages showed:

    Jul 30 16:07:45 flsrv02 clurgmgrd[18827]: <warning> NodeID:0000000000000003 
    stuck with lock usrm::vf
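To check how long a warning like this persisted and which lock it names, something like the following could be run against each node's log. This is a sketch, not part of rgmanager: the function names are hypothetical, and it assumes the classic syslog line format (first 15 characters are the timestamp) and that each warning occupies a single line in the log file.

```shell
#!/bin/sh
# Hypothetical helpers for summarizing rgmanager "stuck with lock" warnings.
# Assumes classic syslog lines ("Mon DD HH:MM:SS host ...") and one warning
# per line in the file.

stuck_lock_summary() {
    # $1: log file. Prints "<count> <lock name>", most frequent lock first.
    grep -o 'stuck with lock [^ ]*' "$1" | awk '{ print $4 }' | sort | uniq -c | sort -rn
}

stuck_lock_span() {
    # $1: log file. Prints the timestamps of the first and last warning,
    # giving a rough idea of how long the condition lasted.
    grep 'stuck with lock' "$1" | sed -n '1p;$p' | cut -c1-15
}
```

Usage on a node: `stuck_lock_summary /var/log/messages` and `stuck_lock_span /var/log/messages`.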

Version-Release number of selected component (if applicable):


How reproducible:

Sometimes. We attempted to stop/restart the rgmanager apps on all nodes; the
system got into the same state and one node was fenced. Another
reboot of all nodes then fixed it.


Steps to Reproduce:
1.
2.
3.
  
Actual results:

rgmanager is in a bad state, producing the above message over and over.

Expected results:

rgmanager should come up and not get into this state.


Additional info:

Comment 1 Scott Cannata 2006-07-31 21:27:34 UTC

Created attachment 133359
messages from node2

Comment 2 Scott Cannata 2006-07-31 21:31:12 UTC

Created attachment 133360
messages file from node3

Comment 3 Lon Hohberger 2006-08-01 14:15:33 UTC

What version of rgmanager?

Comment 4 Lenny Maiorani 2006-08-02 16:50:15 UTC

The U4pre1 build that you provided, along with the magma changes.

Comment 5 Lenny Maiorani 2006-08-02 21:08:25 UTC

Created attachment 133519
rgmanager we are using

Comment 6 Lenny Maiorani 2006-08-02 21:09:21 UTC

Created attachment 133520
magma we are using

Comment 7 Lenny Maiorani 2006-08-02 21:10:08 UTC

Created attachment 133521
magma plugins we are using

Comment 8 Lenny Maiorani 2006-08-02 21:13:11 UTC

These patches came from bz #193128

Comment 9 Scott Cannata 2006-08-02 22:41:10 UTC

The "stuck lock" message started *after* the rgmanager processes were sent
signal 9 (SIGKILL).

We noticed that the stop script uses SIGTERM so the daemon can exit gracefully
and clean up, and that the stop script also removes some lock files and pid
files from the filesystem.

Lon, could this ungraceful way of stopping rgmanager (and then restarting it)
cause the issue? My guess is yes, since it mimics a core dump/bug-type scenario
where the application exits abruptly with no cleanup.

If this is the case, then we induced it here and this is an error in the use model.
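The difference between the two ways of stopping can be illustrated with a small sketch. It uses a hypothetical stand-in daemon (fake_daemon, not rgmanager) that installs a SIGTERM handler which deletes its pidfile, roughly mimicking the cleanup a graceful stop performs. SIGKILL cannot be caught, so after kill -9 the handler never runs and the file is left behind. This only shows stale-file behaviour; it says nothing about DLM lock release, which per the discussion below the kernel is expected to handle when a process dies.

```shell
#!/bin/sh
# Illustration only: fake_daemon is a hypothetical stand-in, not rgmanager.
# Its TERM handler removes the pidfile; KILL cannot be trapped, so no cleanup.

start_fake_daemon() {
    # $1: pidfile path. A separate sh is used so $$ inside it is its own PID.
    sh -c '
        pidfile=$1
        trap "rm -f \"$pidfile\"; exit 0" TERM
        echo $$ > "$pidfile"           # written only after the trap is installed
        while :; do sleep 1; done      # idle service loop
    ' fake_daemon "$1" >/dev/null 2>&1 &
}

demo_stop() {
    # $1: signal name (TERM or KILL); $2: pidfile path.
    rm -f "$2"
    start_fake_daemon "$2"
    pid=$!
    while [ ! -s "$2" ]; do sleep 1; done   # wait until the handler is in place
    kill -s "$1" "$pid"
    wait "$pid" 2>/dev/null || true          # reap it; KILL gives a nonzero status
    if [ -e "$2" ]; then
        echo "stale pidfile left behind"
    else
        echo "cleaned up"
    fi
}
```

Running `demo_stop TERM /tmp/fake.pid` should report "cleaned up", while `demo_stop KILL /tmp/fake.pid` should report "stale pidfile left behind". This is the kind of leftover state the stop script would normally remove.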

Comment 11 Lon Hohberger 2006-10-03 14:36:05 UTC

The DLM should free up the locks after you kill rgmanager with -9, I should
think... but I could be mistaken on that.

Comment 12 David Teigland 2006-10-03 16:54:00 UTC

All the locks should be freed if the program is killed.
A DLM lock dump might help to see if anything is left:

    echo "lockspace name" >> /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks > foo.txt
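The two dump steps can be wrapped so the result is searched for the resource named in the log warning (usrm::vf). This is a sketch under assumptions: the function names are hypothetical; it assumes /proc/cluster/dlm_locks behaves as described in the comment (write a lockspace name, then read that lockspace's locks back); the caller supplies the lockspace name; and the resource name is assumed to appear verbatim somewhere in the dump text, since the exact dump format is not shown here.

```shell
#!/bin/sh
# Hypothetical wrapper around the lock-dump steps above.
# Assumes the write-name-then-read interface of /proc/cluster/dlm_locks and
# that resource names appear verbatim in the dump output.

dump_dlm_locks() {
    # $1: lockspace name; $2: output file;
    # $3: optional path standing in for the proc file (useful for testing).
    procfile=${3:-/proc/cluster/dlm_locks}
    echo "$1" >> "$procfile" || return 1
    cat "$procfile" > "$2"
}

find_resource_in_dump() {
    # $1: dump file; $2: resource name. Prints matching lines, if any;
    # exit status is 0 only when the name was found.
    grep -F -- "$2" "$1"
}
```

Usage, with the lockspace name still to be filled in:

    dump_dlm_locks "lockspace name" foo.txt
    find_resource_in_dump foo.txt usrm::vf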

Comment 13 Lon Hohberger 2006-10-05 19:59:39 UTC

This could be related to #208968, actually

Comment 14 Lon Hohberger 2006-10-05 20:06:36 UTC


*** This bug has been marked as a duplicate of 208968 ***