Bug 200841
Summary: | rgmanager on cluster hung with "stuck with lock errors produced for 2+ until a reboot | | |
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Scott Cannata <scott.cannata> |
Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 4 | CC: | cluster-maint, lenny, teigland |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-10-05 20:06:36 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Scott Cannata
2006-07-31 21:27:33 UTC
Created attachment 133359
messages from node2

Created attachment 133360
messages file from node3

What version of rgmanager?

U4pre1 that you provided with the magma changes as well.

Created attachment 133519
rgmanager we are using

Created attachment 133520
magma we are using

Created attachment 133521
magma plugins we are using
These patches came from bz #193128.

The "stuck lock" message started *after* the rgmanagers were sent a -9 signal. We noticed that the stop script uses SIGTERM so that the daemon can exit gracefully and clean up, and that the stop script also cleans up some lockfiles and pidfiles in the filesystem.

Lon, could this ungraceful way of stopping rgmanager (and then restarting it) cause the issue? My guess is yes, as it mimics a coredump/bug type of scenario where the app just abruptly exits with no cleanup. If this is the case, then we induced it here and this is an error in the use model.

The DLM should free up the locks after you kill rgmanager with -9, I should think... but I could be mistaken on that.

All the locks should be freed if the program is killed. A dlm lock dump might help to see if anything is left:

    echo "lockspace name" >> /proc/cluster/dlm_locks
    cat /proc/cluster/dlm_locks > foo.txt

This could be related to #208968, actually.

*** This bug has been marked as a duplicate of 208968 ***
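
For anyone repeating the DLM lock dump suggested in this thread, the two commands could be wrapped in a small helper along these lines. This is only a sketch: the function name `dump_dlm_locks` and the `DLM_LOCKS_FILE` override are hypothetical and exist purely for illustration; only the `/proc/cluster/dlm_locks` path and the echo-then-cat sequence come from the comments above. The lockspace name must still be supplied by the caller, since the thread does not say which lockspace rgmanager uses.

```shell
#!/bin/sh
# Sketch of the lock dump from this thread, wrapped in a function.
# dump_dlm_locks and DLM_LOCKS_FILE are hypothetical names, not part of
# rgmanager, magma, or the DLM tooling.

dump_dlm_locks() {
    lockspace="$1"   # lockspace to select (caller must supply it)
    outfile="$2"     # where to save the dump, e.g. foo.txt
    # Default to the path quoted in the thread; overridable for dry runs.
    procfile="${DLM_LOCKS_FILE:-/proc/cluster/dlm_locks}"

    if [ -z "$lockspace" ] || [ -z "$outfile" ]; then
        echo "usage: dump_dlm_locks <lockspace> <outfile>" >&2
        return 2
    fi
    if [ ! -e "$procfile" ]; then
        echo "dlm lock file not found: $procfile" >&2
        return 1
    fi

    # Select the lockspace, then read its locks back out.
    echo "$lockspace" >> "$procfile" || return 1
    cat "$procfile" > "$outfile"
}
```

Running `dump_dlm_locks "lockspace name" foo.txt` on each node after the `kill -9` and comparing the outputs would show whether any locks were left behind.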