Bug 601703
| Summary: | User process quitting unexpectedly can leave locks hanging around | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Christine Caulfield <ccaulfie> |
| Component: | dlm-kernel | Assignee: | David Teigland <teigland> |
| Status: | CLOSED DUPLICATE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 4 | CC: | cfeist, cluster-maint, edamato, raud |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-14 16:10:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Christine Caulfield
2010-06-08 13:45:33 UTC
There are really two, possibly three, problems here.

The first is the realisation that unlocks can return EINVAL if the lock is in the wrong state for unlocking. The device unlock code doesn't handle that: it assumes that EINVAL means the caller got something wrong, and so it cancels the attempt to clear the lock. This can leave locks lying around when the process exits.

The second problem is that the code that handles the returned status from unlocks on a remote node also assumes that unlocks cannot fail. There is even a comment to this effect in the code. So if EINVAL is received from the remote node it gets ignored, and the local copy of the lock is removed from its queue when it shouldn't be.

Thirdly, and this is only arguably a problem: how does a lock get into a state where an unlock can cause EINVAL in the first place? This is basically a race where a cancel request and a grant cross on the network, so that the master node thinks the lock is granted and the local node doesn't. It's actually even more complicated than that but, as a description, it'll do.

Created attachment 422240 [details]
Patch to fix
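
For readers without access to the attachment, the first problem in the description can be illustrated with a small user-space toy model. This is only a sketch and not the attached patch: the `toy_*` names, the lock states, and the retry-after-settling strategy in `cleanup_retry` are all assumptions made for illustration, and none of them are taken from the dlm-kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lock states for this toy model; not the real dlm-kernel types. */
enum toy_state { TOY_GRANTED, TOY_CANCEL_IN_FLIGHT, TOY_FREED };

struct toy_lock {
    enum toy_state state;
};

/* Toy unlock: returns -EINVAL when the lock is in the wrong state for
 * unlocking (here: anything other than granted). */
static int toy_unlock(struct toy_lock *lk)
{
    assert(lk != NULL);
    if (lk->state != TOY_GRANTED)
        return -EINVAL;
    lk->state = TOY_FREED;
    return 0;
}

/* Toy stand-in for the crossed cancel/grant resolving itself.
 * Assumption: the grant wins and the lock ends up granted. */
static void toy_settle(struct toy_lock *lk)
{
    assert(lk != NULL);
    if (lk->state == TOY_CANCEL_IN_FLIGHT)
        lk->state = TOY_GRANTED;
}

/* Behaviour described in the report: any failure from unlock is treated
 * as a caller error and the lock is abandoned.
 * Returns the number of locks left behind. */
static int cleanup_naive(struct toy_lock *locks, size_t n)
{
    int leaked = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_unlock(&locks[i]) != 0)
            leaked++;
    }
    return leaked;
}

/* One possible alternative (an assumption, not the actual fix): treat
 * -EINVAL as "wrong state right now", let the state settle, retry once. */
static int cleanup_retry(struct toy_lock *locks, size_t n)
{
    int leaked = 0;
    for (size_t i = 0; i < n; i++) {
        int rv = toy_unlock(&locks[i]);
        if (rv == -EINVAL) {
            toy_settle(&locks[i]);
            rv = toy_unlock(&locks[i]);
        }
        if (rv != 0)
            leaked++;
    }
    return leaked;
}
```

In this model, calling `cleanup_naive` on an array holding one granted lock and one lock with a cancel in flight leaves the second lock behind, while `cleanup_retry` on the same input frees both. The real race is, as noted in the description, more complicated than this.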
The RHEL4.5 equivalent of this patch works for me but I have not yet heard back from the customer.
This patch is for the RHEL4 branch of git. I know it compiles, but I haven't tested it on RHEL4.8+ yet.
Recent related work has been happening on bug 645531; this one should probably be closed as a dup of that.
If someone has a problem with this, they should be able to work around it using the option from bug 645531.
*** This bug has been marked as a duplicate of bug 645531 ***