Bug 188023
Summary: active NFS POSIX locks prevent umount from occurring during failover
Product: [Retired] Red Hat Cluster Suite
Component: clumanager
Version: 3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jeff Layton <jlayton>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, steved, tao
Keywords: Reopened
Fixed In Version: RHBA-2006-0505
Doc Type: Bug Fix
Last Closed: 2006-08-10 14:13:47 UTC
Bug Depends On: 167636
Comment 1 Jeff Layton 2006-04-05 14:11:53 UTC
Since the kernel interface is officially NAK'ed, I suppose we can use the original patch I sent up to fix this as a starting point: https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124399

Some design questions:

1) Do we want to do what this patch does, and only send a SIGKILL to lockd as a last resort (only if all prior umount attempts fail), or do we want to have it send on the first attempt? I think the former is probably better, in that it won't be as likely to cause locking issues, but the latter might be better from a predictability standpoint (users can expect that all locks will get dropped whenever the service fails over).

2) Do we want this to be a global, per-service, or per-mount option? Per-service is difficult to document in a user-friendly way.

I would only drop locks when *absolutely* necessary. Note that this would mean that:

* it's a global option whether to try killing lockd on unmount, and
* per-device "force unmount" would need to be enabled in order for lockd to be killed.

So, it's a coarse "load the bazooka" at the global level, and "fire the bazooka if this device doesn't unmount" at the device level.

Created attachment 127796 [details]
new patch that checks cludb setting
New patch based on Lon's last comments. This one checks for a global cludb
setting (clusvcmgrd%nlm_drop_locks) if $force_umount is set. Then, on the last
pass at unmounting the filesystem, we send the SIGKILL to lockd.
I've not tested this patch, but I think it will work, though you may want to
add some more indirection and such.
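The last-pass behavior described here can be sketched in shell. Everything below except the cludb key clusvcmgrd%nlm_drop_locks (which the comment names) is illustrative: the cludb -g invocation, the retry count, and the helper names are assumptions, not the attached patch.

```shell
#!/bin/sh
# Sketch of a force-unmount with a last-resort lockd kill.
# Assumptions: "cludb -g <key>" prints the value of a cluster
# database key; MAX_TRIES and drop_nlm_locks are invented here.

MAX_TRIES=3

drop_nlm_locks() {
    # Last resort: kill lockd so it releases the NFS POSIX locks
    # pinning the filesystem. Clients lose those locks.
    kill -KILL $(pidof lockd) 2>/dev/null
}

force_umount() {
    mnt="$1"
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        # On the final pass only, and only if the global switch is
        # set, shoot lockd before the last unmount attempt.
        if [ "$try" -eq "$MAX_TRIES" ] && \
           [ "$(cludb -g clusvcmgrd%nlm_drop_locks 2>/dev/null)" = "1" ]; then
            drop_nlm_locks
        fi
        umount "$mnt" && return 0
        try=$((try + 1))
        sleep 1
    done
    return 1
}
```

The point of gating the kill on both the global key and the final pass is the "bazooka" split discussed above: the global option loads it, and only a device that refuses to unmount fires it.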
hah! That looks good.

FYI, as it turns out, more than this is going into U8 at the last minute. We should have lock reclaims on relocation, but it will not work on failover because there's no HA-callout in RHEL3 nfs-utils.

I also hit a point where a lock being held somehow seemed to prevent the device from being unmounted, even if I stopped NFS (including lockd) entirely. I don't understand this one, but it's beyond the scope of this bugzilla.

Created attachment 128570 [details]
NFS drop / reclaim / etc. patch
Big patch which issues reclaims after killing lockd.
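One rough way a takeover node can trigger client reclaims after lockd has been killed is to restart statd under the service's floating hostname so clients receive SM_NOTIFY and resend their locks. This is only an illustration of the idea, not the attached patch: the "nfslock" service name, the hostname argument, and the helper name are all assumptions.

```shell
#!/bin/sh
# Illustrative only: prompting NFS clients to reclaim locks after
# failover. Assumes sysvinit-style "service nfslock" and rpc.statd's
# -n/--name option for picking the notify hostname.

reclaim_locks() {
    float_host="$1"   # hostname bound to the service's floating IP
    # Stop statd/lockd; lockd comes back on the next lock request,
    # opening its grace period. Start statd as the floating hostname
    # so its SM_NOTIFY tells clients to reclaim their locks here.
    service nfslock stop
    rpc.statd -n "$float_host"
}
```

As the comment notes, this works on relocation but not on failover in RHEL3, because without an HA-callout in nfs-utils the takeover node has no record of which clients to notify.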
Jeff, would you prefer putting both in, or just the big one?

I didn't go over the whole thing, but your patch looks like a superset of mine, so I think yours would be sufficient here. Unless I'm missing something?

From the RHCS perspective, this is done. However, there's nothing I can do about 167636 from userspace; it seems that if you kill lockd after taking a lock from an NFS client, there's a good chance that the lock will not correctly get dropped, and calling umount will return EBUSY. See bugzilla 167636 for more details. So, until the kernel side is fixed, this bug will remain open.

167636 is closed -> WONTFIX.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0505.html