188023 – active NFS POSIX locks prevent umount from occurring during failover

Bug 188023 - active NFS POSIX locks prevent umount from occurring during failover

Summary: active NFS POSIX locks prevent umount from occurring during failover

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	clumanager
Sub Component:
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:	167636
Blocks:

Reported:	2006-04-05 13:32 UTC by Jeff Layton
Modified:	2014-06-18 07:35 UTC
CC List:	3 users
Fixed In Version:	RHBA-2006-0505
Clone Of:
Environment:
Last Closed:	2006-08-10 14:13:47 UTC
Embargoed:

Attachments
proposed patch for svclib_filesystem (847 bytes, patch) 2006-04-05 14:11 UTC, Jeff Layton
new patch that checks cludb setting (1.46 KB, patch) 2006-04-16 12:27 UTC, Jeff Layton
NFS drop / reclaim / etc. patch (19.49 KB, patch) 2006-05-03 22:45 UTC, Lon Hohberger

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0505	0	normal	SHIPPED_LIVE	clumanager bug fix update	2006-08-10 04:00:00 UTC

Comment 1 Jeff Layton 2006-04-05 14:11:53 UTC

Created attachment 127353
proposed patch for svclib_filesystem

This patch is contingent on the kernel interface for it (BZ 180524) going in
(though using this script on a kernel without it shouldn't hurt, it just won't
do anything).

This adds a new function to the script to echo the device into the /proc file,
and calls it if $force_umount is set.

This is not tested as of yet, and will need to be before we can hand off to the
customer.
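
As a rough illustration of the approach described in this comment, a minimal sketch might look like the following. It is not the attached patch. The function names and the PROC_FILE argument are hypothetical: the comment does not name the actual /proc entry, so the path is left as a parameter, and treating "yes" as the enabled value of the force-unmount flag is also an assumption.

```shell
# drop_locks_via_proc DEVICE PROC_FILE
# PROC_FILE is a placeholder; the real /proc entry is not named in the comment.
drop_locks_via_proc() {
    local dev=$1 proc_file=$2
    # On a kernel without the interface the file does not exist,
    # so the call is a harmless no-op.
    [ -w "$proc_file" ] || return 0
    echo "$dev" > "$proc_file"
}

# maybe_drop_locks DEVICE PROC_FILE FORCE_UMOUNT
# Only ask the kernel to drop locks when force unmount is requested
# (assumed here to be signalled by the value "yes").
maybe_drop_locks() {
    local dev=$1 proc_file=$2 force_umount=$3
    [ "$force_umount" = "yes" ] || return 0
    drop_locks_via_proc "$dev" "$proc_file"
}
```

The existence check is what makes the script safe on kernels that lack the interface: nothing is written unless the file is present and writable.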

Comment 2 Jeff Layton 2006-04-11 22:56:32 UTC

Since the kernel interface is officially NAK'ed, I suppose we can use the
original patch I sent up to fix this as a starting point:

https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=124399

Some design questions:

1) Do we want to do as this patch does, and only send a SIGKILL to lockd as a
last resort (only if all prior umount attempts fail), or do we want to have it
sent on the first attempt?

I think the former is probably better, in that it won't be as likely to cause locking
issues, but the latter might be better from a predictability standpoint (users
can expect that all locks will get dropped whenever the service fails over).

2) Do we want this to be a global, per-service, or per-mount option?

Comment 3 Lon Hohberger 2006-04-12 17:33:30 UTC

Per-service is difficult to document in a user-friendly way.

I would only drop locks when *absolutely* necessary.

Comment 4 Lon Hohberger 2006-04-12 17:38:04 UTC

Note that this would mean that -

* It's a global option of whether to try killing lockd on unmount, and that
* per-device "force unmount" would need to be enabled in order for lockd to be
killed

So, it's coarse: "load the bazooka" at the global level, and "fire the bazooka if
this device doesn't unmount" at the device level.

Comment 5 Jeff Layton 2006-04-16 12:27:53 UTC

Created attachment 127796
new patch that checks cludb setting

New patch based on Lon's last comments. This one checks for a global cludb
setting (clusvcmgrd%nlm_drop_locks) if $force_umount is set. Then on the last
pass on attempting to unmount the filesystem, we'll send the SIGKILL to lockd.

I've not tested this patch, but I think it will work, though you may want to
add some more indirection and such.
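
The decision logic described here can be sketched roughly as below. This is an illustration under stated assumptions, not the attached patch: the function names, the "yes" values, the default retry count of 3, and the UMOUNT_CMD / KILL_LOCKD_CMD override variables are all hypothetical, and the global flag is passed in as an argument as a stand-in for reading clusvcmgrd%nlm_drop_locks from cludb. The default `killall -KILL lockd` is likewise an assumption about how the signal would be delivered.

```shell
# should_kill_lockd NLM_DROP_LOCKS FORCE_UMOUNT TRY MAX_TRIES
# Succeeds only when the global option is on, the device has force
# unmount enabled, and this is the last unmount pass.
should_kill_lockd() {
    local nlm_drop_locks=$1 force_umount=$2 try=$3 max_tries=$4
    [ "$nlm_drop_locks" = "yes" ] || return 1   # global: bazooka not loaded
    [ "$force_umount" = "yes" ] || return 1     # device: not allowed to fire
    [ "$try" -eq "$max_tries" ]                 # last pass only
}

# umount_with_lock_drop MOUNTPOINT NLM_DROP_LOCKS FORCE_UMOUNT [MAX_TRIES]
# Retries the unmount; on the final pass, kills lockd first if permitted.
# UMOUNT_CMD and KILL_LOCKD_CMD may be overridden for dry runs.
umount_with_lock_drop() {
    local mountpoint=$1 nlm_drop_locks=$2 force_umount=$3 max_tries=${4:-3}
    local try=1
    while [ "$try" -le "$max_tries" ]; do
        if should_kill_lockd "$nlm_drop_locks" "$force_umount" "$try" "$max_tries"; then
            ${KILL_LOCKD_CMD:-killall -KILL lockd}
        fi
        if ${UMOUNT_CMD:-umount} "$mountpoint"; then
            return 0
        fi
        try=$((try + 1))
    done
    return 1
}
```

Keeping the check in its own function makes the two-level gating easy to see: both the global setting and the per-device flag must be enabled before lockd is touched, and even then only after the earlier attempts have failed.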

Comment 6 Lon Hohberger 2006-04-18 14:49:32 UTC

hah! That looks good

Comment 7 Lon Hohberger 2006-05-03 17:00:26 UTC

FYI, as it turns out, more than this is going into U8, at the last minute.  We
should have lock reclaims on relocation, but it will not work on failover
because there's no HA-callout in RHEL3 nfs-utils.

I also hit a point where a lock being held somehow seemed to prevent the device
from being unmounted - even if I stopped NFS (including lockd) entirely.  I
don't understand this one, but it's beyond the scope of this bugzilla.

Comment 10 Lon Hohberger 2006-05-03 22:45:29 UTC

Created attachment 128570
NFS drop / reclaim / etc. patch

Big patch which issues reclaims after killing lockd.

Comment 11 Lon Hohberger 2006-05-04 14:42:52 UTC

Jeff, would you prefer putting both in or just the big one?

Comment 12 Jeff Layton 2006-05-04 15:27:31 UTC

I didn't go over the whole thing, but your patch looks like a superset of mine,
so I think yours would be sufficient here. Unless I'm missing something here?

Comment 14 Lon Hohberger 2006-05-16 16:17:26 UTC

From RHCS perspective, this is done.

However, there's nothing I can do about 167636 from userspace; it seems that if
you kill lockd after taking a lock from an NFS client, there's a good chance
that the lock will not correctly get dropped, and calling umount will return EBUSY.

See bugzilla 167636 for more details.

So, until the kernel side is fixed, this bug will remain open.

Comment 15 Lon Hohberger 2006-05-25 13:53:13 UTC

167636 is closed -> wontfix

Comment 22 Red Hat Bugzilla 2006-08-10 14:13:47 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0505.html
