Bug 595547 - [RFE] Support for NFSv4 missing.
Summary: [RFE] Support for NFSv4 missing.
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: resource-agents
Version: 6.0
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 692221
Depends On: 586461 633540
Blocks: 604740
Reported: 2010-05-24 22:41 UTC by Perry Myers
Modified: 2018-11-26 17:42 UTC

Fixed In Version: resource-agents-3.0.12-9.el6
Doc Type: Enhancement
Doc Text:
Clone Of: 586461
: 604740 (view as bug list)
Last Closed: 2011-05-19 14:20:39 UTC
Target Upstream Version:

Attachments
Currently not-shipped nfs server agent, updated to include the nfsv4 recovery directory (7.83 KB, text/plain)
2010-06-18 21:43 UTC, Lon Hohberger
Updated (8.54 KB, text/plain)
2010-06-24 15:35 UTC, Lon Hohberger
nfsserver, pass 3 (9.62 KB, text/plain)
2010-06-24 21:08 UTC, Lon Hohberger

System: Red Hat Product Errata
ID: RHBA-2011:0744
Priority: normal
Status: SHIPPED_LIVE
Summary: resource-agents bug fix and enhancement update
Last Updated: 2011-05-18 18:09:07 UTC

Comment 1 RHEL Program Management 2010-05-24 22:50:45 UTC
This feature request did not get resolved in time for Feature Freeze
for the current Red Hat Enterprise Linux release and has now been
denied. You may re-open your request by requesting your support
representative to propose it for the next release.

Comment 25 Lon Hohberger 2010-06-16 15:31:28 UTC
Current option is to extend the nfsserver resource agent (currently -not- part of RHEL6 packaging) to support nfsv4.

NFSv4 will be per-cluster.

We need to, in addition, block use of nfsv4 with the nfsclient resource agent if possible.

Comment 26 Jeff Layton 2010-06-16 16:04:37 UTC
I've played around a little today with the v4recovery dir. When a lock is taken, the server drops a directory into that dir. If you shut down nfsd, the contents of that directory persist (as expected). When it starts up, it scrapes the contents of that dir for info about clientid's at the time of the shutdown. If the "new" server doesn't have access to those contents, then the server fails the reclaim:

76	293.110950	NFS	V4 COMP Reply (Call In 75) <EMPTY> PUTFH;OPEN OPEN(10034)

....10034 == NFS4ERR_RECLAIM_BAD

99	305.812153	NFS	V4 COMP Reply (Call In 98) <EMPTY> PUTFH;SAVEFH SAVEFH;OPEN OPEN(10013)

...10013 == NFS4ERR_GRACE. So I think the current plan is a valid one, the only other thing that is special for v4 is that the v4recoverydir will need to float between hosts. The default location for that is:


...that location is changeable though by echoing a new path into:


So far, I've only done this with locks (since it's easier to get them in a deterministic fashion than delegations), but I think delegations will be similar.
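As a rough illustration of the "echo a new path in" step, here is a minimal sketch. It assumes the control file is /proc/fs/nfsd/nfsv4recoverydir, which is the path used in later comments of this report. The function name set_v4recovery_dir and the read-back check are illustrative only and are not taken from the attached agent.

```shell
#!/bin/sh
# Sketch only: point nfsd's NFSv4 recovery directory at a new location and
# confirm that the value was accepted. set_v4recovery_dir is a hypothetical
# name; the default control file path comes from later comments in this bug.

set_v4recovery_dir() {
    want=$1
    ctl=${2:-/proc/fs/nfsd/nfsv4recoverydir}

    # Write the requested path, then read it back to confirm the change.
    printf '%s\n' "$want" > "$ctl" || return 1
    got=$(cat "$ctl") || return 1

    if [ "$got" != "$want" ]; then
        echo "Failed to change NFSv4 recovery path" >&2
        echo "Wanted: $want; got $got" >&2
        return 1
    fi
    return 0
}
```

A caller would run something like `set_v4recovery_dir /mnt/nfs/.clumanager/nfs/v4recovery` as root before starting nfsd. The optional second argument exists only so the function can be exercised against an ordinary file.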

Comment 27 Lon Hohberger 2010-06-18 21:43:04 UTC
Created attachment 425260 [details]
Currently not-shipped nfs server agent, updated to include the nfsv4 recovery directory

This agent sits in the same place the old "nfsexport" agent would have sat within the service hierarchy.

  - manages nfsv3 lock recovery using rpc.statd
  - moves the nfsv3 statd directories and nfsv4 v4recovery
    directories around with the service by bind-mounting it
    over /var/lib/nfs
  - uses /etc/init.d/nfs to start everything except
  - starts rpc.statd with the HA-callout program set to
    itself and manages creation of /var/lib/nfs/statd/sm/*

I tested a couple of restarts and failovers using a configuration like this:

<service name="nfstest1">
	<ip address=""/>
	<fs device="/dev/vdb2" mountpoint="/mnt/nfs" name="fs1">
		<nfsserver name="the_nfs_server">
			<nfsclient name="client" options="rw,no_root_squash,no_all_squash,insecure" target="client"/>
			<nfsclient name="ayanami" options="rw,no_root_squash,no_all_squash,insecure" target=""/>
		</nfsserver>
	</fs>
</service>

In this case, the export path is inherited from the parent file system's "mountpoint" attribute, and the nfspath (private nfs stuff) is relative to that. It defaults to ".clumanager/nfs"; ".clumanager" is there for historic reasons, since it is where we have stored all failover service stuff in the past.
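To make the path inheritance concrete, a small sketch follows. It assumes the private directory is simply the mountpoint joined with nfspath, and that the agent wants statd and v4recovery subdirectories beneath it (the subdirectory names are taken from paths quoted elsewhere in this report). The function names nfs_state_dir and prepare_state_dirs are hypothetical.

```shell
#!/bin/sh
# Sketch only: derive the agent's private state directory from the parent
# file system's mountpoint and the nfspath attribute, then create the
# subdirectories. Function names and the subdirectory list are assumptions.

nfs_state_dir() {
    mountpoint=$1
    nfspath=${2:-.clumanager/nfs}    # default quoted in the comment above
    # Strip one trailing slash so the join never produces "//".
    printf '%s/%s\n' "${mountpoint%/}" "$nfspath"
}

prepare_state_dirs() {
    dir=$(nfs_state_dir "$@") || return 1
    # statd/sm holds NFSv3 lock-recovery state; v4recovery holds NFSv4 state.
    mkdir -p "$dir/statd/sm" "$dir/v4recovery" || return 1
    printf '%s\n' "$dir"
}
```

With the configuration above, `nfs_state_dir /mnt/nfs` resolves to /mnt/nfs/.clumanager/nfs, which matches the paths that show up in the later start-up log.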

Comment 28 Jeff Layton 2010-06-22 21:07:02 UTC
If I'm reading the scripts correctly, then rpc_pipefs isn't mounted on failover. That could be a problem. rpc_pipefs is used to handle client-side idmap and gssd upcalls. There were also some vague plans at one point to use it for server side upcalls, but I don't think that has materialized as of yet. In any case, it's probably a good idea to mount that up as well on a failover.

NFSv4 also involves some more userspace daemons:

rpc.idmapd -- needed on client and server for idmapping
rpc.gssd -- client-side GSSAPI auth
rpc.svcgssd -- server-side GSSAPI auth

...you'll probably have to kill all 3 before you can unmount rpc_pipefs. You'll want to start rpc.idmapd on any host that can be a NFSv4 client or server. The other two can just be started on clients and servers where appropriate.
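For anyone scripting this, a minimal sketch of how to find out which of those three daemons are running before attempting to unmount rpc_pipefs. It assumes a Linux /proc with per-process comm files; the function name pipefs_daemons_running is hypothetical and nothing here is taken from the attached agent.

```shell
#!/bin/sh
# Sketch only: report which of the NFSv4 helper daemons named above are
# currently running, by scanning Linux /proc/<pid>/comm entries. A caller
# could stop whatever is listed before trying to unmount rpc_pipefs.
# pipefs_daemons_running is a hypothetical name.

pipefs_daemons_running() {
    for f in /proc/[0-9]*/comm; do
        # Skip unmatched globs and processes that exited during the scan.
        [ -r "$f" ] || continue
        read -r name 2>/dev/null < "$f" || continue
        case $name in
            rpc.idmapd|rpc.gssd|rpc.svcgssd) printf '%s\n' "$name" ;;
        esac
    done
    return 0
}
```

An empty result means none of the three is holding rpc_pipefs open, at least as far as process names can tell.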

In fact, this may be a good reason to put the "failover" state filesystem somewhere other than /var/lib/nfs, so that you can avoid mounting and unmounting rpc_pipefs on failover. That way you won't need to deal with any of these daemons.

Comment 30 Lon Hohberger 2010-06-24 14:39:37 UTC
Well, now I understand why we want to relocate everything except rpc_pipefs away from /var/lib/nfs.  It is set up to mount, hard-coded to /var/lib/nfs/rpc_pipefs, as a consequence of running 'modprobe sunrpc':

install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }
remove sunrpc { /bin/umount /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; } ; /sbin/modprobe -r --ignore-remove sunrpc

This is specifically what prevents simply moving around the whole /var/lib/nfs as part of a clustered service.
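A quick way to see whether that hard-coded mount is in place is to look it up in the mounts table. The sketch below assumes the standard /proc/mounts field order (device, mount point, type, options); the function name is_mounted_at is hypothetical.

```shell
#!/bin/sh
# Sketch only: check whether a file system of a given type is mounted at a
# given path by reading a mounts table in /proc/mounts format
# (device, mount point, type, options, ...). is_mounted_at is a
# hypothetical name.

is_mounted_at() {
    target=$1
    fstype=$2
    mounts=${3:-/proc/mounts}

    while read -r _dev mnt type _rest; do
        if [ "$mnt" = "$target" ] && [ "$type" = "$fstype" ]; then
            return 0
        fi
    done < "$mounts"
    return 1
}
```

For example, `is_mounted_at /var/lib/nfs/rpc_pipefs rpc_pipefs` succeeds if the modprobe install rule has already mounted it. If it does, bind-mounting something over all of /var/lib/nfs would hide that mount point.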

Comment 31 Lon Hohberger 2010-06-24 15:33:20 UTC
I don't think changing nfsv4recoverydir works:

Starting nfstest1...
<debug>  Running fsck on /dev/vdb2
Running fsck on /dev/vdb2
<info>   mounting /dev/vdb2 on /mnt/nfs
mounting /dev/vdb2 on /mnt/nfs
<err>    mount   /dev/vdb2 /mnt/nfs
mount   /dev/vdb2 /mnt/nfs
<info>   Starting NFS Server the_nfs_server
Starting NFS Server the_nfs_server
<debug>  mount -o bind /mnt/nfs/.clumanager/nfs/statd /var/lib/nfs/statd
mount -o bind /mnt/nfs/.clumanager/nfs/statd /var/lib/nfs/statd
<debug>  cp -a /mnt/nfs/.clumanager/nfs/etab /mnt/nfs/.clumanager/nfs/rmtab /mnt/nfs/.clumanager/nfs/xtab /var/lib/nfs
cp -a /mnt/nfs/.clumanager/nfs/etab /mnt/nfs/.clumanager/nfs/rmtab /mnt/nfs/.clumanager/nfs/xtab /var/lib/nfs
<debug>  restorecon /var/lib/nfs
restorecon /var/lib/nfs
<debug>  rpc.statd is already running
rpc.statd is already running
<err>    Failed to change NFSv4 recovery path
Failed to change NFSv4 recovery path
<err>    Wanted: /mnt/nfs/.clumanager/nfs/v4recoverydir; got /var/lib/nfs/v4recovery
Wanted: /mnt/nfs/.clumanager/nfs/v4recoverydir; got /var/lib/nfs/v4recovery
<err>    Failed to start NFS Server the_nfs_server

Comment 32 Lon Hohberger 2010-06-24 15:35:49 UTC
Created attachment 426617 [details]

Differences from previous version:

 * This version synchronizes /var/lib/nfs/*tab on to shared storage.
 * Attempts to redirect /proc/fs/nfsd/nfsv4recoverydir to shared storage.

Comment 33 Lon Hohberger 2010-06-24 15:36:19 UTC
It also bind mounts /var/lib/nfs/statd (from shared storage) instead of /var/lib/nfs now.
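The *tab synchronization can be pictured roughly as follows. This is a sketch, not the code in the attachment: the three file names come from the cp command in the log in comment 31, while the function name sync_nfs_tabs and the skip-if-missing behaviour are assumptions.

```shell
#!/bin/sh
# Sketch only: copy the NFS export tables between the local /var/lib/nfs and
# the service's private directory on shared storage. The file names match
# the cp command in the log from comment 31; sync_nfs_tabs is a
# hypothetical name.

sync_nfs_tabs() {
    src=$1
    dst=$2
    for tab in etab rmtab xtab; do
        # Skip tables that do not exist yet (e.g. on a fresh file system).
        [ -e "$src/$tab" ] || continue
        cp -a "$src/$tab" "$dst/" || return 1
    done
    return 0
}
```

On start, the log in comment 31 shows the copy going from shared storage to /var/lib/nfs, i.e. `sync_nfs_tabs /mnt/nfs/.clumanager/nfs /var/lib/nfs`. The reverse direction on stop is presumably the "synchronizes on to shared storage" step, but that is an inference from the description rather than something shown in the logs.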

Comment 34 Lon Hohberger 2010-06-24 15:45:25 UTC
(In reply to comment #31)
> I don't think changing nfsv4recoverydir works:

Bug in script.

Comment 35 Lon Hohberger 2010-06-24 17:36:50 UTC
nfsv4recoverydir does not work if the directory contains spaces; I will file a separate bugzilla for this. It behaves as if the directory does not exist.

[root@crackle ~]# ls -ld /mnt/nfs\ two/.clumanager/nfs/v4recovery/
drwxr-xr-x. 2 root root 1024 Jun 24 11:47 /mnt/nfs two/.clumanager/nfs/v4recovery/
[root@crackle ~]# echo /mnt/nfs\ two/.clumanager/nfs/v4recovery > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# echo "/mnt/nfs two/.clumanager/nfs/v4recovery" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# echo "/mnt/nfs%20two/.clumanager/nfs/v4recovery" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# echo "/tmp/mnt" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# ls /tmp/mnt
ls: cannot access /tmp/mnt: No such file or directory
[root@crackle ~]# mkdir /tmp/mnt
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# echo "/tmp/mnt" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 

Comment 36 Lon Hohberger 2010-06-24 17:46:01 UTC
If the directory name up to the first space exists, it will change the v4 recovery directory to that:

[root@crackle ~]# echo $dir
/mnt/tmp two/.foo/bar baz
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# [ -d "$dir" ]; echo $?
[root@crackle ~]# echo "$dir" > /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# mkdir /mnt/tmp
[root@crackle ~]# echo "$dir" > /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# echo /var/lib/nfs/v4recovery > /proc/fs/nfsd/nfsv4recoverydir 
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
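Until that is fixed, a script can guard against it before writing to the control file. The sketch below only encodes the two conditions described in comments 35 and 36 (the path must not contain a space and must already exist as a directory). The function name check_v4recovery_path is hypothetical.

```shell
#!/bin/sh
# Sketch only: refuse a recovery directory that nfsd would mishandle.
# Based on the behaviour described above: a path containing a space is
# treated as missing, or is cut off at the first space.
# check_v4recovery_path is a hypothetical name.

check_v4recovery_path() {
    path=$1
    case $path in
        *" "*)
            echo "recovery path contains a space: $path" >&2
            return 1 ;;
    esac
    if [ ! -d "$path" ]; then
        echo "recovery path is not a directory: $path" >&2
        return 1
    fi
    return 0
}
```

Calling this before the echo into /proc/fs/nfsd/nfsv4recoverydir turns the silent truncation case into an explicit error.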

Comment 38 Lon Hohberger 2010-06-24 20:47:18 UTC
The nfsserver script sends SM_NOTIFY for v3 clients, but the client I was using for V3 (while it did receive the SM_NOTIFY) either ignored it or failed to reclaim the lock.

It will require more testing to see if SM_NOTIFY sending is correct with the nfsserver agent; my V3 client was Fedora 11 and rpc.statd 1.1.2.

Comment 39 Lon Hohberger 2010-06-24 20:50:56 UTC
Upon further testing, the client received it; it just did not reclaim the lock.

Comment 40 Lon Hohberger 2010-06-24 21:08:21 UTC
Created attachment 426699 [details]
nfsserver, pass 3

Comment 42 Jeff Layton 2010-06-29 12:25:56 UTC
Looks good to me. There is a minor typo on line 50 in what I think is a user-visible string. Looks like this doesn't touch rpc_pipefs, so that should avoid a lot of the possible problems with the earlier script.

Comment 43 Jeff Layton 2010-06-29 12:29:26 UTC
One minor issue, not related to NFSv4:

        # Rip from nfslock
        ocf_log info "Stopping NFS lockd"
        if killkill lockd; then
                ocf_log debug "NFS lockd is stopped"
        else
                ocf_log err "Failed to stop NFS lockd"
                return 1
        fi

Sending a SIGKILL to lockd does not actually stop it -- it just makes it drop all of its locks. It looks like killkill() sends 3 SIGKILLs too, which is probably a wasted effort. One should be fine.

Comment 44 Lon Hohberger 2010-06-29 18:05:59 UTC
Yes, and in addition, prior to this block (stop_locking), we stop NFS (excluding rpc.statd) anyway.

So, this entire block can be removed.

Comment 48 Corey Marthaler 2010-09-15 20:20:50 UTC
QA is unable to verify this 6.0 feature until bug 633540 is fixed.

Comment 51 releng-rhel@redhat.com 2010-11-10 22:15:58 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Comment 52 Lon Hohberger 2011-03-30 20:35:58 UTC
*** Bug 692221 has been marked as a duplicate of this bug. ***

Comment 55 errata-xmlrpc 2011-05-19 14:20:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

