Bug 595547
| Field | Value |
|---|---|
| Summary | [RFE] Support for NFSv4 missing. |
| Product | Red Hat Enterprise Linux 6 |
| Component | resource-agents |
| Version | 6.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| Reporter | Perry Myers <pmyers> |
| Assignee | Lon Hohberger <lhh> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | cluster-maint, cmarthal, cww, ddumas, djansa, djuran, edamato, herrold, jlayton, lhh, mwaltz, rwheeler, ssaha, steved, swhiteho, syeghiay, tao, zbrown |
| Target Milestone | rc |
| Keywords | FutureFeature, Reopened, TestOnly, Triaged |
| Hardware | All |
| OS | Linux |
| Fixed In Version | resource-agents-3.0.12-9.el6 |
| Doc Type | Enhancement |
| Clone Of | 586461 |
| Bug Depends On | 586461, 633540 |
| Bug Blocks | 604740 |
| Last Closed | 2011-05-19 14:20:39 UTC |
Comment 1 - RHEL Program Management - 2010-05-24 22:50:45 UTC
Current option is to extend the nfsserver resource agent (currently -not- part of RHEL6 packaging) to support NFSv4. NFSv4 will be per-cluster. In addition, we need to block use of NFSv4 with the nfsclient resource agent if possible.

I've played around a little today with the v4recovery dir. When a lock is taken, the server drops a directory into that dir. If you shut down nfsd, the contents of that directory persist (as expected). When nfsd starts up again, it scrapes the contents of that dir for information about the client IDs that existed at the time of the shutdown. If the "new" server doesn't have access to those contents, then the server fails the reclaim:
76 293.110950 192.168.1.2 192.168.1.3 NFS V4 COMP Reply (Call In 75) <EMPTY> PUTFH;OPEN OPEN(10034)
....10034 == NFS4ERR_BAD_RECLAIM
99 305.812153 192.168.1.2 192.168.1.3 NFS V4 COMP Reply (Call In 98) <EMPTY> PUTFH;SAVEFH SAVEFH;OPEN OPEN(10013)
...10013 == NFS4ERR_GRACE

So I think the current plan is a valid one. The only other thing that is special for v4 is that the v4recoverydir will need to float between hosts. The default location for that is:
/var/lib/nfs/v4recovery
...that location can be changed, though, by echoing a new path into:
/proc/fs/nfsd/nfsv4recoverydir.
So far, I've only done this with locks (since it's easier to get them in a deterministic fashion than delegations), but I think delegations will be similar.
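To make that concrete, here is a minimal sketch of relocating the recovery directory to shared storage as part of service start; the /mnt/nfs/.clumanager path is only illustrative of the defaults discussed later in this bug:

-----------------[snip]------------------
# Sketch only: point nfsd's v4 recovery state at the clustered filesystem.
# The target directory must already exist, or the kernel quietly keeps the old value.
mkdir -p /mnt/nfs/.clumanager/nfs/v4recovery
echo /mnt/nfs/.clumanager/nfs/v4recovery > /proc/fs/nfsd/nfsv4recoverydir
# Read it back to confirm the change took effect.
cat /proc/fs/nfsd/nfsv4recoverydir
-----------------[snip]------------------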
Created attachment 425260 [details]
Currently not-shipped nfs server agent, updated to include the nfsv4 recovery directory
This agent sits in the same place the old "nfsexport" agent would have sat within the service hierarchy.
It:
- manages NFSv3 lock recovery using rpc.statd
- moves the NFSv3 statd directories and the NFSv4 v4recovery directory around with the service by bind-mounting them over /var/lib/nfs (a rough sketch of these steps follows this list)
- uses /etc/init.d/nfs to start everything except statd
- starts rpc.statd with the HA-callout program set to itself and manages creation of /var/lib/nfs/statd/sm/*
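Roughly, the relocation amounts to something like the following sketch, mirroring the agent's debug output shown further down in this bug (the exact set of steps differs between the attached versions; paths assume a clustered filesystem mounted at /mnt/nfs with the default ".clumanager/nfs" nfspath):

-----------------[snip]------------------
# Bind-mount the statd state from shared storage over the local statd directory.
mount -o bind /mnt/nfs/.clumanager/nfs/statd /var/lib/nfs/statd
# Bring the export tables along with the service.
cp -a /mnt/nfs/.clumanager/nfs/etab /mnt/nfs/.clumanager/nfs/rmtab \
      /mnt/nfs/.clumanager/nfs/xtab /var/lib/nfs
# Fix SELinux labels after copying state into place.
restorecon /var/lib/nfs
-----------------[snip]------------------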
I tested a couple of restarts and failovers using a configuration like this:
<service name="nfstest1">
  <ip address="192.168.122.151"/>
  <fs device="/dev/vdb2" mountpoint="/mnt/nfs" name="fs1">
    <nfsserver name="the_nfs_server">
      <nfsclient name="client" options="rw,no_root_squash,no_all_squash,insecure" target="client"/>
      <nfsclient name="ayanami" options="rw,no_root_squash,no_all_squash,insecure" target="192.168.122.1"/>
    </nfsserver>
  </fs>
</service>
In this case, the export path would be inherited from the parent file system's "mountpoint" attribute, and the nfspath (private NFS state) is relative to that. It defaults to ".clumanager/nfs"; ".clumanager" is kept for historical reasons, since it is where we have stored all failover service state in the past.
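Concretely, with the configuration above and the default nfspath, the per-service NFS state would live on the exported filesystem itself, for example (illustrative layout):

-----------------[snip]------------------
/mnt/nfs/.clumanager/nfs/statd                # rpc.statd state (sm/, sm.bak/, ...)
/mnt/nfs/.clumanager/nfs/v4recovery           # NFSv4 recovery directory
/mnt/nfs/.clumanager/nfs/etab, rmtab, xtab    # export table copies carried with the service
-----------------[snip]------------------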
If I'm reading the scripts correctly, rpc_pipefs isn't mounted on failover. That could be a problem: rpc_pipefs is used to handle client-side idmap and gssd upcalls. There were also some vague plans at one point to use it for server-side upcalls, but I don't think that has materialized yet. In any case, it's probably a good idea to mount it as well on a failover.

NFSv4 also involves some more userspace daemons:
- rpc.idmapd -- needed on client and server for idmapping
- rpc.gssd -- client-side GSSAPI auth
- rpc.svcgssd -- server-side GSSAPI auth

...you'll probably have to kill all three before you can unmount rpc_pipefs. You'll want to start rpc.idmapd on any host that can be an NFSv4 client or server; the other two can just be started on clients and servers where appropriate. In fact, this may be a good reason to consider putting the "failover" state filesystem somewhere other than /var/lib/nfs, so that you can avoid mounting and unmounting rpc_pipefs on failover. That way you won't need to deal with any of these daemons.

Well, now I understand why we want to relocate everything except rpc_pipefs away from /var/lib/nfs: rpc_pipefs is mounted, hard-coded to /var/lib/nfs/rpc_pipefs, as a consequence of running 'modprobe sunrpc':
...
install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }
...
remove sunrpc { /bin/umount /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; } ; /sbin/modprobe -r --ignore-remove sunrpc
...
This is specifically what prevents simply moving around the whole /var/lib/nfs as part of a clustered service.
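A quick way to see that pin in practice (a sketch only, not part of any attached agent):

-----------------[snip]------------------
# Show why /var/lib/nfs cannot simply be swapped out wholesale: as soon as the
# sunrpc module is loaded, rpc_pipefs sits mounted underneath /var/lib/nfs.
grep rpc_pipefs /proc/mounts
# typical output: sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
-----------------[snip]------------------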
I don't think changing nfsv4recoverydir works:

Starting nfstest1...
<debug> Running fsck on /dev/vdb2
<info> mounting /dev/vdb2 on /mnt/nfs
<err> mount /dev/vdb2 /mnt/nfs
<info> Starting NFS Server the_nfs_server
<debug> mount -o bind /mnt/nfs/.clumanager/nfs/statd /var/lib/nfs/statd
<debug> cp -a /mnt/nfs/.clumanager/nfs/etab /mnt/nfs/.clumanager/nfs/rmtab /mnt/nfs/.clumanager/nfs/xtab /var/lib/nfs
<debug> restorecon /var/lib/nfs
<debug> rpc.statd is already running
<err> Failed to change NFSv4 recovery path
<err> Wanted: /mnt/nfs/.clumanager/nfs/v4recoverydir; got /var/lib/nfs/v4recovery
<err> Failed to start NFS Server the_nfs_server

Created attachment 426617 [details]
Updated
Differences from previous version:
* This version synchronizes /var/lib/nfs/*tab on to shared storage.
* Attempts to redirect /proc/fs/nfsd/nfsv4recoverydir to shared storage.
It also bind mounts /var/lib/nfs/statd (from shared storage) instead of /var/lib/nfs now.

(In reply to comment #31)
> I don't think changing nfsv4recoverydir works:

Bug in script.

nfsv4recoverydir does not work if the directory contains spaces; I will file a separate bugzilla for this. It behaves as if the directory does not exist:

[root@crackle ~]# ls -ld /mnt/nfs\ two/.clumanager/nfs/v4recovery/
drwxr-xr-x. 2 root root 1024 Jun 24 11:47 /mnt/nfs two/.clumanager/nfs/v4recovery/
[root@crackle ~]# echo /mnt/nfs\ two/.clumanager/nfs/v4recovery > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/nfs
[root@crackle ~]# echo "/mnt/nfs two/.clumanager/nfs/v4recovery" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/nfs
[root@crackle ~]# echo "/mnt/nfs%20two/.clumanager/nfs/v4recovery" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/nfs
[root@crackle ~]# echo "/tmp/mnt" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/nfs
[root@crackle ~]# ls /tmp/mnt
ls: cannot access /tmp/mnt: No such file or directory
[root@crackle ~]# mkdir /tmp/mnt
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/nfs
[root@crackle ~]# echo "/tmp/mnt" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/tmp/mnt

If the directory name up to the first space exists, it will change the v4 recovery directory to that:

[root@crackle ~]# echo $dir
/mnt/tmp two/.foo/bar baz
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/var/lib/nfs/v4recovery
[root@crackle ~]# [ -d "$dir" ]; echo $?
0
[root@crackle ~]# echo "$dir" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/var/lib/nfs/v4recovery
[root@crackle ~]# mkdir /mnt/tmp
[root@crackle ~]# echo "$dir" > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/mnt/tmp
[root@crackle ~]# echo /var/lib/nfs/v4recovery > /proc/fs/nfsd/nfsv4recoverydir
[root@crackle ~]# cat /proc/fs/nfsd/nfsv4recoverydir
/var/lib/nfs/v4recovery

The nfsserver script sends SM_NOTIFY for v3 clients, but the client I was using for v3 (while it did receive the SM_NOTIFY) either ignored it or failed to reclaim the lock. It will require more testing to see whether SM_NOTIFY sending is correct with the nfsserver agent; my v3 client was Fedora 11 with rpc.statd 1.1.2. Upon further testing, the client did receive it; it just did not reclaim the lock.
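Given how silently the proc write can fail (nonexistent directory, embedded spaces), a defensive read-back check along these lines is worth having; this is a hedged sketch, with illustrative paths and message text rather than the agent's exact code:

-----------------[snip]------------------
# Sketch only: verify the kernel actually accepted the new recovery directory.
want=/mnt/nfs/.clumanager/nfs/v4recovery
echo "$want" > /proc/fs/nfsd/nfsv4recoverydir
got=$(cat /proc/fs/nfsd/nfsv4recoverydir)
if [ "$got" != "$want" ]; then
        echo "Failed to change NFSv4 recovery path: wanted $want, got $got" >&2
fi
-----------------[snip]------------------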
Created attachment 426699 [details]
nfsserver, pass 3
Looks good to me. There is a minor typo on line 50 in what I think is a user-visible string. It looks like this version doesn't touch rpc_pipefs, so that should avoid a lot of the possible problems with the earlier script. One minor issue, not related to NFSv4:
-----------------[snip]------------------
# Rip from nfslock
ocf_log info "Stopping NFS lockd"
if killkill lockd; then
ocf_log debug "NFS lockd is stopped"
else
ocf_log err "Failed to stop NFS lockd"
return 1
fi
-----------------[snip]------------------
Sending a SIGKILL to lockd does not actually stop it -- it just makes it drop all of its locks. It looks like killkill() sends 3 SIGKILLs too, which is probably a wasted effort. One should be fine.
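(A minimal illustration of that point, not the agent's code; the pkill usage is an assumption here, since killkill() resolves the pid its own way:)

-----------------[snip]------------------
# Sketch only: one SIGKILL makes the lockd kernel thread drop all of its locks;
# it does not terminate the thread, so repeated signals buy nothing.
pkill -KILL -x lockd || :
-----------------[snip]------------------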
Yes, and in addition, prior to this block (stop_locking), we stop NFS (excluding rpc.statd) anyway. So this entire block can be removed.

http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=92a0afcd3c8f8f4ac214c8758de999c4507b7076
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=4540f05d7bf48cef31726d9446549ffec07a7511
(and somewhat)
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=2688aaefcb107f2174d334d34dbb7fffb76ac131

QA is unable to verify this 6.0 feature until bug 633540 is fixed.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.

*** Bug 692221 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0744.html