Bug 512377 - Upgrading nfs-utils on NFSv4 server kills downstream NFSv4 clients
Summary: Upgrading nfs-utils on NFSv4 server kills downstream NFSv4 clients
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 12
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-07-17 15:28 UTC by Jeff Garzik
Modified: 2013-07-03 02:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 513496
Environment:
Last Closed: 2010-12-05 06:42:22 UTC
Type: ---
Embargoed:


Attachments
reverse order of version and minorversion setting (183.55 KB, patch)
2009-07-24 16:23 UTC, J. Bruce Fields
reverse order of version and minorversion setting (1.05 KB, patch)
2009-07-24 16:28 UTC, J. Bruce Fields
Kernel config (50.72 KB, text/plain)
2009-07-28 01:59 UTC, Jeff Garzik

Description Jeff Garzik 2009-07-17 15:28:09 UTC
Description of problem:
Last night's Fedora 11 nfs-utils upgrade broke NFSv4.

Problem machine:  Fedora 11/x86-64, NFSv4 server (kernel version below)

Attached was a lone NFSv4 client:  Fedora 10/x86-64, kernel 2.6.30 vanilla.

After yum upgraded nfs-utils, syscalls on the NFSv4 client started failing with "Protocol not supported".

On the NFSv4 server, where the nfs-utils package upgrade had just taken place, the log was filling with a steady stream of messages:
Jul 17 05:58:14 pretzel kernel: svc: 10.10.20.30, port=763: unknown version (4 for prog 100003, nfsd)
Jul 17 05:58:15 pretzel kernel: svc: 10.10.20.30, port=763: unknown version (4 for prog 100003, nfsd)
Jul 17 05:58:15 pretzel kernel: svc: 10.10.20.30, port=763: unknown version (4 for prog 100003, nfsd)
Jul 17 05:58:17 pretzel kernel: svc: 10.10.20.30, port=763: unknown version (4 for prog 100003, nfsd)
Jul 17 05:58:17 pretzel kernel: svc: 10.10.20.30, port=763: unknown version (4 for prog 100003, nfsd)

10.10.20.30 is the NFSv4 client mentioned above.


Version-Release number of selected component (if applicable):
nfs-utils-1.2.0-3.fc11.x86_64
kernel-2.6.29.5-191.fc11.x86_64


How reproducible:
unknown

Steps to Reproduce:
1. Set up an NFSv4 server and an NFSv4 client.
2. Mount the server on the client.
3. Upgrade nfs-utils on the server.
  
Actual results:
see above

Expected results:
working client

Additional info:
NFSv4 client /etc/fstab line:
pretzel:/		/g			nfs4	defaults,noatime 0 0

(pretzel == the problematic NFSv4 server)

Comment 1 Jussi Eloranta 2009-07-17 16:30:45 UTC
I ran into the same problem on a 32-bit system. This caused a major headache this morning :-(

Comment 2 J. Bruce Fields 2009-07-17 16:58:52 UTC
/proc/fs/nfsd/versions was extended to allow turning minor versions on or off by echoing "+4.1" or "-4.1" to /proc/fs/nfsd/versions.

Unfortunately, pre-2.6.30 kernels just stop parsing at the first non-digit, so "-4.1" is interpreted as "-4".  If a new nfs-utils (on an old kernel) writes "+2", "+3", "+4", then "-4.1", the result is therefore to turn off 4 altogether, not just 4.1....

Turning off the minorversion first should work.  Or just not bothering, since the kernel leaves it off by default.

But the interface now seems more delicate than intended.  Better might be to violate the rule against changing upstream user<->kernel APIs (no user wants 4.1 yet anyway), and add a new /proc/fs/nfsd/v4_minor_versions file instead....
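
To make the failure mode above concrete, here is a minimal Python model of the two parsing behaviours. It is only a sketch of the description in this comment, not the kernel code: the function names, the dictionary state, and the assumption that each token is applied as a separate write are all hypothetical.

```python
import re

def write_versions_old(state, token):
    """Model of a pre-2.6.30 kernel: read the sign, then digits, and stop
    at the first non-digit, so '-4.1' is treated as '-4'."""
    sign, major = re.match(r"([+-])(\d+)", token).groups()
    state[major] = (sign == "+")

def write_versions_new(state, token):
    """Model of a minor-version-aware kernel: '-4.1' only touches '4.1'."""
    sign, major, minor = re.fullmatch(r"([+-])(\d+)(?:\.(\d+))?", token).groups()
    key = major if minor is None else f"{major}.{minor}"
    state[key] = (sign == "+")

def replay(writer, tokens):
    """Apply each token in order and return the resulting version state."""
    state = {}
    for token in tokens:
        writer(state, token)
    return state

# The write sequence described above for the new nfs-utils.
tokens = ["+2", "+3", "+4", "-4.1"]
old = replay(write_versions_old, tokens)
new = replay(write_versions_new, tokens)
print(old)  # {'2': True, '3': True, '4': False}
print(new)  # {'2': True, '3': True, '4': True, '4.1': False}
```

In the model of the old parser, the final "-4.1" overwrites the earlier "+4", which matches the "unknown version (4 for prog 100003, nfsd)" messages in the report.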

Comment 3 Anthony Messina 2009-07-17 23:27:46 UTC
What is the proposed workaround for F11 until the 2.6.30 kernel is released, besides reverting to NFSv3?  This has killed my entire network with NFSv4 mounted /home directories.

I'm confused as to why this nfs-utils would be pushed to stable without the necessary kernel to support the change for NFSv4.

Comment 4 Jeff Bastian 2009-07-22 15:23:30 UTC
A colleague of mine, Sachin Prabhu, suggested removing the '-N 4.1' argument from the rpc.nfsd line in /etc/rc.d/init.d/nfs; see below.

This fixed it for me.  Of course, NOT disabling 4.1 may introduce new problems...


--- /etc/rc.d/init.d/nfs.ORIG   2009-06-11 13:13:23.000000000 -0500
+++ /etc/rc.d/init.d/nfs        2009-07-22 10:09:44.083889410 -0500
@@ -94,7 +94,7 @@

        echo -n $"Starting NFS daemon: "
        # For now, turn off the nfs41 support
-       daemon rpc.nfsd -N 4.1 $RPCNFSDARGS $RPCNFSDCOUNT
+       daemon rpc.nfsd $RPCNFSDARGS $RPCNFSDCOUNT
        RETVAL=$?
        echo
        [ $RETVAL -ne 0 ] && exit $RETVAL
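
After restarting nfsd with a change like this, one way to confirm that NFSv4 is enabled again might be to inspect the contents of /proc/fs/nfsd/versions. The sketch below assumes the file holds whitespace-separated tokens such as "+2 +3 +4"; that format and the helper names are assumptions for illustration, not something shown in this report.

```python
def parse_nfsd_versions(text):
    """Turn text like '+2 +3 -4' into {'2': True, '3': True, '4': False}."""
    result = {}
    for token in text.split():
        if token[0] not in "+-":
            raise ValueError(f"unexpected token: {token!r}")
        result[token[1:]] = (token[0] == "+")
    return result

def nfsv4_enabled(text):
    """Report whether plain version 4 is marked as enabled."""
    return parse_nfsd_versions(text).get("4", False)

# Hypothetical sample contents, before and after the workaround.
print(nfsv4_enabled("+2 +3 -4"))   # False
print(nfsv4_enabled("+2 +3 +4"))   # True
```

On a live server the text would come from reading the file, for example `open("/proc/fs/nfsd/versions").read()`, run as a user with permission to read it.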

Comment 5 Jeff Garzik 2009-07-23 21:44:43 UTC
This affects rawhide as well as Fedora 11.

Comment 6 Jeff Garzik 2009-07-23 22:07:19 UTC
Cloned Bug #513496 to represent the F11 version of this bug.

Comment 7 Adam Williamson 2009-07-24 15:30:22 UTC
Jeff: after reading comment #2, I'm having trouble understanding how this bug can, strictly speaking, be said to affect Rawhide. That comment suggests the problem is related to kernels below 2.6.30, and Rawhide has 2.6.31. When you say it 'affects rawhide', what do you mean exactly? In your 'affected' situation, are all nodes running Rawhide, or is it only a problem where some are running Rawhide and some F11?

If we can confirm that this issue somehow affects NFSv4 functionality in a situation where all nodes are running Rawhide, we will set this issue to block f12blocker (so it will be a release-critical bug for f12 final). If not, we won't. We are dropping it from f12alpha (release-critical bugs for f12 alpha) by consensus at today's alpha bug review meeting: even if it entirely breaks NFSv4 in Rawhide, that's not enough to block an alpha release, strictly speaking.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 8 J. Bruce Fields 2009-07-24 16:23:56 UTC
Created attachment 355062
reverse order of version and minorversion setting

By the way, I'd advise reverting b750909f50fb184cb82344d40a150f0d2760ef21,
7bd86b3cfb0d929ce1dae2b937c3ac9048e23644, and c88c4091db87c0fc23ed67e76d63439b59a82369 from nfs-utils.

Alternatively, changing the order of minorversion and version setting, as in the attached patch, should fix the problem.

However I think this suggests that the interface we chose is much too difficult to use in a way that is safely backwards-compatible, and that we should just revert the patches that use the interface (and the corresponding kernel patches that implement it) and start over with a different interface.  (As the new interface is only required if you want to turn on NFSv4.1--a highly experimental feature at this point--I think removing the new interface is acceptable.)
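
The effect of changing the order can be illustrated with a small Python model of an old kernel that stops parsing at the first non-digit. This is a sketch of the idea only, not the attached patch; the function names and the two token lists are hypothetical.

```python
import re

def old_kernel_write(state, token):
    """Model of a pre-2.6.30 kernel: '-4.1' is read as '-4'."""
    sign, major = re.match(r"([+-])(\d+)", token).groups()
    state[int(major)] = (sign == "+")

def final_state(tokens):
    """Apply the writes in order and return which major versions are on."""
    state = {}
    for token in tokens:
        old_kernel_write(state, token)
    return state

original_order = ["+2", "+3", "+4", "-4.1"]   # versions first, minorversion last
reversed_order = ["-4.1", "+2", "+3", "+4"]   # minorversion first, versions last

print(final_state(original_order)[4])  # False: the late '-4.1' disables v4
print(final_state(reversed_order)[4])  # True: the later '+4' re-enables v4
```

With the minorversion written first, a misparsed "-4.1" is harmless on an old kernel because the later "+4" overrides it. The same ordering would still be expected to leave 4.1 off on a kernel that understands minor versions.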

Comment 9 J. Bruce Fields 2009-07-24 16:25:22 UTC
(I should have warned, by the way, that I haven't tested that patch.)

Comment 10 J. Bruce Fields 2009-07-24 16:28:30 UTC
Created attachment 355064
reverse order of version and minorversion setting

Argh, sorry, uploaded the wrong file; this should be it!

Comment 11 Steve Dickson 2009-07-27 16:23:06 UTC
I just tested the patch and it works (with a minor change),
and I will be updating the F-11 nfs-utils shortly.

> I'm confused as to why this nfs-utils would be pushed to stable without the
> necessary kernel to support the change for NFSv4.
I am a bit confused as well... I know I tested this change before I
committed... but it appears I only tested on post 2.6.30 kernels...

Comment 12 Steve Dickson 2009-07-27 16:30:19 UTC
Jeff, I don't see this problem on rawhide... what kernel version were you using?

Comment 13 Jeff Garzik 2009-07-27 20:08:11 UTC
I am using vanilla (non-Fedora) kernel 2.6.30...

And although we don't bend over backwards to do so, we _do_ support kernels other than the absolute-latest-Fedora.  That attribute becomes important when users avoid kernel updates for various reasons - most notably when a newer kernel causes problems.

Comment 14 Steve Dickson 2009-07-28 01:39:00 UTC
> I am using vanilla (non-Fedora) kernel 2.6.30..
Well, in my testing, I did not see the same problem with
post-2.6.30 kernels as I did with 2.6.29 kernels. The
reason is that 2.6.30 and later kernels have NFSv4.1
support...

> And although we don't bend over backwards to do so, we _do_ support kernels
> other than the absolute-latest-Fedora.
I couldn't agree with you more... If the latest nfs-utils package does
not work with a non-Fedora kernel, it's a problem... although I will admit
I do the majority of my testing with Rawhide kernels, since they
seem to be fairly stable...

> That attribute becomes important when users avoid kernel updates 
> for various reasons - most notably when a newer kernel causes 
> problems.
I agree... Believe me, the last thing I want is for any of the
packages I maintain to become a reason for people not to
download new kernels... actually I work hard to stay out of that
type of limelight...

What is the exact kernel you are using (including the .config file)?
I would like to try and reproduce what you are seeing...

Comment 15 Jeff Garzik 2009-07-28 01:59:55 UTC
Created attachment 355344
Kernel config

Unfortunately, my 2.6.30 kernel config has disappeared :(

This attachment is known to be the 2.6.30 kernel config + make oldconfig on 2.6.31-rc3.

This is probably not helpful to you, but it's the best I have.  Relevant NFS sections are

CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
CONFIG_NFS_V4=y
# CONFIG_NFS_V4_1 is not set
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m

My local NFS settings did not change from 2.6.30

Comment 16 Steve Dickson 2009-07-28 12:41:25 UTC
I think I see what the problem is.... NFSv4.1 is not
enabled, so your 2.6.30 kernel is like a 2.6.29 kernel
with regard to NFS... And you're probably using the
F-11 nfs-utils, which explicitly turns off the 4.1
functionality via the init script, which in turn
causes v4 to be turned off...

Comment 17 Steve Dickson 2009-07-28 12:52:44 UTC
Fixed in nfs-utils-1.2.0-8.fc12

(http://koji.fedoraproject.org/koji/buildinfo?buildID=122788)

Comment 18 Adam Williamson 2009-07-28 22:11:06 UTC
given that jeff clearly is in a non-typical situation, dropping from f12blocker. though it's obviously getting fixed anyway :)

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 19 Robert Story 2009-09-05 15:59:08 UTC
glad to see it fixed in rawhide.. how about f11, where I ran into it this week?

Comment 20 Giorgio Signorini 2009-09-10 08:08:13 UTC
I recently updated my F11 on an NFS server to

  nfs-utils-1.2.0-4.fc11.x86_64
  rpcbind-0.2.0-2.fc11.x86_64
  kernel-devel-2.6.30.5-43.fc11.x86_64

and the server stopped working. Namely, a client listed in /etc/hosts.allow BY NAME is no longer able to mount an exported fs (hosts.deny is "ALL: ALL"), while it seemed to work before the update. Everything is fixed if rpcbind is restarted with -w after server reboot.

Otherwise the only way to get things working is to list the client BY IP in hosts.allow OR in /etc/hosts, which is not very convenient.

Comment 21 Giorgio Signorini 2009-09-10 09:57:48 UTC
I forgot to quote the error that appears on the NFS client:

# mount: mount to NFS server 'xxx.yyy.it' failed: RPC Error: Authentication error.

Comment 22 Bug Zapper 2009-11-16 11:00:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 24 Bug Zapper 2010-11-04 10:45:01 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 25 Bug Zapper 2010-12-05 06:42:22 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

