Bug 531425
Summary: | newly installed system,NFS daemon start failed after boot | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Liam Li <lili> | ||||||||
Component: | nfs-utils | Assignee: | Steve Dickson <steved> | ||||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | low | ||||||||||
Version: | rawhide | CC: | atkac, awilliam, chess, dcantrell, jlaska, notting, steved | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2009-11-03 05:45:59 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 473303 | ||||||||||
Attachments: |
|
Raising for communal review during F12Blocker meeting. Still diagnosing the failure, but it seems that after default installs of rawhide ... nfs fails to start. [ boot.log ] ... Starting NFS services: [ OK ] Starting NFS quotas: [ OK ] Starting NFS daemon: [FAILED] Starting sendmail: [ OK ] Starting sm-client: [ OK ] [ OK ] crond: [ OK ] [ OK ] atd: [ OK ] ... # /etc/init.d/nfs status rpc.mountd is stopped nfsd is stopped rpc.rquotad (pid 1032) is running... # /etc/init.d/nfs restart Shutting down NFS mountd: [FAILED] Shutting down NFS daemon: [FAILED] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [FAILED] Starting NFS services: [ OK ] Starting NFS quotas: [ OK ] Starting NFS daemon: [ OK ] Starting NFS mountd: [ OK ] It appears getaddrinfo() is failing which is causing the daemon to exit... still investigating... Ok... from looking at the dmesgs it appears the rpc.nfsd is starting up before NetworkManager has brought up the network... Investigating how to get NetwokrMaanger started early.... Created attachment 366679 [details]
/var/log/messages (RPCNFSDARGS="-d -s")
Attaching /var/log/messages on boot with
RPCNFSDARGS="-d -s"
Two things stand out in that log 1) The "unable to resolve ANYADDR:nfs to inet address: Name or service not known" error from nfsd which means there is no interface configured 2) The fact that the nfsd logs are happening before the NetworkManager logs shows nfsd is being started before NetworkManager I've changed the nfs's initscript a number of times to try and figure out the correct chkconfig combo... with no success... FWIW, I can confirm this bug on a F12 rawhide i386 install on my eeepc. I believe the change at the root of this is: http://cvs.fedoraproject.org/viewvc/rpms/nfs-utils/F-12/nfs.init?r1=1.33&r2=1.34 i.e. an LSB-style init dependency block was added to the nfs service back in June (as a result of https://bugzilla.redhat.com/show_bug.cgi?id=475133 ). As well as introducing the dependencies that were desired by 475133, however, this block specifies that the nfs service should default to being on in runlevels 2-5. I believe that prior to this change the service would have defaulted to being off, so in F11 and earlier installs, the nfs service would not be on. Hence this is the behaviour change that's exposing this race condition in F12. Jesse suggested that the nfs service needs to be enabled for NFS mounts to work, however I don't believe this is entirely the case. If you have NFS mounts specified in /etc/fstab and start them via the 'netfs' service - which I believe is the supported method - they work without the nfs service being active. Putting this summary in the report for Steve's consideration and recommendation. We're leaving this bug on the blocker list for now. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers note that /etc/init.d/rpcsvcgssd looks to have the same issue - the chkconfig line specifies: # chkconfig: - 19 69 which would imply it shouldn't start by default, but the LSB-style block that was added to it more recently specifies: # Default-Start: 2 3 4 5 -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers First of all thank you for this analysis... definitely helps clears thing up... but... removing the LSB-style block from init.d/nfs done not solve the problem... NFS no longer start by default, but which NFS is configured to start up an boot rpc.nfsd still fails because there is no network interface... Note, simply removing the following two lines on a rawhide machine # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 stops rpc.nfsd from starting by default but rpc.nfsd does not fail when started a boot time.. right, to be more clear: there's definitely a race condition when the nfs service is enabled, but we're not sure if there's something definitive that can be done about that with our current init system. We were evaluating this as an F12Blocker. We believe that fixing the bug which causes it to be enabled by default would be enough to stop this being a blocker, as it would then not be evident on most systems by default. I don't think there's a way this can really be 'fixed' without a parallel init system which dynamically resolves service dependencies, and even then it's tricky, because defining when 'the network is up' (which is the relationship that's trying to be expressed by the service depending on $network) is a fairly nebulous one. But that's a fairly messy issue which isn't a blocker. nfs and rpcsvcgssd unintentionally becoming enabled by default when installed seems to be a blocker, to us. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers WIth a freshly installed F-12 system I did some experimenting: The first time I booted, the NFS service was started (and failed) I removed the Default-Start/Stop lines from /etc/init.d/nfs rebooted. The NFS service still started and still failed. I did a 'chkconfig nfs off' and rebooted leaving the Default-Start/Stop lines out. NFS did not start. I added back the Default-Start/Stop lines and rebooted. Again NFS did not start.. In the end I will be more than willing to remove those lines but I'm wondering if those lines have any effect at all? Or was my experiment askew I also notice I might have the chkconfig: line wrong. I have 'chkconfig: - 60 20' But I thinking it should be chkconfig: - 20 60 Correct? Created attachment 367160 [details] patch for some issues Looking at it, there's definitely issues with the use of 'Default-Start' keyword. As stated in the guideline (https://fedoraproject.org/wiki/Packaging/SysVInitScript#.23_Default-Start:_line): ... For example, if a service starts by default in runlevels 3, 4, and 5 only, the LSB Header in the initscript would specify: # Default-Start: 3 4 5 More commonly, the service does not start by default in any runlevel. In this case, the line should be omitted. ... This patch fixes the scripts, so that the Default-Start line matches the chkconfig: line; this is needed so that the nfs service doesn't start for everyone, for example. As for it failing with no network interface, I don't think that's a regression; I think a install using NetworkManager would, in may cases, have failed this way for many releases. The workaround for this is to set NETWORKWAIT=yes in /etc/sysconfig/network; this makes the NM startup synchronous. Of course, if the cable is disconnected, or you're only using wireless with WPA, it will still fail. Steve: your experiment was somewhat askew, yes. The Default-Start/Stop lines are not read dynamically at every boot. They're used only when the actual mapping of initscripts into /etc/rc*.d is being done or re-done. This isn't done automatically at every boot, or anything; it's done once when the system is installed, and then after that only if something intentionally makes it happen using chkconfig. To get down to brass tacks, to have the system re-read the default settings and set the nfs service to whatever it thinks the default is, do: chkconfig nfs reset you should've added that step in between each modification of the nfs initscript. Of course, one upshot of this is that fixing the initscript won't fix this problem on any already-installed system, the nfs service will remain enabled. It will prevent it from happening on new installs, though. -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers Ah... I see 'chkconfig nfs reset' is the magic command that I missed.. Things are working as expected..... So to get this bug of the blocker list I need remove the Default-Start/Stop lines, not just fix them as Bill's patch does... correct? Yes, the lines should be removed entirely. This would match the chkconfig line which was previously used, as it specifies that the services should not be started by default at any runlevel. Bill's patch would cause them to be started by default at runlevels 3, 4 and 5, which we do not want. If you remove the lines from both the /etc/init.d files I identified and get builds with that fix tagged, that will take this bug off the blocker list. Thanks! -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers (In reply to comment #15) > Yes, the lines should be removed entirely. This would match the chkconfig line > which was previously used, as it specifies that the services should not be > started by default at any runlevel. Bill's patch would cause them to be started > by default at runlevels 3, 4 and 5, which we do not want. If you remove the > lines from both the /etc/init.d files I identified and get builds with that fix > tagged, that will take this bug off the blocker list. Thanks! Please read the patch in context - it fixes the files correctly. Some of the modified files *were* already default-start items. sorry, you're quite right, I was mis-reading the patch. steve, bill's patch does indeed do everything right, sorry again bill! -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers Fixed in nfs-utils-1.2.0-18.fc12 (Build: http://koji.fedoraproject.org/koji/taskinfo?taskID=1783956) Do I need to open a ticket with Fedora Release Engineering (aka https://fedorahosted.org/rel-eng/) to get this accepted? Yes, indeed, a tag request is required. Remember you can use 'make tag-request' to do it. Thanks! -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers Tested http://koji.fedoraproject.org/koji/taskinfo?taskID=1783956 and confirmed the fix. Starting HAL daemon: [ OK ] Starting PC/SC smart card daemon (pcscd): [ OK ] Retrigger failed udev events[ OK ] Setting network parameters... [ OK ] Starting NetworkManager daemon: [ OK ] Starting NFS statd: [ OK ] Starting RPC idmapd: [ OK ] Starting sshd: [ OK ] Starting sendmail: [ OK ] Starting sm-client: [ OK ] [ OK ] crond: [ OK ] [ OK ] atd: [ OK ] Fedora release 11.92 (Rawhide) Kernel 2.6.31.5-96.fc12.x86_64 on an x86_64 (/dev/ttyS0) [root@adam adamw]# chkconfig nfs reset [root@adam adamw]# chkconfig --list nfs nfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off [root@adam adamw]# rpm -Uvh Download/nfs-utils-1.2.0-18.fc12.x86_64.rpm Preparing... ########################################### [100%] 1:nfs-utils ########################################### [100%] [root@adam adamw]# chkconfig nfs reset [root@adam adamw]# chkconfig --list nfs nfs 0:off 1:off 2:off 3:off 4:off 5:off 6:off looks good to me! -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers This has been tagged for f12-final, closing. *** Bug 511814 has been marked as a duplicate of this bug. *** |
Created attachment 366404 [details] logs, please see boot.log Description of problem: Install a new system(f12 rawhide), there was a warning about "NFS daemon start failed" during boot every time Version-Release number of selected component (if applicable): nfs-utils-1.2.0-17.fc12 How reproducible: always