Bug 531425

Summary: newly installed system,NFS daemon start failed after boot
Product: [Fedora] Fedora Reporter: Liam Li <lili>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhideCC: atkac, awilliam, chess, dcantrell, jlaska, notting, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-11-03 05:45:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 473303    
Attachments:
Description Flags
logs, please see boot.log
none
/var/log/messages (RPCNFSDARGS="-d -s")
none
patch for some issues none

Description Liam Li 2009-10-28 09:17:38 UTC
Created attachment 366404 [details]
logs, please see boot.log

Description of problem:
Install a new system(f12 rawhide), there was a warning about "NFS daemon start failed" during boot every time

Version-Release number of selected component (if applicable):
nfs-utils-1.2.0-17.fc12

How reproducible:

always

Comment 1 James Laska 2009-10-28 20:05:14 UTC
Raising for communal review during F12Blocker meeting.  Still diagnosing the failure, but it seems that after default installs of rawhide ... nfs fails to start.

[ boot.log ]
...
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS daemon: [FAILED]
Starting sendmail: [  OK  ]
Starting sm-client: [  OK  ]
[  OK  ] crond: [  OK  ]
[  OK  ] atd: [  OK  ]
...

# /etc/init.d/nfs status
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad (pid 1032) is running...

# /etc/init.d/nfs restart
Shutting down NFS mountd: [FAILED]
Shutting down NFS daemon: [FAILED]
Shutting down NFS quotas: [  OK  ]
Shutting down NFS services:  [FAILED]
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS daemon: [  OK  ]
Starting NFS mountd: [  OK  ]

Comment 2 Steve Dickson 2009-10-29 15:33:47 UTC
It appears getaddrinfo() is failing which is causing the daemon
to exit... still investigating...

Comment 3 Steve Dickson 2009-10-29 15:39:59 UTC
Ok... from looking at the dmesgs it appears the rpc.nfsd is starting
up before NetworkManager has brought up the network... Investigating
how to get NetwokrMaanger started early....

Comment 4 James Laska 2009-10-29 18:15:37 UTC
Created attachment 366679 [details]
/var/log/messages (RPCNFSDARGS="-d -s")

Attaching /var/log/messages on boot with

RPCNFSDARGS="-d -s"

Comment 5 Steve Dickson 2009-10-29 19:03:15 UTC
Two things stand out in that log

1) The "unable to resolve ANYADDR:nfs to inet address: Name or service not known"
   error from nfsd which means there is no interface configured

2) The fact that the nfsd logs are happening before the NetworkManager 
   logs shows nfsd is being started before NetworkManager

I've changed the nfs's initscript a number of times to try and figure
out the correct chkconfig combo... with no success...

Comment 6 Chess Griffin 2009-10-30 02:50:12 UTC
FWIW, I can confirm this bug on a F12 rawhide i386 install on my eeepc.

Comment 7 Adam Williamson 2009-10-30 20:09:21 UTC
I believe the change at the root of this is:

http://cvs.fedoraproject.org/viewvc/rpms/nfs-utils/F-12/nfs.init?r1=1.33&r2=1.34

i.e. an LSB-style init dependency block was added to the nfs service back in June (as a result of https://bugzilla.redhat.com/show_bug.cgi?id=475133 ). As well as introducing the dependencies that were desired by 475133, however, this block specifies that the nfs service should default to being on in runlevels 2-5.

I believe that prior to this change the service would have defaulted to being off, so in F11 and earlier installs, the nfs service would not be on. Hence this is the behaviour change that's exposing this race condition in F12.

Jesse suggested that the nfs service needs to be enabled for NFS mounts to work, however I don't believe this is entirely the case. If you have NFS mounts specified in /etc/fstab and start them via the 'netfs' service - which I believe is the supported method - they work without the nfs service being active.

Putting this summary in the report for Steve's consideration and recommendation. We're leaving this bug on the blocker list for now.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 8 Adam Williamson 2009-10-30 20:31:15 UTC
note that /etc/init.d/rpcsvcgssd looks to have the same issue - the chkconfig line specifies:

# chkconfig: - 19 69

which would imply it shouldn't start by default, but the LSB-style block that was added to it more recently specifies:

# Default-Start: 2 3 4 5

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 9 Steve Dickson 2009-10-30 21:16:06 UTC
First of all thank you for this analysis... definitely helps clears
thing up... but... removing the LSB-style block from init.d/nfs 
done not solve the problem... 

NFS no longer start by default, but which NFS is configured to 
start up an boot rpc.nfsd still fails because there is no 
network interface... 

Note, simply removing the following two lines on a 
rawhide machine 
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6

stops rpc.nfsd from starting by default but rpc.nfsd does
not fail when started a boot time..

Comment 10 Adam Williamson 2009-10-31 05:17:22 UTC
right, to be more clear: there's definitely a race condition when the nfs service is enabled, but we're not sure if there's something definitive that can be done about that with our current init system. We were evaluating this as an F12Blocker. We believe that fixing the bug which causes it to be enabled by default would be enough to stop this being a blocker, as it would then not be evident on most systems by default.

I don't think there's a way this can really be 'fixed' without a parallel init system which dynamically resolves service dependencies, and even then it's tricky, because defining when 'the network is up' (which is the relationship that's trying to be expressed by the service depending on $network) is a fairly nebulous one. But that's a fairly messy issue which isn't a blocker. nfs and rpcsvcgssd unintentionally becoming enabled by default when installed seems to be a blocker, to us.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 11 Steve Dickson 2009-11-02 12:47:49 UTC
WIth a freshly installed F-12 system I did some experimenting:

The first time I booted, the NFS service was started (and failed)

I removed the  Default-Start/Stop lines from /etc/init.d/nfs 
rebooted. The NFS service still started and still failed.

I did a 'chkconfig nfs off' and rebooted leaving the Default-Start/Stop 
lines out. NFS did not start.

I added back the Default-Start/Stop lines and rebooted. Again 
NFS did not start..

In the end I will be more than willing to remove those lines
but I'm wondering if those lines have any effect at all? Or was
my experiment askew  


I also notice I might have the chkconfig: line wrong.
I have 
    'chkconfig: - 60 20'
But I thinking it should be
     chkconfig: - 20 60

Correct?

Comment 12 Bill Nottingham 2009-11-02 16:30:05 UTC
Created attachment 367160 [details]
patch for some issues

Looking at it, there's definitely issues with the use of 'Default-Start' keyword. As stated in the guideline (https://fedoraproject.org/wiki/Packaging/SysVInitScript#.23_Default-Start:_line):

...
For example, if a service starts by default in runlevels 3, 4, and 5 only, the LSB Header in the initscript would specify:

# Default-Start: 3 4 5

More commonly, the service does not start by default in any runlevel. In this case, the line should be omitted.
...

This patch fixes the scripts, so that the Default-Start line matches the chkconfig: line; this is needed so that the nfs service doesn't start for everyone, for example.

As for it failing with no network interface, I don't think that's a regression;  I think a install using NetworkManager would, in may cases, have failed this way for many releases.

The workaround for this is to set NETWORKWAIT=yes in /etc/sysconfig/network; this makes the NM startup synchronous. Of course, if the cable is disconnected, or you're only using wireless with WPA, it will still fail.

Comment 13 Adam Williamson 2009-11-02 17:34:22 UTC
Steve: your experiment was somewhat askew, yes. The Default-Start/Stop lines are not read dynamically at every boot. They're used only when the actual mapping of initscripts into /etc/rc*.d is being done or re-done. This isn't done automatically at every boot, or anything; it's done once when the system is installed, and then after that only if something intentionally makes it happen using chkconfig.

To get down to brass tacks, to have the system re-read the default settings and set the nfs service to whatever it thinks the default is, do:

chkconfig nfs reset

you should've added that step in between each modification of the nfs initscript. Of course, one upshot of this is that fixing the initscript won't fix this problem on any already-installed system, the nfs service will remain enabled. It will prevent it from happening on new installs, though.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 14 Steve Dickson 2009-11-02 20:24:09 UTC
Ah... I see 'chkconfig nfs reset' is the magic command that
I missed.. Things are working as expected..... 

So to get this bug of the blocker list I need remove the
Default-Start/Stop lines, not just fix them as Bill's 
patch does... correct?

Comment 15 Adam Williamson 2009-11-02 20:45:13 UTC
Yes, the lines should be removed entirely. This would match the chkconfig line which was previously used, as it specifies that the services should not be started by default at any runlevel. Bill's patch would cause them to be started by default at runlevels 3, 4 and 5, which we do not want. If you remove the lines from both the /etc/init.d files I identified and get builds with that fix tagged, that will take this bug off the blocker list. Thanks!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 16 Bill Nottingham 2009-11-02 21:31:04 UTC
(In reply to comment #15)
> Yes, the lines should be removed entirely. This would match the chkconfig line
> which was previously used, as it specifies that the services should not be
> started by default at any runlevel. Bill's patch would cause them to be started
> by default at runlevels 3, 4 and 5, which we do not want. If you remove the
> lines from both the /etc/init.d files I identified and get builds with that fix
> tagged, that will take this bug off the blocker list. Thanks!

Please read the patch in context - it fixes the files correctly. Some of the modified files *were* already default-start items.

Comment 17 Adam Williamson 2009-11-02 21:50:59 UTC
sorry, you're quite right, I was mis-reading the patch. steve, bill's patch does indeed do everything right, sorry again bill!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 18 Steve Dickson 2009-11-02 22:45:49 UTC
Fixed in nfs-utils-1.2.0-18.fc12
(Build: http://koji.fedoraproject.org/koji/taskinfo?taskID=1783956)

Do I need to open a ticket with Fedora Release Engineering 
(aka https://fedorahosted.org/rel-eng/) to get this accepted?

Comment 19 Adam Williamson 2009-11-02 23:13:49 UTC
Yes, indeed, a tag request is required. Remember you can use 'make tag-request' to do it. Thanks!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 20 James Laska 2009-11-03 00:56:19 UTC
Tested http://koji.fedoraproject.org/koji/taskinfo?taskID=1783956 and confirmed the fix.

Starting HAL daemon: [  OK  ]
Starting PC/SC smart card daemon (pcscd): [  OK  ]
Retrigger failed udev events[  OK  ]
Setting network parameters... [  OK  ]
Starting NetworkManager daemon: [  OK  ]
Starting NFS statd: [  OK  ]
Starting RPC idmapd: [  OK  ]
Starting sshd: [  OK  ]
Starting sendmail: [  OK  ]
Starting sm-client: [  OK  ]
[  OK  ] crond: [  OK  ]
[  OK  ] atd: [  OK  ]

Fedora release 11.92 (Rawhide)
Kernel 2.6.31.5-96.fc12.x86_64 on an x86_64 (/dev/ttyS0)

Comment 21 Adam Williamson 2009-11-03 01:02:12 UTC
[root@adam adamw]# chkconfig nfs reset
[root@adam adamw]# chkconfig --list nfs
nfs            	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@adam adamw]# rpm -Uvh Download/nfs-utils-1.2.0-18.fc12.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:nfs-utils              ########################################### [100%]
[root@adam adamw]# chkconfig nfs reset
[root@adam adamw]# chkconfig --list nfs
nfs            	0:off	1:off	2:off	3:off	4:off	5:off	6:off

looks good to me!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 22 Jesse Keating 2009-11-03 05:45:59 UTC
This has been tagged for f12-final, closing.

Comment 23 Steve Dickson 2009-11-12 15:31:07 UTC
*** Bug 511814 has been marked as a duplicate of this bug. ***