Description of problem:
If you set custom lockd ports in /etc/modprobe.conf and in /etc/sysconfig/nfs, the lockd ports are respected on bootup, but change to random ports after stopping the nfs, netfs and portmap services and starting them again in reverse order.

Version-Release number of selected component (if applicable):
RHEL 4.4

How reproducible:
always

Steps to Reproduce:
1. add this line to /etc/modprobe.conf:
     options lockd nlm_udpport=4001 nlm_tcpport=4001
2. make /etc/sysconfig/nfs as follows:
     STATD_PORT=4000
     LOCKD_TCPPORT=4001
     LOCKD_UDPPORT=4001
     MOUNTD_PORT=4002
     RQUOTAD_PORT=762
3. turn on portmap, netfs, nfs, and nfslock services with 'service ... start'
4. enable portmap, netfs, nfs, and nfslock services at bootup with 'chkconfig ... on'
5. note ports that lockd is using with 'rpcinfo -p | grep nlockmgr'. It should look like:
     # rpcinfo -p | grep nlockmgr
     100021 1 udp 4001 nlockmgr
     100021 3 udp 4001 nlockmgr
     100021 4 udp 4001 nlockmgr
     100021 1 tcp 4001 nlockmgr
     100021 3 tcp 4001 nlockmgr
     100021 4 tcp 4001 nlockmgr
   which is as it should be.
6. run 'service nfs stop', 'service netfs stop', and 'service portmap stop'
7. run 'service portmap start', 'service netfs start', and 'service nfs start'
8. run 'rpcinfo -p | grep nlockmgr'

Actual results:
something like (actual port numbers seem randomish)
     100021 1 udp 32770 nlockmgr
     100021 3 udp 32770 nlockmgr
     100021 4 udp 32770 nlockmgr
     100021 1 tcp 32904 nlockmgr
     100021 3 tcp 32904 nlockmgr
     100021 4 tcp 32904 nlockmgr

Expected results:
     100021 1 udp 4001 nlockmgr
     100021 3 udp 4001 nlockmgr
     100021 4 udp 4001 nlockmgr
     100021 1 tcp 4001 nlockmgr
     100021 3 tcp 4001 nlockmgr
     100021 4 tcp 4001 nlockmgr

Additional info:
This is similar to, but I think different from, the bug at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129138, because it does not happen on bootup or even when restarting just a single service (as far as I can tell).
Also, this problem still seems to be present in 2.6.15-1.2054_FC5smp, but fixed in 2.6.18-1.2868.fc6. Let me know if more information is needed.
Step 4a should be 'reboot machine'.
I just tested a little more; this problem _does occur_ on FC6 running 2.6.18-1.2868.fc6xen (but not in 2.6.18-1.2868.fc6, as I correctly state above), and strangely, does _not_ occur on FC5 running 2.6.18-1.2257.fc5. It of course does occur with 2.6.9-42.0.3.EL.
changed to 'kernel' because it seems to be a kernel issue and not an nfs-utils issue
Actually, this seems to be related to the init scripts too. If we add a step 6a:
     sysctl -w fs.nfs.nlm_udpport=4001; sysctl -w fs.nfs.nlm_tcpport=4001
then nlockmgr uses the correct ports.
Just to be clear, what nfs-utils version are you using?
Hello, Steve: The systems are using nfs-utils-1.0.6-70.EL4
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
I think the issue here is that the "netfs" script doesn't pay any attention to the values in /etc/sysconfig/nfs. When the netfs script runs first and mounts up some NFS filesystems, it starts up lockd automatically and doesn't set the sysctl values.
Setting to 4.7, but I think this behavior spans all releases (including Fedora). I don't think this is a kernel release either, but rather an issue with initscripts. I'd like to look over it a bit more, but the simple fix might be to just have the netfs script source /etc/sysconfig/nfs and set the lockd sysctls.
...err that should read "I don't think this is a kernel issue"
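The "simple fix" suggested above (have the script read /etc/sysconfig/nfs and set the lockd sysctls before lockd starts) could look roughly like the sketch below. This is only an illustration, not the actual initscripts or nfs-utils change. The function name `set_lockd_ports` and its optional second argument are hypothetical; the sysctl names and the LOCKD_TCPPORT / LOCKD_UDPPORT variables are the ones already mentioned in this report.

```shell
# Sketch only: NOT the real netfs/nfs init script code.
# set_lockd_ports [CONFIG] [RUNNER]
#   CONFIG  sysconfig-style file that may define LOCKD_TCPPORT / LOCKD_UDPPORT
#           (defaults to /etc/sysconfig/nfs)
#   RUNNER  command used to apply each setting (defaults to "sysctl -w");
#           hypothetical parameter, pass "echo" for a dry run
set_lockd_ports() {
    config="${1:-/etc/sysconfig/nfs}"
    runner="${2:-sysctl -w}"

    # Start from empty values so stale environment settings are not applied.
    LOCKD_TCPPORT=""
    LOCKD_UDPPORT=""

    # Read the admin's settings if the file exists.
    [ -f "$config" ] && . "$config"

    # Only touch a sysctl when the corresponding port is configured.
    if [ -n "$LOCKD_TCPPORT" ]; then
        $runner "fs.nfs.nlm_tcpport=$LOCKD_TCPPORT"
    fi
    if [ -n "$LOCKD_UDPPORT" ]; then
        $runner "fs.nfs.nlm_udpport=$LOCKD_UDPPORT"
    fi
}
```

Running `set_lockd_ports /etc/sysconfig/nfs echo` prints the settings that would be applied without changing anything; calling it with no arguments (as root) applies them with `sysctl -w`.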
Created attachment 207491
patch 1 -- change nfslock script to use sysctls

Starting with Fedora... This patch isn't strictly needed, I suppose, but the method used to set lockd ports is pretty antiquated (I'm not actually sure that that method still works). Change it to use the sysctls instead, like the "nfs" startup script does.
Created attachment 208241
patch 2 -- don't reset lockd ports on nfs server shutdown

Since lockd is a shared resource with nfs mounts, resetting lockd's ports when we shut down NFS is probably a bad idea. It's not clear to me that that has any benefit. The nfslock script doesn't do it, so let's not do it in the nfs script either.
Changing to nfs-utils bug...

The method used by nfslock to set ports actually still seems to be valid, but I think we should probably still move it to use sysctls. There's also no reason to explicitly plug in the lockd kernel mod, so this should take care of that too.

Patch 2 above seems to fix the reproducer, but I'd like a second opinion on whether leaving the nlm port sysctl set would cause any issues.

Note also that the reproducer here is going to leave the box in a possibly unworkable state as far as locking goes. It restarts the portmapper without restarting statd. lockd won't be able to monitor hosts, and attempts to use nfs locking will either hang until statd reregisters with the portmapper or return ENOLCK (see bug #204309).
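For anyone following the original reproducer, a quick way to see whether statd is still registered after the portmapper restart is to look for its RPC program (100024, listed as "status") in the `rpcinfo -p` output. The helper below is a small sketch; the name `statd_registered` is made up for illustration.

```shell
# Hypothetical helper: reads `rpcinfo -p` style output on stdin and
# succeeds (exit 0) only if RPC program 100024 (statd, "status") is listed.
statd_registered() {
    awk '$1 == 100024 { found = 1 } END { exit !found }'
}
```

For example, `rpcinfo -p | statd_registered || echo "statd is not registered with the portmapper"` would flag the state described above.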
To reproduce this problem, it's not necessary to restart the portmapper. Simply doing this is sufficient:
     # service nfs stop; service netfs stop; service netfs start; service nfs start
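To make the before/after comparison around that command less error-prone, the nlockmgr registrations can be reduced to their distinct protocol/port pairs and compared as strings. This is just a convenience sketch; `nlockmgr_ports` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `rpcinfo -p` style output on stdin and prints
# the distinct "proto port" pairs registered for nlockmgr, sorted.
nlockmgr_ports() {
    awk '$5 == "nlockmgr" { print $3, $4 }' | sort -u
}

# Intended use around the reproducer (needs root and the real services,
# so it is left here as comments):
#   before=$(rpcinfo -p | nlockmgr_ports)
#   service nfs stop; service netfs stop; service netfs start; service nfs start
#   after=$(rpcinfo -p | nlockmgr_ports)
#   [ "$before" = "$after" ] || echo "lockd ports changed"
```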
Steve, thoughts on these patches? If they look reasonable we should consider them for fedora first. Let me know if I need to clone this BZ for it...
They look fine...
I've posted some test nfs-utils packages with these patches on my people page: http://people.redhat.com/jlayton ...could you test them somewhere non-critical and let me know if they resolve the issue for you?
Hi Jeff, I've bounced this back to you as a NEEDINFO because I don't have access to a test machine right now, but I didn't want to hold up the bug because of that. Is there someone else who can test the fix? If not, let me know and I'll try to find an appropriate system. Thanks
Hello, All:

I applied the patches to the relevant scripts and have verified that the fix works:

     # rpcinfo -p | grep nlockmgr
     100021 1 udp 4001 nlockmgr
     100021 3 udp 4001 nlockmgr
     100021 4 udp 4001 nlockmgr
     100021 1 tcp 4001 nlockmgr
     100021 3 tcp 4001 nlockmgr
     100021 4 tcp 4001 nlockmgr
     # service nfs stop; service netfs stop; service netfs start; service nfs start
     Shutting down NFS mountd: [ OK ]
     Shutting down NFS daemon: [ OK ]
     Shutting down NFS quotas: [ OK ]
     Shutting down NFS services: [ OK ]
     Unmounting NFS filesystems: [ OK ]
     Mounting NFS filesystems: [ OK ]
     Mounting other filesystems: [ OK ]
     Starting NFS services: [ OK ]
     Starting NFS quotas: [ OK ]
     Starting NFS daemon: [ OK ]
     Starting NFS mountd: [ OK ]
     # rpcinfo -p | grep nlockmgr
     100021 1 udp 4001 nlockmgr
     100021 3 udp 4001 nlockmgr
     100021 4 udp 4001 nlockmgr
     100021 1 tcp 4001 nlockmgr
     100021 3 tcp 4001 nlockmgr
     100021 4 tcp 4001 nlockmgr

I changed the status to "verified" to reflect this (I *think* this is what I'm supposed to do!)

Thanks!
Thanks for testing it. Actually VERIFIED is for our QA group... (confusing eh?)
Patches committed in 1.0.6-85.EL4
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0742.html