Description of problem:
It seems lockd isn't reading the config file options in /etc/sysconfig/nfs
correctly. I'm presently using nfs-utils-1.0.9-26.el5.
This has happened once so far: after upgrading, we rebooted the machine, and upon
reboot none of our clients could lock files on their NFS shares.
Steps to Reproduce:
1. /etc/sysconfig/nfs contains:
rpcinfo -p contains:
100021 1 udp 32768 nlockmgr
100021 3 udp 32768 nlockmgr
100021 4 udp 32768 nlockmgr
100021 1 tcp 39476 nlockmgr
100021 3 tcp 39476 nlockmgr
100021 4 tcp 39476 nlockmgr
and sysctl -a | grep nlm contains:
fs.nfs.nlm_tcpport = 48624
fs.nfs.nlm_udpport = 48624
File locking over NFS does not work.
I'd expect rpcinfo to agree with what's in sysctl -a, but it appears that's not the case.
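One way to make this comparison on an affected host is a small shell helper that extracts the nlockmgr (program 100021) ports from `rpcinfo -p` and compares them with the sysctl value. This is a sketch, not part of nfs-utils: the function names `nlm_ports` and `check_nlm_port` are hypothetical, and it assumes the `rpcinfo -p` column layout shown above (program, version, protocol, port).

```shell
#!/bin/sh
# nlm_ports PROTO: print the distinct ports that nlockmgr (RPC program
# 100021) is registered on for PROTO (tcp or udp). Reads `rpcinfo -p`
# output on stdin; a header line, if present, simply does not match.
nlm_ports() {
    awk -v proto="$1" '$1 == "100021" && $3 == proto { print $4 }' | sort -u
}

# check_nlm_port PROTO EXPECTED: compare the registered port(s) with the
# expected value (e.g. the output of `sysctl -n fs.nfs.nlm_udpport`).
check_nlm_port() {
    got=$(nlm_ports "$1")
    if [ "$got" = "$2" ]; then
        echo "$1: ok ($got)"
    else
        echo "$1: mismatch, expected $2, rpcinfo reports ${got:-nothing}"
        return 1
    fi
}
```

Typical use would be `rpcinfo -p | check_nlm_port udp "$(/sbin/sysctl -n fs.nfs.nlm_udpport)"`; with the output above it would report a mismatch (32768 registered, 48624 configured).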
Additional info: This is the share that Fedora uses for its build system. I can
help with any testing that might be useful; just let me know or stop by
#fedora-admin on irc.freenode.net.
I forgot to mention that this is a regression: the previous version didn't
seem to have this issue.
This problem still seems present in 5.2.
This is caused by a change to the startup file (/etc/init.d/nfs). It used to say:
[ -n "$LOCKD_TCPPORT" ] && LOCKDARG="nlm_tcpport=$LOCKD_TCPPORT"
[ -n "$LOCKD_UDPPORT" ] && \
    LOCKDARG="$LOCKDARG nlm_udpport=$LOCKD_UDPPORT"
[ -n "$LOCKDARG" ] && \
    modprobe lockd $LOCKDARG
So the startup script read LOCKD_TCPPORT and LOCKD_UDPPORT from
/etc/sysconfig/nfs and passed them as module parameters when it loaded lockd.
The script now reads:
[ -n "$LOCKD_TCPPORT" ] && \
/sbin/sysctl -w fs.nfs.nlm_tcpport=$LOCKD_TCPPORT >/dev/null 2>&1
[ -n "$LOCKD_UDPPORT" ] && \
/sbin/sysctl -w fs.nfs.nlm_udpport=$LOCKD_UDPPORT >/dev/null 2>&1
Sadly, these /proc entries do not exist until the lockd module is actually loaded,
so the sysctl commands fail (silently, since their output goes to /dev/null) and
the ports are never applied when the module finally does get loaded, presumably
as a dependency of the nfs modules.
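The failure mode can be confirmed directly: the `fs.nfs.nlm_*` sysctls correspond to files under `/proc/sys/fs/nfs/`, and with the redirection to `/dev/null` a write to a missing entry fails without any message. A minimal check, assuming only the standard mapping of dotted sysctl names to `/proc/sys` paths; the helper name `nlm_sysctls_present` and its optional root argument are hypothetical, the latter added so the check can be exercised outside a live system.

```shell
#!/bin/sh
# nlm_sysctls_present [ROOT]: succeed only if both lockd port sysctls exist.
# fs.nfs.nlm_tcpport maps to $ROOT/fs/nfs/nlm_tcpport, and likewise for UDP.
# ROOT defaults to /proc/sys. Each missing entry is reported on stderr.
nlm_sysctls_present() {
    root=${1:-/proc/sys}
    missing=0
    for name in nlm_tcpport nlm_udpport; do
        if [ ! -e "$root/fs/nfs/$name" ]; then
            echo "missing: $root/fs/nfs/$name (is lockd loaded?)" >&2
            missing=1
        fi
    done
    return $missing
}
```

Run at the point where the init script issues its `sysctl -w` commands, a failure here means those commands will have no effect.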
The solution is presumably either to revert to the original approach or to modprobe
the lockd module before the sysctl commands.
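The second option (loading lockd before the sysctl commands) might look roughly like the sketch below. It is not the shipped init script: `set_lockd_ports` is a hypothetical name, and the `MODPROBE`/`SYSCTL` variables are an override hook added only so the ordering can be checked without root. The ports presumably only take effect if they are written before lockd first binds its sockets (for example when nfsd starts or an NFS filesystem is mounted), so where this runs in the boot sequence matters as much as the ordering inside it.

```shell
#!/bin/sh
# Load lockd first so that /proc/sys/fs/nfs/nlm_* exist, then set the ports.
# Defaults are the real binaries; override only for testing.
MODPROBE=${MODPROBE:-/sbin/modprobe}
SYSCTL=${SYSCTL:-/sbin/sysctl}

# set_lockd_ports TCPPORT UDPPORT: either argument may be empty.
set_lockd_ports() {
    [ -n "$1$2" ] || return 0               # nothing configured
    "$MODPROBE" lockd || return 1           # makes the nlm_* sysctls appear
    if [ -n "$1" ]; then
        "$SYSCTL" -w fs.nfs.nlm_tcpport="$1" || return 1
    fi
    if [ -n "$2" ]; then
        "$SYSCTL" -w fs.nfs.nlm_udpport="$2" || return 1
    fi
}

# In the init script this would be called as:
#   set_lockd_ports "$LOCKD_TCPPORT" "$LOCKD_UDPPORT"
```

Unlike the current script, this leaves sysctl errors visible, so a missing entry shows up in the boot log instead of being discarded.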
This is clearly a regression, so I would have thought it urgent. It
breaks every 5.2 system running NFS behind an iptables firewall.
Interestingly, Fedora 9 has the original form of this startup script.
The workaround for the moment is to put the following in /etc/modprobe.conf:
options lockd nlm_udpport=4002 nlm_tcpport=4002
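To keep this workaround from drifting out of sync with /etc/sysconfig/nfs, the `options` line can be generated from the same LOCKD_TCPPORT and LOCKD_UDPPORT variables, much as the old init script built LOCKDARG. A sketch only: `lockd_options_line` is a hypothetical helper, and installing its output into /etc/modprobe.conf is left to the administrator.

```shell
#!/bin/sh
# lockd_options_line TCPPORT UDPPORT: print a modprobe.conf "options" line
# for lockd, or nothing at all if neither port is set.
lockd_options_line() {
    args=""
    [ -n "$1" ] && args="nlm_tcpport=$1"
    [ -n "$2" ] && args="${args:+$args }nlm_udpport=$2"
    [ -n "$args" ] && echo "options lockd $args"
    return 0
}

# Example, reusing the values the init script reads:
#   . /etc/sysconfig/nfs
#   lockd_options_line "$LOCKD_TCPPORT" "$LOCKD_UDPPORT"
```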
I'm still replying to my own ticket here. Maybe there is more going on than
meets the eye: even with the "options" line in /etc/modprobe.conf described above,
we are still seeing problems on one system whose locking hasn't properly
moved to the new ports.
rpcinfo -t srv17ux01 100021
rpcinfo: RPC: Timed out
program 100021 version 0 is not available
It may have worked initially but it seems to have stopped.
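Since lockd seems to work at first and then stop, a periodic probe can help pin down when it stops answering. A sketch, assuming `rpcinfo -t HOST PROGRAM VERSION` exits non-zero when the call fails or times out; it probes NLM version 4, which is registered in the rpcinfo output earlier in this report. `nlm_alive` is a hypothetical name, and `RPCINFO` is an override hook added only for testing.

```shell
#!/bin/sh
# nlm_alive HOST: probe nlockmgr (RPC program 100021, version 4) over TCP
# and log a timestamped line to stderr if it does not answer.
RPCINFO=${RPCINFO:-rpcinfo}

nlm_alive() {
    if "$RPCINFO" -t "$1" 100021 4 >/dev/null 2>&1; then
        return 0
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') nlockmgr on $1 not responding" >&2
    return 1
}

# Example: check every five minutes.
#   while :; do nlm_alive srv17ux01; sleep 300; done
```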
It looks like there are two bugs here: the startup script bug described above, and
lockd ceasing to respond after a while. Someone has logged the latter as bug #453094.
lockd dies whether or not nlm_udpport and nlm_tcpport are set.
Created attachment 311840
Patch for /etc/init.d/nfs - fixes lockd port assignment
Update your nfs-utils to 1.0.9-33.el5, then apply this patch to /etc/init.d/nfs
to fix the port assignment bug.
This should make your iptables firewall happy again. :o)
Still an issue in 5.2 with nfs-utils-1.0.9-35z.el5_2
Seeing the same issue here. Any ETA for getting this solved upstream?
Just noticed a variation of this bug on a newly installed F10.
I have port 4001 assigned to lockd in /etc/sysconfig/nfs. TCP follows this
directive, but UDP does not.
Relevant 'rpcinfo -p localhost' output:
100021 1 udp 56418 nlockmgr
100021 3 udp 56418 nlockmgr
100021 4 udp 56418 nlockmgr
100021 1 tcp 4001 nlockmgr
100021 3 tcp 4001 nlockmgr
100021 4 tcp 4001 nlockmgr
Just updated to nfs-utils-1.1.4-2.fc10.i386, with the same result. I also see version 1.1.4-4 in koji, but its changelog makes no mention of this.
This is on i386; I did not test on 64-bit.
I also tried the patch from comment #7; no difference.
This is still a problem for me with nfs-utils-1.0.9-35z.el5_2
The workaround in comment #4 seems to work for me, though.
Fixed in nfs-utils-1.0.9-42.el5
*** Bug 474449 has been marked as a duplicate of this bug. ***
This bug hasn't reached ON_QA, so is the release of nfs-utils-1.0.9-42.el5 in the pipeline?
Is the patch attached to this bug the only fix that was applied to nfs-utils-1.0.9-42.el5 (which has not yet been released)?
The patch does not seem to be working for me under EL5. Although fs.nfs.nlm_tcpport and fs.nfs.nlm_udpport are being set, it's apparently happening after the NLM service has already bound to its ports.
Adding "modprobe lockd" to the corresponding place in /etc/init.d/nfslock (rather than /etc/init.d/nfs) seems to work better; if this hasn't been done in nfs-utils-1.0.9-42.el5, it probably should be added.
Note that on one of my servers I saw the symptom described in comment #10. This server NFS-mounts other filesystems using UDP via /etc/fstab, so I suspect that /etc/init.d/netfs (which runs after /etc/init.d/nfslock but before /etc/init.d/nfs) caused the NLM service to start on UDP; since no filesystems were mounted over TCP, the NLM TCP port wasn't allocated until sometime after /etc/init.d/nfs had run and set fs.nfs.nlm_tcpport.
This observation is significant because the patch from comment #7 may appear to be effective if tested only on systems with no NFS client mounts.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.