Bug 313661

Summary: LOCKD_TCPPORT and LOCKD_UDPPORT not respected after service restarts
Product: [Fedora] Fedora
Reporter: Jeff Layton <jlayton>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: herrold, k.georgiou, staubach, steved
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-01-24 21:29:03 UTC
Bug Depends On: 220798    

Description Jeff Layton 2007-10-01 10:48:42 UTC
+++ This bug was initially created as a clone of Bug #220798 +++

Description of problem:

If you set custom lockd ports in /etc/modprobe.conf and in /etc/sysconfig/nfs,
the lockd ports are respected on bootup, but change to random ports after
stopping the nfs, netfs and portmap services, and starting them again in reverse
order.

Version-Release number of selected component (if applicable):

RHEL 4.4


How reproducible: 
always



Steps to Reproduce:
1. add this line to /etc/modprobe.conf:
    options lockd nlm_udpport=4001 nlm_tcpport=4001

2. make /etc/sysconfig/nfs as follows:
    STATD_PORT=4000
    LOCKD_TCPPORT=4001
    LOCKD_UDPPORT=4001
    MOUNTD_PORT=4002
    RQUOTAD_PORT=762

3. turn on portmap, netfs, nfs, and nfslock services with 'service ... start'

4. enable portmap, netfs, nfs, and nfslock services at bootup with 'chkconfig
... on'

5. note ports that lockd is using with 'rpcinfo -p | grep nlockmgr'. It should
look like:
    # rpcinfo -p | grep nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
which is as it should be.

6. run 'service nfs stop', 'service netfs stop', and 'service portmap stop'

7. run 'service portmap start', 'service netfs start', and 'service nfs start'

8. run 'rpcinfo -p | grep nlockmgr'
 
Actual results:

something like this (the actual port numbers seem random):

    100021    1   udp  32770  nlockmgr
    100021    3   udp  32770  nlockmgr
    100021    4   udp  32770  nlockmgr
    100021    1   tcp  32904  nlockmgr
    100021    3   tcp  32904  nlockmgr
    100021    4   tcp  32904  nlockmgr

Expected results:
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
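The comparison in steps 5 and 8 can be scripted. The following is a minimal sketch, not part of nfs-utils or the initscripts; the function name `check_nlm_ports` is hypothetical. It reads `rpcinfo -p` output on stdin and succeeds only if at least one nlockmgr (program 100021) registration is present and every one of them uses the expected port for its protocol.

```shell
# check_nlm_ports UDPPORT TCPPORT   (hypothetical helper)
# Reads `rpcinfo -p` output on stdin. Exits 0 only if nlockmgr is registered
# and every nlockmgr line uses UDPPORT (udp) or TCPPORT (tcp); each mismatch
# is printed on stdout.
check_nlm_ports() {
    awk -v udp="$1" -v tcp="$2" '
        # rpcinfo -p columns: program vers proto port service
        $5 == "nlockmgr" {
            seen = 1
            want = ($3 == "udp") ? udp : tcp
            if ($4 != want) {
                printf "nlockmgr v%s/%s on port %s, expected %s\n", $2, $3, $4, want
                bad = 1
            }
        }
        # Fail on any mismatch, or if nlockmgr is not registered at all.
        END { exit (bad || !seen) }
    '
}
```

After step 8, `rpcinfo -p | check_nlm_ports 4001 4001` would return a non-zero status (and list the random ports) on an affected system, and zero once the configured ports are respected.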

Additional info:

This is similar to, but I think different from, bug 129138
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129138): this one does
not happen at bootup, or even when restarting just a single service
(as far as I can tell).
Also, this problem seems to still be present in 2.6.15-1.2054_FC5smp,
but fixed in 2.6.18-1.2868.fc6.

Let me know if there is more needed.

-- Additional comment from jrobinson852 on 2006-12-26 23:16 EST --
Step 4a should be 'reboot machine'. 

-- Additional comment from jrobinson852 on 2006-12-27 16:33 EST --
I just tested a little more; this problem _does occur_ on FC6 running
2.6.18-1.2868.fc6xen (but not on 2.6.18-1.2868.fc6, as I correctly stated above),
and, strangely, does _not_ occur on FC5 running 2.6.18-1.2257.fc5.

It does, of course, occur with 2.6.9-42.0.3.EL.

-- Additional comment from jrobinson852 on 2006-12-27 16:40 EST --
Changed component to 'kernel' because this seems to be a kernel issue and not an nfs-utils issue.

-- Additional comment from jrobinson852 on 2006-12-29 12:14 EST --
Actually, this seems to be related to the init scripts too. If
we add a step

6a) sysctl -w fs.nfs.nlm_udpport=4001; sysctl -w fs.nfs.nlm_tcpport=4001

then nlockmgr uses the correct ports.

-- Additional comment from steved on 2007-01-02 10:00 EST --
Just to be clear, what nfs-utils version are you using?

-- Additional comment from jrobinson852 on 2007-01-03 02:26 EST --
Hello, Steve:

The systems are using

nfs-utils-1.0.6-70.EL4

-- Additional comment from krafthef on 2007-01-05 14:58 EST --
pm nack: still under review and this came in from a random source.

-- Additional comment from krafthef on 2007-01-05 14:59 EST --
sorry, rescinding nack; moving to 4.6.

-- Additional comment from pm-rhel on 2007-05-09 04:28 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

-- Additional comment from pm-rhel on 2007-09-07 15:38 EST --
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

-- Additional comment from jlayton on 2007-09-14 16:53 EST --
I think the issue here is that the "netfs" script doesn't pay any attention to
the values in /etc/sysconfig/nfs. When the netfs script runs first and mounts up
some NFS filesystems, it starts up lockd automatically and doesn't set the
sysctl values.


-- Additional comment from jlayton on 2007-09-26 16:13 EST --
Setting to 4.7, but I think this behavior spans all releases (including Fedora).
I don't think this is a kernel release either, but rather an issue with
initscripts. I'd like to look over it a bit more, but the simple fix might be to
just have the netfs script source /etc/sysconfig/nfs and set the lockd sysctls.
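A rough sketch of that suggested fix follows. It is not the actual patch; the helper name `set_lockd_ports` is hypothetical. Writing to /proc/sys/fs/nfs/nlm_tcpport is equivalent to `sysctl -w fs.nfs.nlm_tcpport=...`. The sketch assumes the lockd module is already loaded (the fs.nfs.nlm_* sysctls only exist once it is) and that it runs before netfs mounts anything, since lockd picks up these ports when it starts. The optional second argument exists only so the helper can be exercised against a scratch directory instead of /proc/sys.

```shell
# set_lockd_ports [CONFIG] [SYSCTL_ROOT]   (hypothetical helper)
# Sources CONFIG (default /etc/sysconfig/nfs) and writes LOCKD_TCPPORT and
# LOCKD_UDPPORT, when set, into the fs.nfs.nlm_tcpport / nlm_udpport sysctls.
# A missing or unreadable CONFIG is treated as "nothing to do".
set_lockd_ports() {
    config=${1:-/etc/sysconfig/nfs}
    root=${2:-/proc/sys}
    [ -r "$config" ] || return 0
    # Subshell: the sourced variables do not leak into the calling script.
    (
        unset LOCKD_TCPPORT LOCKD_UDPPORT
        . "$config"
        if [ -n "$LOCKD_TCPPORT" ]; then
            echo "$LOCKD_TCPPORT" > "$root/fs/nfs/nlm_tcpport" || exit 1
        fi
        if [ -n "$LOCKD_UDPPORT" ]; then
            echo "$LOCKD_UDPPORT" > "$root/fs/nfs/nlm_udpport" || exit 1
        fi
    )
}
```

Called with no arguments early in a netfs-style start function, it would apply the same ports that the nfs script already configures, so whichever script starts lockd first uses the configured values.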

-- Additional comment from jlayton on 2007-09-26 16:15 EST --
...err that should read "I don't think this is a kernel issue"


-- Additional comment from jlayton on 2007-09-26 16:26 EST --
Created an attachment (id=207491)
patch 1 -- change nfslock script to use sysctls

Starting with Fedora...

This patch isn't strictly needed, I suppose, but the method used to set lockd
ports is pretty antiquated (I'm not actually sure that that method still
works). Change it to use the sysctls instead, like the "nfs" startup script
does.


-- Additional comment from jlayton on 2007-09-27 07:06 EST --
Created an attachment (id=208241)
patch 2 -- don't reset lockd ports on nfs server shutdown

Since lockd is a shared resource with nfs mounts, resetting lockd's ports when
we shut down NFS is probably a bad idea. It's not clear to me that that has any
benefit. The nfslock script doesn't do it, so let's not do it in the nfs script
either.


-- Additional comment from jlayton on 2007-09-27 07:27 EST --
Changing to nfs-utils bug...

The method used by nfslock to set ports actually still seems to be valid, but I
think we should probably still move it to use sysctls. There's also no reason to
explicitly load the lockd kernel module, so this should take care of that too.

patch 2 above seems to fix the reproducer, but I'd like a second opinion on
whether leaving the nlm port sysctl set would cause any issues.

Note also that the reproducer here is going to leave the box in a possibly
unworkable state as far as locking goes. It restarts the portmapper without
restarting statd, so lockd won't be able to monitor hosts, and attempts to use NFS
locking will either hang until statd reregisters with the portmapper or fail
with ENOLCK (see bug #204309).
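Whether rpc.statd is still registered after such a portmapper restart can be checked from the same `rpcinfo -p` output: statd registers RPC program 100024 (service name `status`). A minimal sketch, with a hypothetical helper name:

```shell
# statd_registered   (hypothetical helper)
# Reads `rpcinfo -p` output on stdin; exits 0 if any line registers
# program 100024 (rpc.statd, service "status"), non-zero otherwise.
statd_registered() {
    awk '$1 == 100024 { found = 1 } END { exit !found }'
}
```

For example, `rpcinfo -p | statd_registered || service nfslock restart`, assuming (as on the releases discussed here) that the nfslock script is what starts rpc.statd.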


-- Additional comment from jlayton on 2007-09-27 07:48 EST --
To reproduce this problem, it's not necessary to restart the portmapper. Simply
doing this is sufficient:

# service nfs stop; service netfs stop; service netfs start; service nfs start


-- Additional comment from jlayton on 2007-09-27 07:50 EST --
Steve/Peter any thoughts on the patches in comment #14 and comment #15?

If they look OK, I think these should go into Fedora before we do anything in
RHEL...


-- Additional comment from jlayton on 2007-09-28 20:21 EST --
Steve, thoughts on these patches? If they look reasonable, we should consider
them for Fedora first. Let me know if I need to clone this BZ for it...


-- Additional comment from steved on 2007-10-01 04:27 EST --
They look fine...

Comment 1 Jeff Layton 2007-10-01 10:51:09 UTC
Going ahead and cloning this for Fedora. I'd like to get this into there soon so
that it has some soak time there before we pull it into RHEL. The changes look
pretty innocuous though...