Bug 220798 - LOCKD_TCPPORT and LOCKD_UDPPORT not respected after service restarts
Summary: LOCKD_TCPPORT and LOCKD_UDPPORT not respected after service restarts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jeff Layton
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 313661
Reported: 2006-12-27 04:14 UTC by J Robinson
Modified: 2008-07-24 20:00 UTC (History)
CC List: 3 users

Fixed In Version: RHBA-2008-0742
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 20:00:08 UTC
Target Upstream Version:
Embargoed:


Attachments
patch 1 -- change nfslock script to use sysctls (877 bytes, patch)
2007-09-26 20:26 UTC, Jeff Layton
patch 2 -- don't reset lockd ports on nfs server shutdown (703 bytes, patch)
2007-09-27 11:06 UTC, Jeff Layton


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2008:0742 | 0 | normal | SHIPPED_LIVE | nfs-utils bug fix update | 2008-07-23 16:47:23 UTC

Description J Robinson 2006-12-27 04:14:51 UTC
Description of problem:

If you set custom lockd ports in /etc/modprobe.conf and in /etc/sysconfig/nfs,
the lockd ports are respected on bootup, but change to random ports after
stopping the nfs, netfs and portmap services, and starting them again in reverse
order.

Version-Release number of selected component (if applicable):

RHEL 4.4


How reproducible: 
always



Steps to Reproduce:
1. add this line to /etc/modprobe.conf:
options lockd nlm_udpport=4001 nlm_tcpport=4001

2. make /etc/sysconfig/nfs as follows:
STATD_PORT=4000
LOCKD_TCPPORT=4001
LOCKD_UDPPORT=4001
MOUNTD_PORT=4002
RQUOTAD_PORT=762

3. turn on portmap, netfs, nfs, and nfslock services with 'service ... start'

4. enable portmap, netfs, nfs, and nfslock services at bootup with 'chkconfig
... on'

5. note ports that lockd is using with 'rpcinfo -p | grep nlockmgr'. It should
look like:
    # rpcinfo -p | grep nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
which is as it should be.

6. run 'service nfs stop', 'service netfs stop', and 'service portmap stop'

7. run 'service portmap start', 'service netfs start', and 'service nfs start'

8. run 'rpcinfo -p | grep nlockmgr'
 
Actual results:

something like (actual port numbers seem randomish)

    100021    1   udp  32770  nlockmgr
    100021    3   udp  32770  nlockmgr
    100021    4   udp  32770  nlockmgr
    100021    1   tcp  32904  nlockmgr
    100021    3   tcp  32904  nlockmgr
    100021    4   tcp  32904  nlockmgr

Expected results:
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr

Additional info:

This is similar to, but I think different from, the bug at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129138 , because it does
not happen on bootup or even when restarting just a single service
(as far as I can tell).
Also, this problem seems to be still present in 2.6.15-1.2054_FC5smp ,
but fixed in 2.6.18-1.2868.fc6

Let me know if there is more needed.
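
The comparison in steps 5 and 8 can also be scripted. The following is a minimal sketch, not part of the original report: the function name check_lockd_ports is hypothetical, and it assumes the five-column 'rpcinfo -p' layout shown above (program, version, protocol, port, service).

```shell
#!/bin/sh
# check_lockd_ports EXPECTED_PORT   (hypothetical helper, for illustration only)
# Reads `rpcinfo -p` output on stdin. Succeeds only if at least one nlockmgr
# registration is present and every nlockmgr registration uses EXPECTED_PORT.
check_lockd_ports() {
    awk -v want="$1" '
        $5 == "nlockmgr" { seen = 1; if ($4 != want) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Usage sketch:
#   rpcinfo -p | check_lockd_ports 4001 && echo "lockd ports OK"
```

Run before step 6 and again after step 7, the check would be expected to pass the first time and fail the second time on an affected system.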

Comment 1 J Robinson 2006-12-27 04:16:54 UTC
Step 4a should be 'reboot machine'. 

Comment 2 J Robinson 2006-12-27 21:33:05 UTC
I just tested a little more; this problem _does occur_ on FC6 running
2.6.18-1.2868.fc6xen (but not in 2.6.18-1.2868.fc6, as I correctly state above),
and strangely, does _not_ occur on FC5 running  2.6.18-1.2257.fc5. 

It of course does occur with 2.6.9-42.0.3.EL.

Comment 3 J Robinson 2006-12-27 21:40:41 UTC
changed to 'kernel' because it seems to be a kernel issue and not an nfs-utils issue

Comment 4 J Robinson 2006-12-29 17:14:11 UTC
Actually, this seems to be related to the init scripts too. If
we add a step

6a)
sysctl -w fs.nfs.nlm_udpport=4001; sysctl -w fs.nfs.nlm_tcpport=4001

nlockmgr uses the correct ports.

Comment 5 Steve Dickson 2007-01-02 15:00:10 UTC
Just to be clear, what nfs-utils version are you using?

Comment 6 J Robinson 2007-01-03 07:26:30 UTC
Hello, Steve:

The systems are using

nfs-utils-1.0.6-70.EL4

Comment 9 RHEL Program Management 2007-05-09 08:28:54 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 RHEL Program Management 2007-09-07 19:38:15 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 11 Jeff Layton 2007-09-14 20:53:48 UTC
I think the issue here is that the "netfs" script doesn't pay any attention to
the values in /etc/sysconfig/nfs. When the netfs script runs first and mounts up
some NFS filesystems, it starts up lockd automatically and doesn't set the
sysctl values.


Comment 12 Jeff Layton 2007-09-26 20:13:59 UTC
Setting to 4.7, but I think this behavior spans all releases (including Fedora).
I don't think this is a kernel release either, but rather an issue with
initscripts. I'd like to look over it a bit more, but the simple fix might be to
just have the netfs script source /etc/sysconfig/nfs and set the lockd sysctls.

Comment 13 Jeff Layton 2007-09-26 20:15:12 UTC
...err that should read "I don't think this is a kernel issue"
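
For illustration only, the "simple fix" floated in comment 12 might look roughly like the sketch below. This is not the attached patch. The function name lockd_sysctl_cmds is hypothetical, and the sketch only prints the sysctl commands (using the sysctl names from comment 4) rather than running them, so it can be inspected without root.

```shell
#!/bin/sh
# lockd_sysctl_cmds [CONFIG]   (hypothetical helper, for illustration only)
# Sources CONFIG (default: /etc/sysconfig/nfs) in a subshell and prints the
# sysctl commands that would pin lockd's ports. Prints nothing for a port
# variable that the config file leaves unset or empty.
lockd_sysctl_cmds() {
    (
        conf="${1:-/etc/sysconfig/nfs}"
        # Ignore any values inherited from the caller's environment.
        unset LOCKD_TCPPORT LOCKD_UDPPORT
        [ -r "$conf" ] && . "$conf"
        [ -n "$LOCKD_TCPPORT" ] && echo "sysctl -w fs.nfs.nlm_tcpport=$LOCKD_TCPPORT"
        [ -n "$LOCKD_UDPPORT" ] && echo "sysctl -w fs.nfs.nlm_udpport=$LOCKD_UDPPORT"
        exit 0
    )
}

# Usage sketch (as root, before any NFS filesystem is mounted):
#   lockd_sysctl_cmds | sh
```

The point of doing this early is that lockd picks up the sysctl values when it starts, so they need to be in place before the first NFS mount brings lockd up.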


Comment 14 Jeff Layton 2007-09-26 20:26:41 UTC
Created attachment 207491
patch 1 -- change nfslock script to use sysctls

Starting with fedora...

This patch isn't strictly needed, I suppose, but the method used to set lockd
ports is pretty antiquated (I'm not actually sure that that method still
works). Change it to use the sysctls instead, like the "nfs" startup script
does.

Comment 15 Jeff Layton 2007-09-27 11:06:47 UTC
Created attachment 208241
patch 2 -- don't reset lockd ports on nfs server shutdown

Since lockd is a shared resource with nfs mounts, resetting lockd's ports when
we shut down NFS is probably a bad idea. It's not clear to me that that has any
benefit. The nfslock script doesn't do it, so let's not do it in the nfs script
either.

Comment 16 Jeff Layton 2007-09-27 11:27:14 UTC
Changing to nfs-utils bug...

The method used by nfslock to set ports actually still seems to be valid, but I
think we should probably still move it to use sysctls. There's also no reason to
explicitly plug in the lockd kernel mod, so this should take care of that too.

patch 2 above seems to fix the reproducer, but I'd like a second opinion on
whether leaving the nlm port sysctl set would cause any issues.

Note also that the reproducer here is going to leave the box in a possibly
unworkable state as far as locking goes. It restarts the portmapper without
restarting statd. lockd won't be able to monitor hosts and attempts to use nfs
locking will result in either a hang until statd reregisters with the
portmapper, or will return ENOLCK (see bug #204309).


Comment 17 Jeff Layton 2007-09-27 11:48:48 UTC
To reproduce this problem, it's not necessary to restart the portmapper. Simply
doing this is sufficient:

# service nfs stop; service netfs stop; service netfs start; service nfs start
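
To see whether the ports moved across that sequence, a before/after snapshot can be compared. The following is a rough sketch under the same 'rpcinfo -p' column assumptions as the output quoted in the description; the helper name lockd_ports is hypothetical.

```shell
#!/bin/sh
# lockd_ports   (hypothetical helper, for illustration only)
# Reads `rpcinfo -p` output on stdin and prints the unique "proto port"
# pairs registered for nlockmgr, sorted.
lockd_ports() {
    awk '$5 == "nlockmgr" { print $3, $4 }' | sort -u
}

# Usage sketch (needs root; run on a non-critical box):
#   before=$(rpcinfo -p | lockd_ports)
#   service nfs stop; service netfs stop; service netfs start; service nfs start
#   after=$(rpcinfo -p | lockd_ports)
#   [ "$before" = "$after" ] && echo "lockd ports unchanged" || echo "lockd ports changed"
```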


Comment 19 Jeff Layton 2007-09-29 00:21:49 UTC
Steve, thoughts on these patches? If they look reasonable we should consider
them for fedora first. Let me know if I need to clone this BZ for it...


Comment 20 Steve Dickson 2007-10-01 08:27:30 UTC
They look fine... 

Comment 21 Jeff Layton 2007-10-03 18:30:07 UTC
I've posted some test nfs-utils packages with these patches on my people page:

http://people.redhat.com/jlayton

...could you test them somewhere non-critical and let me know if they resolve
the issue for you?


Comment 22 J Robinson 2007-10-11 21:52:15 UTC
Hi Jeff,

I've bounced this back to you as a NEEDINFO because I don't have access to a test
machine right now, but I didn't want to hold up the bug because of that. 

Is there someone else who can test the fix? If not let me know and I'll try to
find an appropriate system.

Thanks

Comment 23 J Robinson 2007-10-21 17:29:19 UTC
Hello, All:

I applied the patches to the relevant scripts and have verified that the fix works: 

# rpcinfo -p | grep nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
                                          
# service nfs stop; service netfs stop; service netfs start; service nfs start
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Unmounting NFS filesystems:                                [  OK  ]
Mounting NFS filesystems:                                  [  OK  ]
Mounting other filesystems:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

# rpcinfo -p | grep nlockmgr
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr

I changed the status to  "verified" to reflect this (I *think* this is what I'm
supposed to do!)

Thanks!

Comment 24 Jeff Layton 2007-10-21 20:19:42 UTC
Thanks for testing it. Actually VERIFIED is for our QA group... (confusing eh?)


Comment 25 Jeff Layton 2007-11-05 13:26:39 UTC
Patches committed in 1.0.6-85.EL4

Comment 30 errata-xmlrpc 2008-07-24 20:00:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0742.html

