Bug 38395

Summary:	rpc.statd dies shortly after boot for DHCP+NIS clients
Product:	[Retired] Red Hat Linux	Reporter:	Rex Dieter <rdieter>
Component:	nfs-utils	Assignee:	Pete Zaitcev <zaitcev>
Status:	CLOSED NOTABUG	QA Contact:	David Lawrence <dkl>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	6.2
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2001-05-01 20:37:19 UTC	Type:	---

Description Rex Dieter 2001-04-30 14:32:00 UTC

Red Hat 6.2, i386, nfs-utils-0.3.1-0.6.x.1 (though this same problem occurred
with nfs-utils-0.2.1 too)

The rpc.statd daemon (as part of the nfslock startup init script) dies 
shortly after boot on all of our machines that are DHCP clients.  I 
believe this is because nfslock starts before the DHCP client grabs a new 
hostname/ip-address, and the change of network configuration 
confuses/kills it.

At this point, manually restarting the nfslock service fixes the problem 
(we get a non-dying rpc.statd), but this, of course, is not an ideal 
solution.

Comment 1 Bob Matthews 2001-04-30 14:43:32 UTC

Which kernel version are you running?

Comment 2 Rex Dieter 2001-04-30 15:20:20 UTC

Ah, sorry I didn't mention kernel versions...
This happened all the way back to kernel-2.2.16-3 and up to kernel-2.2.19-6.2.1

Comment 3 Rex Dieter 2001-05-01 14:27:44 UTC

Just FYI, I watched the boot process carefully this morning, and it turns out 
that the nfs (and nfslock) services do indeed start AFTER the eth0 interface 
is brought up... so my theory about the DHCP changing addresses messing up 
rpc.statd is most likely not correct.

Comment 4 Bob Matthews 2001-05-01 14:46:10 UTC

As of 2.2.19+, lockd is started automatically by the kernel as needed, so you
shouldn't be seeing the traditional "Starting NFS lockd" messages at boot time. 

Can you verify the versions of the kernel and nfs-utils with rpm -q?

Comment 5 Rex Dieter 2001-05-01 15:57:46 UTC

With the latest kernel, no, I did not see starting lockd, but I DO see this:
Starting NFS file locking services:
Starting NFS statd:    OK

Here's the specific versions of rpms you requested:
[root@... RPMS]# rpm -q nfs-utils
nfs-utils-0.3.1-0.6.x.1
[root@...RPMS]# rpm -q kernel
kernel-2.2.17-14
kernel-2.2.19-6.2.1

I might try adjusting the priority of the service so that it starts a bit later
in the boot process.  These services do get started immediately after eth0 is
brought up, and maybe it's not quite completely initialized yet or something...

Comment 6 Bob Matthews 2001-05-01 16:22:30 UTC

I wonder if you might have gotten a bad RPM for nfs-utils.  You shouldn't be
seeing the message "Starting NFS file locking services" at all with
nfs-utils-0.3.1-0.6.x.1.

Can you completely remove the nfs-utils rpm and then reinstall it?

Also, I don't think this is a problem with nfs services starting too soon after
the networking is brought up.  After you see "Starting network services...[  OK ]",
networking should be completely up and ready to run.

Comment 7 Rex Dieter 2001-05-01 17:19:15 UTC

This is the relevant portion of the init script /etc/rc.d/init.d/nfslock, and
from what I can tell, rpc.statd DOES get started regardless of kernel version. 
rpc.lockd is the part that doesn't get displayed with recent kernels (as it
should be):

start() {
        # Start daemons.
        echo "Starting NFS file locking services: "
        if [ "$KERNVER" -lt 24 -a "$KERNREL" -lt 18 ]; then
          echo -n "Starting NFS lockd: "
          daemon rpc.lockd
          echo
        fi
        echo -n "Starting NFS statd: "
        daemon rpc.statd
        RETVAL=$?
        echo
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/nfslock
        return $RETVAL
}

I verified my current nfs-utils package:
rpm --checksig nfs-utils-0.3.1-0.6.x.1.i386.rpm
nfs-utils-0.3.1-0.6.x.1.i386.rpm: md5 gpg OK

I redownloaded nfs-utils from ftp.redhat.com, and diffed it against my rpm with
no differences found.

Comment 8 Rex Dieter 2001-05-01 17:21:48 UTC

Oh, I forgot to mention, yes, I've tried removing and reinstalling nfs-utils
without a change in behavior.  As a matter of fact, I've tried it on a bunch of
machines we have here (I'd say 5 or 6 of them), and they all still exhibit this
same bad behavior of rpc.statd dying.

Comment 9 Rex Dieter 2001-05-01 18:27:44 UTC

I've been able to narrow the circumstances of rpc.statd's death.  All these
machines in question are also NIS clients.  If the nfslock service is started
before ypbind is up and running (this is what a normal boot does), this is the
scenario where rpc.statd dies.  If rpc.statd is started after ypbind, then all
is well.  I repeated this several times after booting up:
0.  service nfslock stop: OK (or failed if dead already).
1.  service ypbind stop: OK
2.  service nfslock start: OK
3.  service nfslock status: rpc.statd not running
4.  service ypbind start: OK
5.  service nfslock start: OK
6.  service nfslock status: rpc.statd (PID:xxx) running.
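
The sequence above can be scripted so it is easy to repeat on several machines.
What follows is a minimal sketch, not a tested tool: the function name
check_statd, the SERVICE and SETTLE_SECS variables, and the five-second default
delay are all assumptions introduced here, and it assumes that
"service nfslock status" exits non-zero when rpc.statd is not running, which
should be verified on the release in question.

```shell
#!/bin/sh
# Sketch of the manual steps above. SERVICE and SETTLE_SECS are hypothetical
# knobs added for illustration; override them for a dry run.
SERVICE=${SERVICE:-service}
SETTLE_SECS=${SETTLE_SECS:-5}

# check_statd stop|start
# Restart nfslock with ypbind in the requested state, wait briefly, then
# report whether rpc.statd is still running.
check_statd() {
    ypbind_action=$1
    $SERVICE nfslock stop >/dev/null 2>&1 || :   # may fail if already dead
    $SERVICE ypbind "$ypbind_action" >/dev/null 2>&1
    $SERVICE nfslock start >/dev/null 2>&1
    sleep "$SETTLE_SECS"                         # statd dies shortly after starting
    # Assumption: "status" exits non-zero when rpc.statd is not running.
    if $SERVICE nfslock status >/dev/null 2>&1; then
        echo "ypbind $ypbind_action: rpc.statd running"
    else
        echo "ypbind $ypbind_action: rpc.statd not running"
    fi
}
```

Run as root, "check_statd stop" followed by "check_statd start" corresponds to
steps 0-3 and 4-6 respectively; on an affected machine the first call should
report rpc.statd as not running and the second as running.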

Now I'm even more confused.  How/why does rpc.statd depend upon NIS? and why
only for DHCP clients (as I said before static hosts are fine)?

Comment 10 Rex Dieter 2001-05-01 20:37:14 UTC

(most likely) FINAL UPDATE:  I think I can conclude that this problem was due in
large part to actions of my own.  I had lowered the MINUID for our NIS server
(modifying /var/yp/Makefile) to 20 to accommodate our very old users with low
UIDs.  As a result, the rpcuser account that the new nfs-utils package created
on the NIS server was distributed via NIS.

I have an update script that runs on our client machines (in /etc/rc.d/rc.local)
which I had used to update nfs-utils.  The %pre script portion of the
install was supposed to create a local user named rpcuser, but it failed because
this user "already exists" (via NIS) at this point.

This explains why the rpc.statd service failed when ypbind was not yet running
(the rpcuser account existed only in NIS) and why it worked when ypbind WAS
running.
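
A quick way to confirm this kind of situation is to check whether an account is
defined in the local passwd file or only resolves through some other source
such as NIS.  The helper below is a sketch under stated assumptions: the name
account_source and its output labels are invented for illustration, and it
treats any account that "id" can resolve but that is absent from the passwd
file as non-local.

```shell
#!/bin/sh
# account_source NAME [PASSWD_FILE]
# Prints "local" if NAME has an entry in PASSWD_FILE (default /etc/passwd),
# "non-local" if it resolves only through another source (for example NIS),
# and "missing" if it cannot be resolved at all.
account_source() {
    name=$1
    passwd_file=${2:-/etc/passwd}
    if grep -q "^${name}:" "$passwd_file" 2>/dev/null; then
        echo local
    elif id "$name" >/dev/null 2>&1; then
        echo non-local
    else
        echo missing
    fi
}
```

For example, "account_source rpcuser" could be run once with ypbind running and
once with it stopped.  A result that changes from non-local to missing would be
consistent with the behavior described here.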

So, at this point, my only complaint is that the installation of the nfs-utils
rpm gave me no error or indication of a mis-installation (about failing to
properly create the necessary rpcuser account).  Perhaps the %pre script in
nfs-utils that creates the rpcuser ought to be modified to squawk a little if it
fails?
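
As an illustration of what such a check might look like, here is a minimal
sketch of a function a %pre scriptlet could call after attempting to create the
account.  This is not the actual nfs-utils scriptlet: the function name
check_local_user and the wording of the warning are hypothetical, and the
passwd file path is a parameter only so the check can be tried against a copy.

```shell
#!/bin/sh
# check_local_user NAME [PASSWD_FILE]
# Returns 0 if NAME has an entry in PASSWD_FILE (default /etc/passwd).
# Otherwise prints a warning on stderr and returns 1.
check_local_user() {
    name=$1
    passwd_file=${2:-/etc/passwd}
    if ! grep -q "^${name}:" "$passwd_file"; then
        echo "warning: no local '${name}' account in ${passwd_file};" \
             "rpc.statd may fail when NIS is unavailable" >&2
        return 1
    fi
    return 0
}
```

A scriptlet would probably call it as "check_local_user rpcuser || :" so that
the warning is shown without aborting the installation, though whether a
warning or a hard failure is more appropriate is a packaging decision.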