Bug 322101

Summary: service ypbind start fails: checking rpcbind too soon after start
Product: [Fedora] Fedora
Reporter: david.hagood
Component: ypbind
Assignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: rawhide
CC: adler, jdeslip, orchard, vendor-redhat
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-01-11 22:29:35 UTC

Description david.hagood 2007-10-07 13:31:24 UTC
Description of problem:
The ypbind service fails to start because the ypbind init.d script checks
whether ypbind is registered with rpcbind too soon after starting it.

Here is the offending section of the script; the bracketed numbers at the start
of some lines refer to the analysis below:
----------------------------
  echo -n $"Starting NIS service: "
	selinux_on
[1]	daemon ypbind $OTHER_YPBIND_OPTS
	RETVAL=$?
	echo
	if [ $RETVAL -ne 0 ]; then
	    selinux_off
	    logger -t ypbind "failed to start!"
	    return $RETVAL
	fi
	echo -n $"Binding NIS service: "
	# the following fixes problems with the init scripts continuing
	# even when we are really not bound yet to a server, and then things
	# that need NIS fail.
	timeout=$NISTIMEOUT
	while [ $timeout -gt 0 ]; do
[2]	    /usr/sbin/rpcinfo -p | LC_ALL=C fgrep -q ypbind && \
			/usr/bin/ypwhich > /dev/null 2>&1
	    RETVAL=$?
	    if [ $RETVAL -eq 0 ]; then
		break;
	    fi
	    echo -n "..."
	    # ypwhich has a hardcode 15sec timeout
	    # so subtract that from NISTIMEOUT to
	    # to see of we should continue to wait
[3]	    timeout=`expr $timeout - 15` 
	done
-------------
[1] ypbind is started as a daemon here, so the script continues past this point
immediately. At that instant, ypbind is not yet registered with rpcbind.

[2] If ypbind is NOT yet registered with rpcbind, the first part of the "&&"
condition fails, and ypwhich is NOT executed.

[3] The script *assumes* that the ypwhich call takes 15 seconds, but since
ypwhich was not executed, no delay occurs and the loop immediately moves on to
the next iteration.

As a result, the loop iterations (3 by default) that are supposed to take 45
seconds in total complete almost instantly, and the loop terminates. The script
then assumes that the ypbind daemon is not running properly and kills it.
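The timing error can be reproduced without NIS at all. The sketch below keeps the structure of the loop but swaps the two checks for hypothetical stand-in functions (`rpcinfo_lists_ypbind`, `slow_ypwhich`) and uses `$((...))` instead of `expr`; with the rpcinfo check failing every time, all three iterations run back to back.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two checks in the original loop.
# rpcinfo_lists_ypbind fails instantly, as it does right after
# "daemon ypbind" returns; slow_ypwhich would take 15 s but never runs.
rpcinfo_lists_ypbind() { return 1; }
slow_ypwhich() { sleep 15; return 1; }

NISTIMEOUT=45
timeout=$NISTIMEOUT
iterations=0
start=$SECONDS
while [ $timeout -gt 0 ]; do
    iterations=$((iterations + 1))
    # Same short-circuit as line [2]: slow_ypwhich is skipped when the
    # first command fails, so no time passes.
    rpcinfo_lists_ypbind && slow_ypwhich
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        break
    fi
    # Same accounting as line [3]: 15 s is charged whether or not
    # ypwhich actually ran.
    timeout=$((timeout - 15))
done
elapsed=$((SECONDS - start))
echo "iterations=$iterations elapsed=${elapsed}s RETVAL=$RETVAL"
```

This reports three iterations and an elapsed time of 0 or 1 seconds (bash's `SECONDS` has one-second resolution), even though the loop charged the full 45 seconds against the timeout.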

The simple, crude fix is to put a "sleep 1" before the loop to give ypbind time
to start and register with rpcbind. A better fix would be to split the rpcbind
check and the ypwhich call into two separate steps and, if rpcbind does not yet
list ypbind, delay and try again; however, this will *still* delay start-up by
1 second whenever the first check fails. Even better would be to measure the
elapsed time rather than assume the 15-second ypwhich timeout, and actually wait
the specified time; this might be done by starting "sleep 45" as a background
command and checking whether it has terminated.
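A minimal sketch of the last two suggestions combined, assuming bash and the same rpcinfo/ypwhich commands as the original loop. The function names (`ypbind_registered`, `ypbind_bound`, `wait_for_ypbind`) are hypothetical and not taken from the shipped script. The rpcinfo check and the ypwhich call are split, a failed check costs only a 1-second pause before polling again, and the deadline is enforced by a background `sleep` as proposed above rather than by subtracting an assumed 15 seconds per pass.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the same commands as the original loop.
ypbind_registered() {
    /usr/sbin/rpcinfo -p 2>/dev/null | LC_ALL=C fgrep -q ypbind
}
ypbind_bound() {
    /usr/bin/ypwhich > /dev/null 2>&1
}

# wait_for_ypbind LIMIT
# Returns 0 as soon as ypbind is registered with rpcbind AND ypwhich
# reports a server; returns 1 once LIMIT seconds have really elapsed.
wait_for_ypbind() {
    local limit=$1 timer rc=1
    # Background timer: the loop runs until this sleep exits, however
    # long (or short) each individual check takes.
    sleep "$limit" &
    timer=$!
    while kill -0 "$timer" 2>/dev/null; do
        # Step 1: only call ypwhich once rpcbind lists ypbind.
        if ypbind_registered; then
            # Step 2: ask ypbind whether it is bound to a server.
            if ypbind_bound; then
                rc=0
                break
            fi
            echo -n "..."
        fi
        # A check failed: pause briefly before polling again, so a fast
        # failure never turns into a busy loop.
        sleep 1
    done
    # Stop the timer if we finished early, and reap it either way.
    kill "$timer" 2>/dev/null
    wait "$timer" 2>/dev/null
    return $rc
}
```

In the init script this might be called as `wait_for_ypbind "$NISTIMEOUT"; RETVAL=$?`. Because a ypwhich call already in progress is not interrupted, the actual wait can overrun the limit by up to ypwhich's own 15-second timeout.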


Version-Release number of selected component (if applicable):
ypbind-1.20.4-2.fc8

How reproducible:
Every time

Steps to Reproduce:
1. service ypbind restart or service ypbind start
  
Actual results:
ypbind starts and is then killed by the startup script (see the discussion above).

Expected results:
ypbind runs

Additional info:

Comment 1 Stephen Adler 2007-11-23 17:59:10 UTC
I have run into this same issue after installing Fedora 8. My workaround
was to add a "sleep 1" statement. The catch is that the failure only occurs (at
least for me) during boot. After the system had booted, I would log
in as root and run '/etc/rc.d/init.d/ypbind start' and it would come right
up, so it was hard to tell whether the startup script was actually broken.
Anyway, adding "sleep 1" to the script (I put it right after the echo -n
"...") fixed my problem.


Comment 2 Bruce Orchard 2007-12-04 18:53:09 UTC
I did not run into this problem when I first installed Fedora 8 on November 12.
Since then a number of updates have been installed. Now when I reboot, the
boot-time script that starts ypbind reports failure. When I start ypbind
manually, it works.

From watching the boot, I can see that the script checking whether ypbind
started correctly does not wait anywhere near 45 seconds. I think it does not
wait at all, but I can't say for sure. This is running on a computer with 2 CPUs.

ypbind:  ypbind-1.20.4-2.fc8
ypwhich:  yp-tools-2.9-2
rpcinfo:  rpcbind-0.1.4-11.fc8


Comment 3 Eli Wapniarski 2007-12-15 06:22:15 UTC
The problem persists.

Comment 4 Jack Deslippe 2008-01-09 22:36:29 UTC
I have the same problem as comment #2.


Comment 5 Steve Dickson 2008-01-11 22:29:35 UTC
Fixed in ypbind-1.20.4-3.fc9

Comment 6 Jack Deslippe 2008-01-11 22:37:05 UTC
Will this fix be coming to Fedora 8?