Bug 1170400

Summary: race condition with dhcp causes ypbind to fail at boot
Product: Red Hat Enterprise Linux 7 Reporter: Joe Pruett <joey>
Component: ypbind Assignee: Petr Kubat <pkubat>
Status: CLOSED ERRATA QA Contact: Tomas Dolezal <todoleza>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.0 CC: hhorak, kvolny, mmuzila, pkubat, todoleza, wcolburn+bugzilla
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ypbind-1.37.1-9.el7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 23:07:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1346768, 1393868, 1400961    

Description Joe Pruett 2014-12-04 01:06:35 UTC
Description of problem:
this really involves systemd, dhclient, networkmanager, and ypbind. since ypbind is the visible effect, that is what i chose as the main component.

in a network where the yp domainname is set by dhcp, there is a race condition where ypbind will sometimes be attempted before the domain has been set. this is because dhclient runs dhclient-script asynchronously and returns to networkmanager. this means that nm-online reports true, which allows systemd to fire off the ypbind service.

i added a couple logger calls to /etc/dhcp/dhclient.d/nis.sh to be able to show the order of things, see below.
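For readers who want to repeat the tracing, here is a minimal sketch of what such logger calls could look like. The helper names are hypothetical and the shipped nis.sh is not reproduced here; only the two message strings are taken from the log excerpt in this report.

```shell
# Hypothetical tracing helper; not part of the shipped nis.sh.
nis_trace() {
    # logger(1) sends its arguments to syslog as a single message.
    logger "$*"
}

# Hypothetical call sites. The message strings match the "logger:" lines
# in the log excerpt from this report.
nis_trace_start()     { nis_trace "Starting dhclient NIS script"; }
nis_trace_setdomain() { nis_trace "dhclient NIS script: setting domain"; }
```

Calling nis_trace_start at the top of the script and nis_trace_setdomain just before the domain is applied would be enough to see where the script runs relative to ypbind in the system log.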

i'm not sure where the fix would make sense. perhaps waiting for dhclient-script to finish the initial up event before returning to networkmanager?

an obvious workaround is using NISDOMAIN in /etc/sysconfig/network, but that makes dhcp cry :-).
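As a sketch, the static workaround is a single assignment in that file. The domain name example.nis is a placeholder, not a value from this report.

```shell
# /etc/sysconfig/network (fragment)
# "example.nis" is a placeholder domain, not taken from this report.
NISDOMAIN=example.nis
```

This pins the NIS domain on the client instead of taking it from the DHCP server, which is why it only suits machines that stay in one domain.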

Version-Release number of selected component (if applicable):


How reproducible:
configure network so that nis domain is handed out via dhcp (not hardcoded via NISDOMAIN) and enable ypbind on a dhcp client. sometimes after boot, ypbind will have failed.


Steps to Reproduce:
1. boot system
2. log in as root
3. run ypwhich

Actual results:
ypwhich: Can't communicate with ypbind


Expected results:
name-of-yp-server

Additional info:
here are the relevant log entries showing the incorrect order of things, the "logger" entries are from my tweaks to the dhclient.d/nis.sh script.

Dec  3 16:47:04 test7 NetworkManager[637]: <info> Activation (eth0) successful, device activated.
Dec  3 16:47:04 test7 dbus-daemon: dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec  3 16:47:04 test7 dbus[632]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec  3 16:47:04 test7 systemd: Starting Network Manager Script Dispatcher Service...
Dec  3 16:47:04 test7 systemd: Started RPC bind service.
Dec  3 16:47:04 test7 systemd: Started OpenSSH server daemon.
Dec  3 16:47:04 test7 systemd: Starting NFS file locking service....
Dec  3 16:47:04 test7 systemd: Starting NIS/YP (Network Information Service) Clients to NIS Domain Binder...
Dec  3 16:47:04 test7 dbus-daemon: dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec  3 16:47:04 test7 dbus[632]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec  3 16:47:04 test7 systemd: Started Network Manager Script Dispatcher Service.
Dec  3 16:47:05 test7 systemd: Unit iscsi.service cannot be reloaded because it is inactive.
Dec  3 16:47:05 test7 rpc.statd[1495]: Version 1.3.0 starting
Dec  3 16:47:05 test7 sm-notify[1497]: Version 1.3.0 starting
Dec  3 16:47:05 test7 ModemManager[601]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:03.0': not supported by any plugin
Dec  3 16:47:05 test7 ypbind: domain not found
Dec  3 16:47:05 test7 ypbind-pre-setdomain: Setting NIS domain:
Dec  3 16:47:05 test7 systemd: ypbind.service: control process exited, code=exited status=1
Dec  3 16:47:05 test7 systemd: Failed to start NIS/YP (Network Information Service) Clients to NIS Domain Binder.
Dec  3 16:47:05 test7 systemd: Unit ypbind.service entered failed state.
Dec  3 16:47:05 test7 systemd: Starting Automounts filesystems on demand...
Dec  3 16:47:05 test7 systemd: Starting Permit User Sessions...
Dec  3 16:47:05 test7 systemd: Started NFS file locking service..
Dec  3 16:47:05 test7 systemd: Started Permit User Sessions.
Dec  3 16:47:05 test7 systemd: Starting Command Scheduler...
Dec  3 16:47:05 test7 logger: Starting dhclient NIS script
Dec  3 16:47:05 test7 logger: dhclient NIS script: setting domain
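
The ordering shown above can be checked mechanically. The following is a minimal sketch, assuming the log has been saved to a file and that the reporter's custom "setting domain" logger message is present. The function name ypbind_raced is hypothetical.

```shell
# ypbind_raced LOGFILE
# Hypothetical helper. Returns 0 if "ypbind: domain not found" is logged
# before the first "dhclient NIS script: setting domain" line, or if that
# line never appears. Returns 1 otherwise.
ypbind_raced() {
    logfile=$1
    # Line number of the first ypbind failure, empty if there is none.
    fail_line=$(grep -n 'ypbind: domain not found' "$logfile" | head -n 1 | cut -d: -f1)
    # Line number of the first custom trace message, empty if there is none.
    set_line=$(grep -n 'dhclient NIS script: setting domain' "$logfile" | head -n 1 | cut -d: -f1)
    [ -n "$fail_line" ] || return 1   # ypbind never failed this way
    [ -n "$set_line" ] || return 0    # it failed and the domain was never set
    [ "$fail_line" -lt "$set_line" ]
}
```

The check compares line positions rather than timestamps, since both events can carry the same one-second timestamp as they do in the excerpt.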

Comment 3 Schlake 2016-08-26 13:51:49 UTC
Sooo, I have this problem.  I was going to file a ticket on it this morning, but I see this two year old ticket just languishing here, which doesn't inspire me with hope.

Have you found a solution to work around the problem?  I've tried adding systemd dependencies, but I can't make it work.  I'm thinking of just rewriting the ypbind service to include a busy-loop that simply waits for the network to really be up before it tries to ypbind...

Comment 4 Joe Pruett 2016-08-26 15:09:24 UTC
(In reply to Schlake from comment #3)
> Sooo, I have this problem.  I was going to file a ticket on it this morning,
> but I see this two year old ticket just languishing here, which doesn't
> inspire me with hope.
> 
> Have you found a solution to work around the problem?  I've tried adding
> systemd dependencies, but I can't make it work.  I'm thinking of just
> rewriting the ypbind service to include a busy-loop that simply waits for
> the network to really be up before it tries to ypbind...

i actually just ran into this again yesterday. i use kickstart for most installs, so i have just resorted to setting the nis domain statically via kickstart configs. i was installing a non-kickstart machine and had to remember what i had done to get around this.

someone at least seems to have looked at it in june, but it references a bug that isn't public.

Comment 5 Schlake 2016-08-26 15:17:02 UTC
Unfortunately, we have three NIS domains and laptops need to float between them as they move around, so the DHCP functionality is critical for us.  I spotted a typo in my RequiredBy= attempt, so I'm re-kicking now to see if that helps.  Otherwise I've written a new version of ypbind-pre-setdomain that does a busy loop for 10 seconds checking for the domainname to be set before giving up.
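
A minimal sketch of such a polling loop follows. It is not the script from this comment and not the ypbind-pre-setdomain shipped in any package. The function names are hypothetical, and it assumes that domainname prints either an empty string or "(none)" while no NIS domain is set.

```shell
# nis_domain_is_set
# Hypothetical helper. Succeeds when `domainname` prints a usable value.
# Assumption: an unset domain shows up as an empty string or "(none)".
nis_domain_is_set() {
    current=$(domainname 2>/dev/null)
    [ -n "$current" ] && [ "$current" != "(none)" ]
}

# wait_for_nis_domain [TIMEOUT_SECONDS]
# Hypothetical helper. Polls once per second and returns 0 as soon as the
# domain is set, or 1 if it is still unset when the timeout (default 10)
# has elapsed.
wait_for_nis_domain() {
    timeout=${1:-10}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if nis_domain_is_set; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep.
    nis_domain_is_set
}
```

A pre-start script could call wait_for_nis_domain 10 and exit non-zero on failure, so that ypbind is still reported as failed when DHCP never delivers a domain. Polling hides the race rather than removing it, so the timeout has to be longer than the slowest dhclient-script run on the network in question.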

Thanks though.

Comment 6 Petr Kubat 2016-08-29 06:13:09 UTC
Thank you for taking the time to report this issue to us. We appreciate the feedback and use reports such as this one to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution.

If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution. 

For information on how to contact the Red Hat production support team, please visit:
    https://www.redhat.com/support/process/production/#howto

Comment 13 errata-xmlrpc 2017-08-01 23:07:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2202