Description of problem: I have a system which uses /etc/init.d/network to bringup network interfaces (bridging is involved). The system also is a part of nis domain but ypbind fails to start and network logins are impossible. autofs is also affected as it is supposed to use some nis maps. Version-Release number of selected component (if applicable): systemd-37-3.fc16.x86_64 ypbind-1.33-7.fc16.x86_64 autofs-5.0.6-3.fc16.x86_64 How reproducible: Always when system boots. Steps to Reproduce: 1. disable NetowrkManager 2. enable network service 3. enable ypbind 4. reboot system 5. login as root and issue ypwhich Actual results: cannot communicate with ypbind (or something like that) Expected results: server the system is bound to Additional info: Used to work before systemd times (Fedora 12 for sure).
What does "systemctl status network.service ypbind.service" say?
It was showing that ypbind failed, I will post the whole output in the evening, when I get back home.
It's more likely a problem with ypbind than systemd. Reassigning.
Unless it was started too early (for example before rpcbind). I am able to start it from commandline (using systemctl) without problems.
To check that, please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" and attach the output of the "dmesg" command here.
Do you see anything ypbind related in /var/log/messages?
Yes, there were a few messages, I will include them here together with systemd log and "systemctl status network.service ypbind.service" later today.
/var/log/messages: Nov 14 17:50:29 vesemir setsebool: The allow_ypbind policy boolean was changed to 1 by root Nov 14 17:50:29 vesemir dbus[1320]: avc: received policyload notice (seqno=2) Nov 14 17:50:29 vesemir dbus-daemon[1320]: dbus[1320]: avc: received policyload notice (seqno=2) Nov 14 17:50:29 vesemir dbus[1320]: [system] Reloaded configuration Nov 14 17:50:29 vesemir dbus-daemon[1320]: dbus[1320]: [system] Reloaded configuration Nov 14 17:50:29 vesemir ypbind[1622]: Cannot register service: RPC: Unable to receive; errno = Connection refused Nov 14 17:50:29 vesemir ypbind[1622]: Unable to register (YPBINDPROG, YPBINDVERS, udp). Nov 14 17:50:29 vesemir ypbind: Binding NIS service Nov 14 17:50:40 vesemir dbus[1320]: [system] Activating via systemd: service name='org.freedesktop.ConsoleKit' unit='console-kit-daemon.service' Nov 14 17:50:40 vesemir dbus-daemon[1320]: dbus[1320]: [system] Activating via systemd: service name='org.freedesktop.ConsoleKit' unit='console-kit-daemon.service' Nov 14 17:50:40 vesemir dbus[1320]: [system] Activating service name='org.freedesktop.PolicyKit1' (using servicehelper) Nov 14 17:50:40 vesemir dbus-daemon[1320]: dbus[1320]: [system] Activating service name='org.freedesktop.PolicyKit1' (using servicehelper) Nov 14 17:50:40 vesemir polkitd[1697]: started daemon version 0.102 using authority implementation `local' version `0.102' Nov 14 17:50:40 vesemir dbus[1320]: [system] Successfully activated service 'org.freedesktop.PolicyKit1' Nov 14 17:50:40 vesemir dbus-daemon[1320]: dbus[1320]: [system] Successfully activated service 'org.freedesktop.PolicyKit1' Nov 14 17:50:40 vesemir dbus[1320]: [system] Successfully activated service 'org.freedesktop.ConsoleKit' Nov 14 17:50:40 vesemir dbus-daemon[1320]: dbus[1320]: [system] Successfully activated service 'org.freedesktop.ConsoleKit' Nov 14 17:51:29 vesemir ypbind-post-waitbind[1623]: rpcinfo: can't contact portmapper: RPC: Timed out Nov 14 17:51:31 vesemir ypbind: Binding took 62 seconds Nov 14 17:51:31 vesemir ypbind: NIS domain: nis.jot23.org, NIS server: ------------------------------------------------------- systemctl status network.service ypbind.service output: network.service - LSB: Bring up/down networking Loaded: loaded (/etc/rc.d/init.d/network) Active: active (running) since Mon, 14 Nov 2011 17:51:32 +0100; 7min ago Process: 1254 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/network.service └ 1747 /sbin/dhclient -H vesemir -1 -q -cf /etc/dhcp/dhcl... ypbind.service - NIS/YP (Network Information Service) Clients to NIS Domain Binder Loaded: loaded (/lib/systemd/system/ypbind.service; enabled) Active: failed since Mon, 14 Nov 2011 17:51:31 +0100; 7min ago Process: 1623 ExecStartPost=/usr/lib/ypbind/ypbind-post-waitbind (code=exited, status=0/SUCCESS) Process: 1622 ExecStart=/usr/sbin/ypbind -n $OTHER_YPBIND_OPTS (code=exited, status=1/FAILURE) Process: 1620 ExecStartPre=/usr/sbin/setsebool allow_ypbind=1 (code=exited, status=0/SUCCESS) Process: 1617 ExecStartPre=/usr/lib/ypbind/ypbind-pre-setdomain (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/ypbind.service ------------------------------------------------------------ When I give systemctl restart ypbind.service from command prompt then the status changes: network.service - LSB: Bring up/down networking Loaded: loaded (/etc/rc.d/init.d/network) Active: active (running) since Mon, 14 Nov 2011 17:51:32 +0100; 8min ago Process: 1254 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/network.service └ 1747 /sbin/dhclient -H vesemir -1 -q -cf /etc/dhcp/dhcl... ypbind.service - NIS/YP (Network Information Service) Clients to NIS Domain Binder Loaded: loaded (/lib/systemd/system/ypbind.service; enabled) Active: active (running) since Mon, 14 Nov 2011 18:00:13 +0100; 8s ago Process: 1947 ExecStartPost=/usr/lib/ypbind/ypbind-post-waitbind (code=exited, status=0/SUCCESS) Process: 1944 ExecStartPre=/usr/sbin/setsebool allow_ypbind=1 (code=exited, status=0/SUCCESS) Process: 1941 ExecStartPre=/usr/lib/ypbind/ypbind-pre-setdomain (code=exited, status=0/SUCCESS) Main PID: 1946 (ypbind) CGroup: name=systemd:/system/ypbind.service └ 1946 /usr/sbin/ypbind -n
Created attachment 533577 [details] dmesg output, grepped by 'systemd\[1\]'
This is what is important in dmesg output. [ 8.927416] systemd[1]: Installed new job rpcbind.socket/start as 64 [ 8.927481] systemd[1]: Installed new job rpcbind.service/start as 84 [ 8.927512] systemd[1]: Installed new job ypbind.service/start as 96 [ 16.348618] systemd[1]: rpcbind.socket changed dead -> listening [ 16.348629] systemd[1]: Job rpcbind.socket/start finished, result=done So far so good. [ 21.637868] systemd[1]: Trying to enqueue job ypbind.service/try-restart/replace [ 21.637916] systemd[1]: Installed new job ypbind.service/restart as 355 [ 21.637923] systemd[1]: Enqueued job ypbind.service/try-restart as 355 That's the first problem - ypbind is started the second time - who is restarting it and why? [ 21.670211] systemd[1]: ypbind.service changed dead -> start-pre [ 21.763365] systemd[1]: About to execute: /usr/sbin/ypbind -n $OTHER_YPBIND_OPTS [ 21.841852] systemd[1]: Child 1622 belongs to ypbind.service [ 21.841859] systemd[1]: ypbind.service: main process exited, code=exited, status=1 ypbind doesn't start, which is right, because rpcbind is not started yet. [ 21.847924] systemd[1]: Incoming traffic on rpcbind.socket [ 21.847976] systemd[1]: Trying to enqueue job rpcbind.service/start/replace [ 21.848241] systemd[1]: Enqueued job rpcbind.service/start as 84 [ 21.848254] systemd[1]: rpcbind.socket changed listening -> running rpcbind is started after rpcbind's socket is touched, but it's too late. There is "After=rpcbind.service" in ypbind's unit file, shouldn't rpcbind have been already started when ypbind is executed? I think it should.
(In reply to comment #10) > [ 21.637868] systemd[1]: Trying to enqueue job > ypbind.service/try-restart/replace > [ 21.637916] systemd[1]: Installed new job ypbind.service/restart as 355 > [ 21.637923] systemd[1]: Enqueued job ypbind.service/try-restart as 355 > > That's the first problem - ypbind is started the second time - who is > restarting it and why? Probably /etc/dhcp/dhclient.d/nis.sh > There is "After=rpcbind.service" in ypbind's unit file, shouldn't rpcbind have > been already started when ypbind is executed? I think it should. Right. I don't see nis.sh using "--ignore-dependencies", so it seems that the bug is in systemd when processing the try-restart request for a unit that already has a start job installed and is waiting for its dependencies. I'll try to reproduce it with a simpler test case...
Reproduced with a couple of units: $ cat sleep.service [Unit] Description=Sleep for 1 minute [Service] Type=oneshot ExecStart=/bin/sleep 60 $ cat hello.service [Unit] Description=Hello World After=sleep.service Wants=sleep.service [Service] Type=oneshot ExecStart=/bin/echo "Hello World" and running "systemctl try-restart hello.service" while "systemctl start hello.service" is still waiting.
I'm seeing the same: it's making yp boot fail on all my systems updated from f14 to f16. I seem to have a workaround for now by starting rpcbind.service manally in /etc/dhcp/dhclient.d/nis.sh.
Is this still an issue on a fully updated release or can this bug be closed?
I didn't see any indication of the fix above so I didn't expect it to work. I was right. No, this bug can't be closed, it still does not work. <rant mode on> It is suprising that people in XXI century still believe in miracles. <rant mode off>
Same on FC15 and FC16 32Bit. Here what i did: systemctl start ypbind.service systemctl status ypbind.service ypbind.service - NIS/YP (Network Information Service) Clients to NIS Domain Binder Loaded: loaded (/lib/systemd/system/ypbind.service; enabled) Active: active (running) since Tue, 21 Feb 2012 10:43:03 +0100; 10s ago Process: 2387 ExecStartPost=/usr/lib/ypbind/ypbind-post-waitbind (code=exited, status=0/SUCCESS) Process: 2383 ExecStartPre=/usr/sbin/setsebool allow_ypbind=1 (code=exited, status=1/FAILURE) Process: 2381 ExecStartPre=/usr/lib/ypbind/ypbind-pre-setdomain (code=exited, status=0/SUCCESS) Main PID: 2386 (ypbind) CGroup: name=systemd:/system/ypbind.service └ 2386 /usr/sbin/ypbind -n _________________________________________________________________________ dmesg |grep ypbind [ 22.465608] ypbind[966]: Binding NIS service [ 22.511117] ypbind[964]: Kann den Service nicht registrieren: RPC: Kann nicht empfangen; errno = Verbindungsaufbau abgelehnt [ 22.511183] ypbind[964]: Unable to register (YPBINDPROG, YPBINDVERS, udp). [ 22.512153] systemd[1]: ypbind.service: main process exited, code=exited, status=1 [ 82.674666] ypbind-post-waitbind[965]: rpcinfo: can't contact portmapper: RPC: Timed out [ 84.702656] ypbind[972]: Binding took 62 seconds [ 84.704746] ypbind[974]: NIS domain: XXX, ypbind not registered with rpcbind. [ 84.715488] systemd[1]: Unit ypbind.service entered failed state. _________________________________________________________________________ My workaround is to disable the service ypbind (systemctl disable ypbind.service) and add this to /etc/rc.d/rc.local systemctl start ypbind.service Is there any chance to fix this bug?
*** Bug 645569 has been marked as a duplicate of this bug. ***
The problem is when job_type_merge() in job.c merges a JOB_TRY_RESTART and a JOB_START job, it turns the TRY_RESTART job into a RESTART job, but loses all the dependencies on the JOB_START job. So it tries to do the RESTART immediately, even though none of the dependencies of the JOB_START have been satisfied. The systemd developers have know about this bug since 2010, and there's been no progress toward fixing it, even though there's a clear reproducer in comment #12, so don't hold your breath waiting for a fix.
Created attachment 571888 [details] self-contained test program Save this file as /usr/local/bin/abc.pl and make it executable. Run abc.pl create m do systemctl start m.service Observe that only the a.service is started.
(In reply to comment #19) > The problem is when job_type_merge() in job.c merges a JOB_TRY_RESTART and a > JOB_START job, it turns the TRY_RESTART job into a RESTART job, but loses all > the dependencies on the JOB_START job. So it tries to do the RESTART > immediately, even though none of the dependencies of the JOB_START have been > satisfied. Right. But the fix should not consist of transferring the dependencies into the merged job. The right behaviour is to have the TRY_RESTART job take a shortcut and finish immediately with success - because the service is still inactive, so try-restart should be a NOP. The scheduled START job should not be disturbed.
This should avoid the running of the services out of order: http://cgit.freedesktop.org/systemd/systemd/commit/?id=dd17d38879503f018fdf6bbff822970afcddb6f1 It's still not the optimal solution though (see comment #21).
I built a local systemd with that patch and tried it, but it didn't improve matters with either ypbind or my test program.
Jay, could you test if replacing service ypbind condrestart with systemctl try-restart --no-block ypbind.service in /etc/dhcp/dhclient.d/nis.sh helps? And similarly, in your abc.pl try adding "--no-block" to the try-restart call.
That seems to make them both work, at least on lab3. Lab4 is being more problematic, but its configuration appears to be messed up. In any case, perhaps we should change how systemd treats service condrestart requests so as to undo breakage of existing scripts?
*** Bug 814946 has been marked as a duplicate of this bug. ***
Upstream patch to complete the fix: http://cgit.freedesktop.org/systemd/systemd/commit/?id=e0209d83e7b30153f43b1a633c955f66eb2c2e4a
Will there be an f16 build with this fix?
F16 candidate build: http://koji.fedoraproject.org/koji/taskinfo?taskID=4054950
Thanks! Running it locally now. I can only reproduce the problem intermittently, though; I'll see how it looks in a bit.
systemd-44-9.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/systemd-44-9.fc17
systemd-37-20.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/systemd-37-20.fc16
Package systemd-44-9.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-44-9.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-8127/systemd-44-9.fc17 then log in and leave karma (feedback).
systemd-44-12.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
systemd-37-25.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.