Bug 1302282
Summary: | systemd fails to keep dhcpd running after repeated restarts by NetworkManager dispatcher.d script | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Răzvan Sandu <rsandu2004> |
Component: | dhcp | Assignee: | Pavel Zhukov <pzhukov> |
Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | alexandru.tote, alok.pandey, arpad.kunszt, bcodding, bilias, dossow, lnykryn, mbanas, mganisin, mjw, pradeep.armugam, rsandu2004, scarlet.remilia0, spam, thozza, tomek, tpelka |
Target Milestone: | rc | Keywords: | EasyFix, Patch |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | dhcp-4.2.5-52.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-01 20:38:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1380362, 1393869 |
Description
Răzvan Sandu
2016-01-27 11:20:49 UTC
Is there any relevant info in journalctl output ? If not, please run dhcpd by hand with: # dhcpd -d and attach the output here. I also need to see the /etc/dhcp/dhcpd.conf I saw the same issue on a machine with more than 5 network interfaces. What seems to happen is that the service is restarted each time an interface comes up. But note the following in https://www.freedesktop.org/software/systemd/man/systemd.unit.html under "StartLimitIntervalSec=, StartLimitBurst=": [...] By default, units which are started more than 5 times within 10 seconds are not permitted to start any more times until the 10 second interval ends. [...] Note that units which are configured for Restart= and which reach the start limit are not attempted to be restarted anymore; however, they may still be restarted manually at a later point, from which point on, the restart logic is again activated. Which explains why manual restarting seems to work. A workaround is to set DefaultStartLimitBurst=16 (or some other large number > than the number of network interfaces) in /etc/systemd/system.conf Better would be to be able to indicate which network interfaces should trigger a restart for the dhcpd.service. But I couldn't figure out how to do that. Or maybe http://pkgs.fedoraproject.org/cgit/rpms/dhcp.git/commit/?h=f24&id=4b570e22a7f2bb580fc95b4fa64474de3f09b3cb could help (In reply to Jiri Popelka from comment #4) > Or maybe > http://pkgs.fedoraproject.org/cgit/rpms/dhcp.git/commit/ > ?h=f24&id=4b570e22a7f2bb580fc95b4fa64474de3f09b3cb > could help I didn't try that, but yes, that could help. It could be improved to check whether it is already waiting on another interface coming up. As it is currently implemented it still seems to restart dhcpd for each and every interface that comes up (just with a 2 second delay). If the script could check whether it is already in the process of waiting to restart dhcp and then not do another restart, but just reset the sleep seconds, that would prevent a lot of dhcpd restarts on machines with lots of interfaces. Of course a mechanism to tell dhcpd which interfaces are interesting/uninteresting would be helpful too. For me, this happens on multihomed machines (CentOS 7 PCs with multiple Ethernet interfaces acting as routers/gateways), when just *some* interfaces (the LAN one) provide dhcpd services. At boot, the dhcpd service doesn't come up, even if it is enabled via systemctl. It can be started manually, after reboot, with no problem. Interfaces are managed via NetworkManager. Since it's a router, all Ethernet interfaces are set to start automatically at boot. The relevant part of the log says: Jun 24 15:46:21 example dhcpd: No subnet declaration for enp5s0 (MY_PRIVATE IP). Jun 24 15:46:21 example dhcpd: ** Ignoring requests on enp5s0. If this is not what Jun 24 15:46:21 example dhcpd: you want, please write a subnet declaration Jun 24 15:46:21 example dhcpd: in your dhcpd.conf file for the network segment Jun 24 15:46:21 example dhcpd: to which interface enp5s0 is attached. ** Jun 24 15:46:21 example dhcpd: Jun 24 15:46:21 example dhcpd: Jun 24 15:46:21 example dhcpd: No subnet declaration for enp3s0 (MY_PUBLIC_IP). Jun 24 15:46:21 example dhcpd: ** Ignoring requests on enp3s0. If this is not what Jun 24 15:46:21 example dhcpd: you want, please write a subnet declaration Jun 24 15:46:21 example dhcpd: in your dhcpd.conf file for the network segment Jun 24 15:46:21 example dhcpd: to which interface enp3s0 is attached. ** Jun 24 15:46:21 example dhcpd: Jun 24 15:46:21 example dhcpd: Sending on Socket/fallback/fallback-net Jun 24 15:46:21 example systemd: start request repeated too quickly for dhcpd.service Jun 24 15:46:21 example systemd: Unit dhcpd.service entered failed state. Jun 24 15:46:21 example systemd: dhcpd.service failed. In the example above, only enp5s0 (LAN interface 192.168.X.1) provide dhcpd service (to internal LAN computers). Both interfaces are set with fixed (manual) IP addresses and start automatically at boot. Last three lines of the log above seems to indicate a systemd problem (dhcpd service not starting, probably because the interfaces are not yet ready when the service tries to start). Best regards, Răzvan Sorry for a mistake in the comment no. 6 above. enp5s0 and enp3s0 are *not* providing dhcpd services, there are no subnet declarations for them in /etc/dhcp/dhcpd.conf However, there is also an enp3s1 present in the system (not seen above), with a static private IP (192.168.XX.1), that actually *has* a subnet declaration in the file and is supposed to provide dhcpd services for the LAN. Răzvan We are also facing this issue on servers with RHEL7.2/Centos7.2. Steps to reproduce: 1. Configure IP for 6 or more network interfaces 2. Ensure that dhcpd service is enabled for startup during boot 3. Reboot the node After the reboot the dhdpd service is not started and lands in failed state. Manual restart of the dhcpd service works fine and brings up the service. Please see the relevant outputs below, let me know if more info/logs are needed. We have a application which depends on the dhcpd service and it also performs the dhcpd (re)start operation during boot. Due to this issue our aplication is also failing to start post boot/reboot. (on nodes configured with with 5 or more network interfaces) This issue is causing problem to our customers. The fix mentioned in "Comment 4" above seems to resolve the issue. (fix - http://pkgs.fedoraproject.org/cgit/rpms/dhcp.git/commit/?h=f24&id=4b570e22a7f2bb580fc95b4fa64474de3f09b3cb ) Is any patch/rpm is available/released containing the above fix? If not, when can we expect a fix for the this? =============================================================================== LOGS =============================================================================== [root@n7 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.2 (Maipo) [root@n7 ~]# [root@n7 ~]# rpm -qa | grep dhcp dhcp-libs-4.2.5-42.el7.x86_64 dhcp-4.2.5-42.el7.x86_64 dhcp-common-4.2.5-42.el7.x86_64 [root@n7 ~]# [root@n7 ~]# systemctl is-enabled dhcpd enabled [root@n7 ~]# [root@n7 ~]# systemctl status dhcpd ● dhcpd.service - DHCPv4 Server Daemon Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; enabled; vendor preset: disabled) Active: active (running) since Mon 2016-07-25 12:30:16 EDT; 1s ago Docs: man:dhcpd(8) man:dhcpd.conf(5) Main PID: 26695 (dhcpd) Status: "Dispatching packets..." CGroup: /system.slice/dhcpd.service └─26695 /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid Jul 25 12:30:16 n7 dhcpd[26695]: Jul 25 12:30:16 n7 dhcpd[26695]: Jul 25 12:30:16 n7 dhcpd[26695]: No subnet declaration for ens3f0 (10.37.46.0). Jul 25 12:30:16 n7 dhcpd[26695]: ** Ignoring requests on ens3f0. If this is not what Jul 25 12:30:16 n7 dhcpd[26695]: you want, please write a subnet declaration Jul 25 12:30:16 n7 dhcpd[26695]: in your dhcpd.conf file for the network segment Jul 25 12:30:16 n7 dhcpd[26695]: to which interface ens3f0 is attached. ** Jul 25 12:30:16 n7 dhcpd[26695]: Jul 25 12:30:16 n7 dhcpd[26695]: Sending on Socket/fallback/fallback-net Jul 25 12:30:16 n7 systemd[1]: Started DHCPv4 Server Daemon. [root@n7 ~]# [root@n7 ~]# ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo ... 2: ens3f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000 link/ether 28:92:4a:cd:80:50 brd ff:ff:ff:ff:ff:ff inet 10.37.46.0/24 brd 10.37.46.255 scope global ens3f0 ... 3: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000 link/ether 28:92:4a:cd:80:54 brd ff:ff:ff:ff:ff:ff inet 10.37.48.0/24 brd 10.37.48.255 scope global ens3f1 ... 4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 link/ether ac:16:2d:7b:ab:c8 brd ff:ff:ff:ff:ff:ff inet 192.168.1.7/16 brd 192.168.255.255 scope global eno1 ... 5: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000 link/ether ac:16:2d:7b:ab:c9 brd ff:ff:ff:ff:ff:ff inet 10.37.40.0/24 brd 10.37.40.255 scope global eno2 ... 6: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000 link/ether ac:16:2d:7b:ab:ca brd ff:ff:ff:ff:ff:ff inet 10.37.42.0/24 brd 10.37.42.255 scope global eno3 ... 7: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000 link/ether ac:16:2d:7b:ab:cb brd ff:ff:ff:ff:ff:ff inet 10.37.44.0/24 brd 10.37.44.255 scope global eno4 ... [root@n7 ~]# ------------------ REBOOT THE NODE ------------------ [root@n7 ~]# uptime 12:38:46 up 1 min, 1 user, load average: 4.37, 2.69, 1.05 [root@n7 ~]# [root@n7 ~]# systemctl status dhcpd ● dhcpd.service - DHCPv4 Server Daemon Loaded: loaded (/usr/lib/systemd/system/dhcpd.service; enabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2016-07-25 12:38:00 EDT; 57s ago Docs: man:dhcpd(8) man:dhcpd.conf(5) Process: 2758 ExecStart=/usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid (code=killed, signal=TERM) Main PID: 2758 (code=killed, signal=TERM) Status: "Dispatching packets..." Jul 25 12:38:00 n7 dhcpd[2758]: Sending on Socket/fallback/fallback-net Jul 25 12:38:00 n7 systemd[1]: Started DHCPv4 Server Daemon. Jul 25 12:38:00 n7 systemd[1]: Stopping DHCPv4 Server Daemon... Jul 25 12:38:00 n7 systemd[1]: start request repeated too quickly for dhcpd.service Jul 25 12:38:00 n7 systemd[1]: Failed to start DHCPv4 Server Daemon. Jul 25 12:38:00 n7 systemd[1]: Unit dhcpd.service entered failed state. Jul 25 12:38:00 n7 systemd[1]: dhcpd.service failed. Jul 25 12:38:02 n7 systemd[1]: start request repeated too quickly for dhcpd.service Jul 25 12:38:02 n7 systemd[1]: Failed to start DHCPv4 Server Daemon. Jul 25 12:38:02 n7 systemd[1]: dhcpd.service failed. [root@n7 ~]# Logs snippet from [/var/log/messages] ------------------ Jul 25 12:37:48 n7 NetworkManager[914]: <info> NetworkManager (version 1.0.6-27.el7) is starting... Jul 25 12:37:48 n7 NetworkManager[914]: <info> Read config: /etc/NetworkManager/NetworkManager.conf and conf.d: 00-server.conf, 10-ibft-plugin.conf Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-ens3f1 (ae2deb92-5d89-447e-a15e-379b6c107ee8,"ens3f1") Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-ens3f0 (da2bb51a-da06-4cb4-a0eb-79363963ff1b,"ens3f0") Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-eno4 (bb75b29c-3933-413d-9476-6fb886872b2c,"eno4") Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-eno3 (77a1915a-2115-4b5d-a598-854914108ccf,"eno3") Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-eno2 (673eb872-5c0b-404c-a1da-ba06c8dd7004,"eno2") Jul 25 12:37:48 n7 NetworkManager[914]: <info> ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-eno1 (ced7c3c8-512e-44da-9e93-a4e65e2dce67,"System eno1") Jul 25 12:37:51 n7 dbus[918]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' Jul 25 12:37:51 n7 dbus-daemon: dbus[918]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' Jul 25 12:37:51 n7 systemd: Starting Network Manager Script Dispatcher Service... ------------------ Jul 25 12:37:51 n7 dbus[918]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' Jul 25 12:37:51 n7 dbus-daemon: dbus[918]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' Jul 25 12:37:51 n7 systemd: Started Network Manager Script Dispatcher Service. ------------------ Jul 25 12:37:51 n7 nm-dispatcher: Dispatching action 'up' for ens3f0 Jul 25 12:37:56 n7 systemd: Started Network Manager Wait Online. Jul 25 12:37:56 n7 systemd: Starting LSB: Bring up/down networking... Jul 25 12:37:58 n7 systemd: Reached target Network is Online. Jul 25 12:37:58 n7 systemd: Starting Network is Online. Jul 25 12:37:58 n7 systemd: Starting DHCPv4 Server Daemon... Jul 25 12:37:59 n7 dhcpd: Internet Systems Consortium DHCP Server 4.2.5 Jul 25 12:37:59 n7 dhcpd: Copyright 2004-2013 Internet Systems Consortium. Jul 25 12:37:59 n7 systemd: Started DHCPv4 Server Daemon ------------------ Jul 25 12:37:59 n7 nm-dispatcher: Dispatching action 'up' for eno1 Jul 25 12:37:59 n7 systemd: Stopping DHCPv4 Server Daemon... Jul 25 12:37:59 n7 systemd: Starting DHCPv4 Server Daemon... Jul 25 12:37:59 n7 dhcpd: Internet Systems Consortium DHCP Server 4.2.5 Jul 25 12:37:59 n7 dhcpd: Copyright 2004-2013 Internet Systems Consortium. Jul 25 12:37:59 n7 systemd: Started DHCPv4 Server Daemon. ------------------ Jul 25 12:37:59 n7 nm-dispatcher: Dispatching action 'up' for eno4 Jul 25 12:37:59 n7 systemd: Stopping DHCPv4 Server Daemon... Jul 25 12:37:59 n7 systemd: Starting DHCPv4 Server Daemon... Jul 25 12:37:59 n7 dhcpd: Internet Systems Consortium DHCP Server 4.2.5 Jul 25 12:37:59 n7 dhcpd: Copyright 2004-2013 Internet Systems Consortium. Jul 25 12:37:59 n7 systemd: Started DHCPv4 Server Daemon. ------------------ Jul 25 12:37:59 n7 nm-dispatcher: Dispatching action 'up' for ens3f1 Jul 25 12:38:00 n7 systemd: Stopping DHCPv4 Server Daemon... Jul 25 12:38:00 n7 systemd: Starting DHCPv4 Server Daemon... Jul 25 12:38:00 n7 dhcpd: Internet Systems Consortium DHCP Server 4.2.5 Jul 25 12:38:00 n7 dhcpd: Copyright 2004-2013 Internet Systems Consortium. Jul 25 12:38:00 n7 systemd: Started DHCPv4 Server Daemon. ------------------ Jul 25 12:38:00 n7 nm-dispatcher: Dispatching action 'up' for eno2 Jul 25 12:38:00 n7 systemd: Stopping DHCPv4 Server Daemon... Jul 25 12:38:00 n7 systemd: Starting DHCPv4 Server Daemon... Jul 25 12:38:00 n7 dhcpd: Internet Systems Consortium DHCP Server 4.2.5 Jul 25 12:38:00 n7 dhcpd: Copyright 2004-2013 Internet Systems Consortium. Jul 25 12:38:00 n7 systemd: Started DHCPv4 Server Daemon. ------------------ Jul 25 12:38:00 n7 nm-dispatcher: Dispatching action 'up' for eno3 Jul 25 12:38:00 n7 systemd: Stopping DHCPv4 Server Daemon... Jul 25 12:38:00 n7 systemd: start request repeated too quickly for dhcpd.service Jul 25 12:38:00 n7 systemd: Failed to start DHCPv4 Server Daemon. Jul 25 12:38:00 n7 systemd: Unit dhcpd.service entered failed state. Jul 25 12:38:00 n7 systemd: dhcpd.service failed. Jul 25 12:38:00 n7 nm-dispatcher: Job for dhcpd.service failed because start of the service was attempted too often. See "systemctl status dhcpd.service" and "journalctl -xe" for details. Jul 25 12:38:00 n7 nm-dispatcher: To force a start use "systemctl reset-failed dhcpd.service" followed by "systemctl start dhcpd.service" again. ------------------ [OUR CUSTOM SERVICE "cmu" depending on dhcpd also fails] Jul 25 12:38:00 n7 cmu: starting tftp server check ... done Jul 25 12:38:02 n7 systemd: start request repeated too quickly for dhcpd.service Jul 25 12:38:02 n7 systemd: Failed to start DHCPv4 Server Daemon. Jul 25 12:38:02 n7 systemd: dhcpd.service failed. Jul 25 12:38:02 n7 cmu: *** dhcpd returned non zero exit code [1] when restarted Jul 25 12:38:02 n7 cmu: --------------------------------------------------------- Jul 25 12:38:02 n7 cmu: Job for dhcpd.service failed because start of the service was attempted too often. See "systemctl status dhcpd.service" and "journalctl -xe" for details. Jul 25 12:38:02 n7 cmu: To force a start use "systemctl reset-failed dhcpd.service" followed by "systemctl start dhcpd.service" again. Jul 25 12:38:02 n7 cmu: --------------------------------------------------------- ------------------ =============================================================================== Hello, It seems the dhcpd service startup does not integrates well in the systemd initialization process, as a whole (need some timeout after interfaces initialization?). This is a very annoying bug on production machines, leading to suboptimal solutions (DHCPD service has to be restarted artificially, via a custom script called from /etc/rc.d/rc.local, some time *after* the whole networking stack has been initialised). It is also a regression from the RHEL/CentOS 6.x state of affairs, since dhcpd service in 6.x started flawlesly if /etc/dhcp/dhcpd.conf was correctly written. I would gladly test a solution for this on my machines, if one occurs. Thanks, Răzvan This is also affecting me. Any movement regarding a fix for this? Just ran into this on a machine at the NFS BAT.. here's our fix: --- /etc.old/NetworkManager/dispatcher.d/12-dhcpd 2015-10-13 12:58:30.000000000 -0400 +++ /etc/NetworkManager/dispatcher.d/12-dhcpd 2016-10-18 07:39:07.148828416 -0400 @@ -6,7 +6,9 @@ # whenever interface is brought up by NM (rhbz #565921) if [ "$STATUS" = "up" ]; then # restart the services + systemctl reset-failed dhcpd.service systemctl -q is-enabled dhcpd.service && systemctl restart dhcpd.service + systemctl reset-failed dhcpd6.service systemctl -q is-enabled dhcpd6.service && systemctl restart dhcpd6.service fi *** Bug 1387995 has been marked as a duplicate of this bug. *** Basically the same solution was suggested as part of bug #1387995 by Lukáš Nykrýn apart from StartLimitInterval and StartLimitBurst tweaking. Additionally, I think that this issue reveals a bug in systemd. When systemd is requested to restart a service, and it cannot do so due to default or configured limits, it shouldn't perform any actions at all and just fail. Stopping a service doesn't do anything near the requested restart. If systemd evaluated the situation properly and just reported error, the issue wouldn't probably arise apart from filling the logs with systemd errors. Should we also open a bug on systemd to get it fixed as well? I forgot to include (also by Lukáš):
> In ideal case dhcpd should offer an reload action.
(In reply to Pavel Šimerda (pavlix) from comment #17) > Additionally, I think that this issue reveals a bug in systemd. When systemd > is requested to restart a service, and it cannot do so due to default or > configured limits, it shouldn't perform any actions at all and just fail. > Stopping a service doesn't do anything near the requested restart. > > If systemd evaluated the situation properly and just reported error, the > issue wouldn't probably arise apart from filling the logs with systemd > errors. > > Should we also open a bug on systemd to get it fixed as well? I don't think so, the restart is like alias for stop action and then the start action. We figure that something is wrong in the latter. Also I don't think it would be wise to just ignore the restart command. I followed fedora's conf of /etc/NetworkManager/dispatcher.d/12-dhcp and it worked fine. See: https://bugzilla.redhat.com/show_bug.cgi?id=1003695 Hello, I may confirm that I still struck this bug on CentOS 7.3 (1611) as of February 16th, 2017. The Fedora systems I have are not affected by this, so probably the solution is to import the already existing fix from Fedora. Regards, Răzvan Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1956 |