Description of problem:
systemd on overcloud nodes fails to start dhcp-interface after reboot.

# systemctl status dhcp-interface -l
dhcp-interface - DHCP interface br/storage
   Loaded: loaded (/usr/lib/systemd/system/dhcp-interface@.service; disabled)
   Active: failed (Result: exit-code) since Mon 2015-09-28 14:35:54 EDT; 16h ago
  Process: 3037 ExecStart=/sbin/ifup %I (code=exited, status=1/FAILURE)
  Process: 3034 ExecStartPre=/usr/local/sbin/dhcp-all-interfaces.sh %I (code=exited, status=0/SUCCESS)
 Main PID: 3037 (code=exited, status=1/FAILURE)

Sep 28 14:35:54 overcloud-blockstorage-2.localdomain systemd[1]: Starting DHCP interface br/storage...
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain dhcp-all-interfaces.sh[3034]: cat: /sys/class/net/br/storage/addr_assign_type: No such file or directory
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain dhcp-all-interfaces.sh[3034]: Inspecting interface: br/storage...Device has generated MAC, skipping.
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain ifup[3037]: /sbin/ifup: configuration for br/storage not found.
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain ifup[3037]: Usage: ifup <configuration>
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain systemd[1]: dhcp-interface: main process exited, code=exited, status=1/FAILURE
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain systemd[1]: Failed to start DHCP interface br/storage.
Sep 28 14:35:54 overcloud-blockstorage-2.localdomain systemd[1]: Unit dhcp-interface entered failed state.

# ls /etc/sysconfig/network-scripts/ifcfg-*
/etc/sysconfig/network-scripts/ifcfg-br-storage  /etc/sysconfig/network-scripts/ifcfg-eth1  /etc/sysconfig/network-scripts/ifcfg-vlan20
/etc/sysconfig/network-scripts/ifcfg-eth0        /etc/sysconfig/network-scripts/ifcfg-lo    /etc/sysconfig/network-scripts/ifcfg-vlan40

Version-Release number of selected component (if applicable):
RHOS 7 Director

How reproducible:
Always

Steps to Reproduce:
1. Set up overcloud nodes using templates.
2. Reboot node
3.

Actual results:
systemd fails to start dhcp-interface@* service

Expected results:
systemd is able to start all services.

Additional info:
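The mismatch visible in the report (the unit works with "br/storage" while the file on disk is ifcfg-br-storage) can be checked with a small sketch. It assumes a simplified lookup rule, namely that ifup <name> reads /etc/sysconfig/network-scripts/ifcfg-<name>; the helper names are hypothetical and the file list is copied from the ls output in the description.

```python
# ifcfg files listed in the report (output of the ls command in the description).
IFCFG_FILES = (
    "/etc/sysconfig/network-scripts/ifcfg-br-storage",
    "/etc/sysconfig/network-scripts/ifcfg-eth0",
    "/etc/sysconfig/network-scripts/ifcfg-eth1",
    "/etc/sysconfig/network-scripts/ifcfg-lo",
    "/etc/sysconfig/network-scripts/ifcfg-vlan20",
    "/etc/sysconfig/network-scripts/ifcfg-vlan40",
)


def ifcfg_path(name):
    """Assumed, simplified lookup rule: 'ifup <name>' reads ifcfg-<name>."""
    return "/etc/sysconfig/network-scripts/ifcfg-" + name


def sysfs_attr(name):
    """Path the pre-start script tried to read, per the 'cat:' error in the log."""
    return "/sys/class/net/" + name + "/addr_assign_type"


def has_config(name, known=IFCFG_FILES):
    """True if the reported file list contains a config for this name."""
    return ifcfg_path(name) in known


# Compare the name the unit received with the name of the real bridge.
for candidate in ("br/storage", "br-storage"):
    print(candidate, sysfs_attr(candidate), has_config(candidate))
```

Under that assumption, only the hyphenated name maps to a file that exists, which matches the "configuration for br/storage not found" message.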
Created attachment 1078232 [details] Screenshot node start
Created attachment 1078234 [details] systemd journal ifcfg
On the initial boot of the nodes, I do not see these systemd interface services. Subsequent reboots cause systemd to start them for the bridge interfaces.

# egrep "dhcp|interface" before_systemd_status.txt after_systemd_status.txt
after_systemd_status.txt:dhcp-interface -> '/org/freedesktop/systemd1/unit/dhcp_2dinterface_40br_2dex_2eservice'
after_systemd_status.txt:dhcp-interface - DHCP interface br/ex
after_systemd_status.txt: Loaded: loaded (/usr/lib/systemd/system/dhcp-interface@.service; disabled)
after_systemd_status.txt: Process: 1401 ExecStartPre=/usr/local/sbin/dhcp-all-interfaces.sh %I (code=exited, status=0/SUCCESS)
after_systemd_status.txt:Sep 29 04:31:54 overcloud-controller-0.localdomain dhcp-all-interfaces.sh[1401]: cat: /sys/class/net/br/ex/addr_assign_type: No such file or directory
after_systemd_status.txt:Sep 29 04:31:54 overcloud-controller-0.localdomain dhcp-all-interfaces.sh[1401]: Inspecting interface: br/ex...Device has generated MAC, skipping.
after_systemd_status.txt:Sep 29 04:32:06 overcloud-controller-0.localdomain systemd[1]: dhcp-interface: main process exited, code=exited, status=1/FAILURE
after_systemd_status.txt:Sep 29 04:32:06 overcloud-controller-0.localdomain systemd[1]: Failed to start DHCP interface br/ex.
after_systemd_status.txt:Sep 29 04:32:06 overcloud-controller-0.localdomain systemd[1]: Unit dhcp-interface entered failed state.

Regards,
Jaison R
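The D-Bus object path in the grep output encodes the full unit name, which is otherwise truncated to "dhcp-interface" in the pasted text. As a minimal sketch, assuming the usual systemd convention that non-alphanumeric bytes in a bus path label are written as "_XX" (two hex digits), the label can be decoded like this; the function name is hypothetical.

```python
import re


def bus_label_unescape(label):
    """Decode '_XX' hex sequences in a systemd D-Bus path label (assumed convention)."""
    raw = re.sub(
        rb"_([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        label.encode("ascii"),
    )
    return raw.decode("utf-8", errors="replace")


# Last component of the object path shown in after_systemd_status.txt.
label = "dhcp_2dinterface_40br_2dex_2eservice"
print(bus_label_unescape(label))
```

The decoded name contains "br-ex" with a literal hyphen, while the unit description shows "br/ex", so the instance name appears to have been passed to systemd without escaping the hyphen.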
Other than observing the services, what is the net effect or issue this causes?
(In reply to chris alfonso from comment #6)
> Other than observing the services, what is the net effect or issue this
> causes?

So far no issues have been noticed. Network, floating IP, glance and cinder all work well.
Jaison, can you still observe this with 7.3?
(In reply to Hugh Brock from comment #10)
> Jaison, can you still observe this with 7.3?

Still noticed:

[root@overcloud-controller-0 ~]# systemctl | grep dhcp
● dhcp-interface loaded failed failed DHCP interface br/ex
● dhcp-interface loaded failed failed DHCP interface br/int
● dhcp-interface loaded failed failed DHCP interface br/tun
● dhcp-interface loaded failed failed DHCP interface ovs/system
system-dhcp\x2dinterface.slice loaded active active system-dhcp\x2dinterface.slice

[root@overcloud-compute-0 ~]# systemctl | grep dhcp
● dhcp-interface loaded failed failed DHCP interface br/ex
● dhcp-interface loaded failed failed DHCP interface br/int
● dhcp-interface loaded failed failed DHCP interface br/tun
dhcp-interface loaded active exited DHCP interface eth0
dhcp-interface loaded active exited DHCP interface eth1
● dhcp-interface loaded failed failed DHCP interface ovs/system
system-dhcp\x2dinterface.slice loaded active active system-dhcp\x2dinterface.slice
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Thanks Pablo. From looking at the sosreport for the affected compute node - sosreport-esjc-ost1-cn01p.localdomain-20171115162956

1) The messages that appear to have linked the issue to this bug, namely:

Nov 14 12:39:10 esjc-ost1-cn01p ifup: /sbin/ifup: configuration for ovs/system not found.

are only due to logging and not a functional problem, and not the source of the issue. A separate bug should be opened to handle the problem associated with the case. These log messages are really just a red herring.

2) The actual issue can't be determined from the available info. As requested in comment 21 by Dan we need:
a) the templates associated with the deployment, specifically the nic config yaml files
b) the contents of /etc/os-net-config/config.json on the affected node

3) It looks like network teaming is being used, we'd like to understand how it is configured, hence the request in #2.

bfournie-OSX:sosreport-esjc-ost1-cn01p.localdomain-20171115162956 bfournie$ cat etc/sysconfig/network-scripts/ifcfg-team1
# This file is autogenerated by os-net-config
DEVICE=team1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-bond1
DEVICETYPE=ovs
TYPE=OVSBond
BOND_IFACES="enp7s0 enp8s0"
OVS_OPTIONS="bond_mode=active-backup"

4) There are some unexpected logs in /var/log/messages, namely:

Nov 14 12:39:10 esjc-ost1-cn01p cloud-init: 2017-11-14 12:39:10,739 - stages.py[WARNING]: Failed to rename devices: [unknown] Error performing rename('enp7s0', 'br-bond1') for 00:25:b5:25:0a:6a, br-bond1: Unexpected error while running command.
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Command: ['ip', 'link', 'set', 'enp7s0', 'name', 'br-bond1']
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Exit code: 2
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Reason: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stdout: -
Nov 14 17:31:25 esjc-ost1-cn01p cloud-init: Stderr: RTNETLINK answers: File exists
Nov 14 17:31:25 esjc-ost1-cn01p /usr/bin/virt-who: [INFO] @main.py:183 - Using configuration "esjc-ost1-cn01p.OST-JC1" ("libvirt" mode)

From these messages it appears that Siggy's comment in the case (Sigwald, Siggy on Nov 20 2017 at 09:50 AM -08:00) is relevant and does not appear to have been answered. His comments are:

"From this i can only assume that:
a) you have a problem with cloud-init
b) you have a configuration issue on your network-scripts"

Our recommendation is to:
a) create a separate BZ to separate this issue from the cosmetic issue in this BZ
b) get the nic config files and config.json
c) get answers to the questions that Siggy brought up in the case regarding cloud-init
Also, the initial problem that this bug was created for, namely these log messages:

Sep 28 14:35:54 overcloud-blockstorage-2.localdomain systemd[1]: Failed to start DHCP interface br/storage.

has been resolved with this fix - https://bugzilla.redhat.com/show_bug.cgi?id=1403795 which fixes the problem with escaping '-' in names like br-ex, br-storage etc. See the linked upstream bug - https://bugs.launchpad.net/diskimage-builder/+bug/1649409 and BZ https://bugzilla.redhat.com/show_bug.cgi?id=1403795 (where it is noted that these log messages for bridges are cosmetic problems only).

Again, these log messages are not related to the cases that have been associated with this BZ. We may want to backport fix https://bugzilla.redhat.com/show_bug.cgi?id=1403795 to OSP-10 if we want to fix these cosmetic issues.
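To illustrate the '-' escaping problem mentioned here, the following is a minimal Python model of systemd's unit-name escaping rules, not the actual fix. It assumes the documented behaviour: when %I is expanded, '-' in the instance name becomes '/' and "\xNN" sequences are decoded; when escaping, '/' becomes '-' and any other character outside ASCII alphanumerics, ':', '_' and '.' becomes "\xNN". The function names are hypothetical.

```python
import re

_SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789:_."
)


def escape_instance(name):
    """Model of systemd name escaping: '/' -> '-', unsafe bytes -> '\\xNN'."""
    out = []
    for i, byte in enumerate(name.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def unescape_instance(instance):
    """Model of %I expansion: '-' -> '/', then decode '\\xNN' sequences."""
    swapped = instance.replace("-", "/").encode("utf-8")
    raw = re.sub(
        rb"\\x([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        swapped,
    )
    return raw.decode("utf-8", errors="replace")


# Unescaped instance name: the hyphen is read as a path separator.
print(unescape_instance("br-storage"))
# Escaped instance name: the hyphen survives the round trip.
print(unescape_instance(escape_instance("br-storage")))
```

In this model, an instance started as dhcp-interface@br-storage expands %I to "br/storage", which is the name seen in the log messages, whereas an escaped instance name round-trips to "br-storage".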
Adding Needinfo for requests in Comment 31.
Moving needinfo to case owner
I'm marking this as Triaged, as the title and initial problem have been fixed, and a backport of https://bugzilla.redhat.com/show_bug.cgi?id=1403795 is needed to get this fix into OSP-10 (or we need to determine if this fix is actually in rhos-10-patches, as there are no upstream branches for diskimage-builder). For any issues related to case 01974327, please open a new bug.
Eduard - yes, we can backport this fix to OSP-7. However, we'd like to make sure that this fix (https://code.engineering.redhat.com/gerrit/#/c/136890/) will resolve the issue Telefonica is hitting. The reason I ask is that this has been reported as just a cosmetic logging issue (see comments 7 and 15 above, for example). So, assuming that this is the issue that Telefonica is hitting, I will start the backport. I've created https://bugzilla.redhat.com/show_bug.cgi?id=1579831 to track the backport.
Hi there, If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -. Thanks, Alex
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2671