Bug 1986294

Summary: "Unable to start service network" on OC nodes
Product: Red Hat OpenStack
Reporter: Filip Hubík <fhubik>
Component: openstack-tripleo-image-elements
Assignee: James Slagle <jslagle>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.1 (Train)
CC: aschultz, ashtempl, bdobreli, jhajyahy, mburns, wznoinsk
Target Milestone: z7
Keywords: AutomationBlocker, Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-image-elements-10.6.2-1.20210528012405.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-09 20:20:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Filip Hubík 2021-07-27 08:24:51 UTC
Description of problem:

The overcloud (OC) deployment fails after about 30 minutes with a generic network.service error on all OC nodes:

TASK [tripleo-network-config : Ensure network service is enabled] **************
Monday 26 July 2021  18:35:07 +0000 (0:00:00.103)       0:01:40.076 *********** 
fatal: [ceph-0]: FAILED! => {"changed": false, "msg": "Unable to start service network: Job for network.service failed because the control process exited with error code.\nSee \"systemctl status network.service\" and \"journalctl -xe\" for details.\n"}
fatal: [compute-0]: FAILED! => {"changed": false, "msg": "Unable to start service network: Job for network.service failed because the control process exited with error code.\nSee \"systemctl status network.service\" and \"journalctl -xe\" for details.\n"}
fatal: [controller-0]: FAILED! => {"changed": false, "msg": "Unable to start service network: Job for network.service failed because the control process exited with error code.\nSee \"systemctl status network.service\" and \"journalctl -xe\" for details.\n"}

In detail (controller-0):

controller-0 systemd[1]: Starting LSB: Bring up/down networking...
controller-0 network[1437]: WARN      : [network] You are using 'network' service provided by 'network-scripts', which are now deprecated.
controller-0 network[1437]: WARN      : [network] 'network-scripts' will be removed in one of the next major releases of RHEL.
controller-0 network[1437]: WARN      : [network] It is advised to switch to 'NetworkManager' instead for network management.
controller-0 network[1437]: Bringing up loopback interface:  [  OK  ]
controller-0 network[1437]: Bringing up interface ens3:  [  OK  ]
controller-0 network[1437]: Bringing up interface eth0:  Error: Connection activation failed: No suitable device found for this connection (device ens3 not available because profile is not compatible with device (mismatching interface name)).
controller-0 network[1437]: [FAILED]
controller-0 systemd[1]: network.service: Control process exited, code=exited status=1
controller-0 systemd[1]: network.service: Failed with result 'exit-code'.
controller-0 systemd[1]: Failed to start LSB: Bring up/down networking.
controller-0 systemd[1]: Starting LSB: Bring up/down networking...
controller-0 network[10939]: WARN      : [network] You are using 'network' service provided by 'network-scripts', which are now deprecated.
controller-0 network[10939]: WARN      : [network] 'network-scripts' will be removed in one of the next major releases of RHEL.
controller-0 network[10939]: WARN      : [network] It is advised to switch to 'NetworkManager' instead for network management.
controller-0 network[10939]: Bringing up loopback interface:  [  OK  ]
controller-0 network[10939]: Bringing up interface br-ex:  [  OK  ]
controller-0 network[10939]: Bringing up interface br-isolated:  [  OK  ]
controller-0 network[10939]: Bringing up interface ens3:  RTNETLINK answers: File exists
controller-0 network[10939]: RTNETLINK answers: File exists
controller-0 network[10939]: [  OK  ]
controller-0 ovs-vsctl[11317]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-isolated ens4 -- add-port br-isolated ens4
controller-0 network[10939]: Bringing up interface ens4:  [  OK  ]
controller-0 ovs-vsctl[11413]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-ex ens5 -- add-port br-ex ens5
controller-0 network[10939]: Bringing up interface ens5:  [  OK  ]
controller-0 network[10939]: Bringing up interface eth0:  Error: Connection activation failed: No suitable device found for this connection (device lo not available because device is strictly unmanaged).
controller-0 network[10939]: [FAILED]
controller-0 ovs-vsctl[11497]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-isolated vlan20 -- add-port br-isolated vlan20 tag=20 -- set Interface vlan20 type=internal
controller-0 network[10939]: Bringing up interface vlan20:  [  OK  ]
controller-0 ovs-vsctl[11587]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-isolated vlan30 -- add-port br-isolated vlan30 tag=30 -- set Interface vlan30 type=internal
controller-0 network[10939]: Bringing up interface vlan30:  [  OK  ]
controller-0 ovs-vsctl[11677]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-isolated vlan40 -- add-port br-isolated vlan40 tag=40 -- set Interface vlan40 type=internal
controller-0 network[10939]: Bringing up interface vlan40:  [  OK  ]
controller-0 ovs-vsctl[11767]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-port br-isolated vlan50 -- add-port br-isolated vlan50 tag=50 -- set Interface vlan50 type=internal
controller-0 network[10939]: Bringing up interface vlan50:  [  OK  ]
controller-0 network[10939]: RTNETLINK answers: File exists

controller-0 network[10939]: RTNETLINK answers: File exists
controller-0 systemd[1]: network.service: Control process exited, code=exited status=1
controller-0 systemd[1]: network.service: Failed with result 'exit-code'.
controller-0 systemd[1]: Failed to start LSB: Bring up/down networking.
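
To make the failing interface easier to spot in a long journal, the "Bringing up interface" lines can be filtered for the ones that report an error. The following is only a minimal sketch: the function name failed_interfaces and the regular expression are illustrative assumptions based on the log format shown above, not part of network-scripts or TripleO.

```python
import re

# Matches journal lines such as:
#   "controller-0 network[1437]: Bringing up interface eth0:  Error: ..."
# The pattern is an assumption derived from the log excerpt in this report.
BRINGUP = re.compile(r"Bringing up interface (?P<iface>\S+):\s*(?P<rest>.*)$")


def failed_interfaces(log_lines):
    """Return (interface, message) pairs for bring-up lines that report an error."""
    failures = []
    for line in log_lines:
        match = BRINGUP.search(line)
        # Lines ending in "[  OK  ]" or carrying only RTNETLINK noise are skipped.
        if match and match.group("rest").startswith("Error:"):
            failures.append((match.group("iface"), match.group("rest")))
    return failures
```

Applied to the excerpt above, this reports eth0 for both start attempts, once with the "mismatching interface name" message and once with the "strictly unmanaged" message.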

Version-Release number of selected component (if applicable):

OSP16.1, RHOS-16.1-RHEL-8-20210726.n.2

How reproducible:
100%

Additional info:
rhosp-director-images.noarch 16.1-20210726.2.el8ost

Comment 8 Jad Haj Yahya 2021-07-28 13:46:27 UTC
OC deployment succeeded with subsequent puddle RHOS-16.1-RHEL-8-20210727.n.1

Comment 9 Filip Hubík 2021-07-29 14:04:09 UTC
>OC deployment succeeded with subsequent puddle RHOS-16.1-RHEL-8-20210727.n.1

I am not sure that CI passing with new content is enough on its own to verify this fix, since other changes might have been pulled in at different layers. However, if https://review.opendev.org/c/openstack/tripleo-image-elements/+/791786/1/elements/interface-names/install.d/71-clean-stale-interface is now incorporated into the rhosp-images* build process, then I guess yes. I can confirm only that manually removing /etc/sysconfig/network-scripts/ifcfg-eth0 prior to OC deployment lets me reach further OC stages.
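
For checking a node by hand before OC deployment, something like the following could list ifcfg files that refer to a device the kernel does not expose, which is the situation ifcfg-eth0 is in here. This is a rough sketch and not the content of the upstream 71-clean-stale-interface element: the function names are hypothetical, it only reports candidates and deletes nothing, and it assumes it runs on a freshly provisioned node, before bridges or VLAN interfaces have been configured (their ifcfg files would otherwise be reported too).

```python
from pathlib import Path


def read_device(ifcfg_path):
    """Return the DEVICE= value from an ifcfg file, or None if it has none."""
    for line in Path(ifcfg_path).read_text().splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "DEVICE":
            return value.strip().strip("\"'")
    return None


def stale_ifcfg_files(scripts_dir="/etc/sysconfig/network-scripts",
                      sys_net_dir="/sys/class/net"):
    """List ifcfg-* files whose device is not present on this host."""
    # Every network device known to the kernel has an entry in /sys/class/net.
    present = {entry.name for entry in Path(sys_net_dir).iterdir()}
    stale = []
    for cfg in sorted(Path(scripts_dir).glob("ifcfg-*")):
        # Fall back to the file name suffix when DEVICE= is not set.
        device = read_device(cfg) or cfg.name[len("ifcfg-"):]
        if device not in present:
            stale.append(cfg)
    return stale
```

Both directories are parameters so that the check can be tried against a copy of the image contents instead of a live node.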

Comment 21 errata-xmlrpc 2021-12-09 20:20:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762