Description of problem: undercloud install fails with:

[2019-05-06 18:53:00,103] (heat-config) [DEBUG]
    [2019-05-06 18:52:58,769] (heat-config) [INFO] disable_configure_safe_defaults=True
    [2019-05-06 18:52:58,769] (heat-config) [INFO] bridge_name=br-ex
    [2019-05-06 18:52:58,769] (heat-config) [INFO] interface_name=eth0
    [2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_server_id=3347bb35-69e8-4bdb-937c-ed398ad10dcb
    [2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_action=CREATE
    [2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_stack_id=undercloud-Undercloud-7ri43isnhqer-0-tontryhweod3-NetworkDeployment-us634aqrfdho-TripleOSoftwareDeployment-pyyrni7hf6iz/c7f8b98f-bae0-48dd-b9ad-e40871b8ff84
    [2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_resource_name=TripleOSoftwareDeployment
    [2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_signal_transport=NO_SIGNAL
    [2019-05-06 18:52:58,769] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6
    [2019-05-06 18:53:00,089] (heat-config) [INFO] b''
    [2019-05-06 18:53:00,089] (heat-config) [DEBUG]
        + '[' -n '{"network_config": [{"addresses": [{"ip_netmask": "192.168.24.1/24"}], "dns_servers": [], "members": [{"mtu": 1500, "name": "interface_name", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}' ']'
        + '[' -z True ']'
        ++ date +%Y-%m-%dT%H:%M:%S
        + DATETIME=2019-05-06T18:52:58
        + '[' -f /etc/os-net-config/config.json ']'
        + mkdir -p /etc/os-net-config
        + echo '{"network_config": [{"addresses": [{"ip_netmask": "192.168.24.1/24"}], "dns_servers": [], "members": [{"mtu": 1500, "name": "interface_name", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}'
        ++ type -t network_config_hook
        + '[' '' = function ']'
        + sed -i s/bridge_name/br-ex/ /etc/os-net-config/config.json
        + sed -i s/interface_name/eth0/ /etc/os-net-config/config.json
        + set +e
        + os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes
        [2019/05/06 06:52:59 PM] [INFO] Using config file at: /etc/os-net-config/config.json
        [2019/05/06 06:52:59 PM] [INFO] Ifcfg net config provider created.
        [2019/05/06 06:52:59 PM] [INFO] Not using any mapping file.
        [2019/05/06 06:52:59 PM] [INFO] Finding active nics
        [2019/05/06 06:52:59 PM] [INFO] eth1 is an embedded active nic
        [2019/05/06 06:52:59 PM] [INFO] eth2 is an embedded active nic
        [2019/05/06 06:52:59 PM] [INFO] eth0 is an embedded active nic
        [2019/05/06 06:52:59 PM] [INFO] lo is not an active nic
        [2019/05/06 06:52:59 PM] [INFO] No DPDK mapping available in path (/var/lib/os-net-config/dpdk_mapping.yaml)
        [2019/05/06 06:52:59 PM] [INFO] Active nics are ['eth0', 'eth1', 'eth2']
        [2019/05/06 06:52:59 PM] [INFO] nic1 mapped to: eth0
        [2019/05/06 06:52:59 PM] [INFO] nic2 mapped to: eth1
        [2019/05/06 06:52:59 PM] [INFO] nic3 mapped to: eth2
        [2019/05/06 06:52:59 PM] [INFO] adding bridge: br-ctlplane
        [2019/05/06 06:52:59 PM] [INFO] adding interface: eth0
        [2019/05/06 06:52:59 PM] [INFO] applying network configs...
        [2019/05/06 06:52:59 PM] [INFO] running ifdown on interface: eth0
        [2019/05/06 06:52:59 PM] [INFO] running ifdown on bridge: br-ctlplane
        [2019/05/06 06:52:59 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-eth0
        [2019/05/06 06:52:59 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
        [2019/05/06 06:52:59 PM] [INFO] running ifup on bridge: br-ctlplane
        [2019/05/06 06:52:59 PM] [INFO] running ifup on interface: eth0
        [2019/05/06 06:53:00 PM] [ERROR] Failure(s) occurred when applying configuration
        [2019/05/06 06:53:00 PM] [ERROR] stdout: WARN : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
        WARN : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
        WARN : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
        ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device br-ctlplane does not seem to be present, delaying initialization.
        , stderr:
        Traceback (most recent call last):
          File "/bin/os-net-config", line 10, in <module>
            sys.exit(main())
          File "/usr/lib/python3.6/site-packages/os_net_config/cli.py", line 309, in main
            activate=not opts.no_activate)
          File "/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py", line 1704, in apply
            raise os_net_config.ConfigurationError(message)
        os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
        + RETVAL=1
        + set -e
        + [[ 1 == 2 ]]
        + [[ 1 != 0 ]]
        + echo 'ERROR: os-net-config configuration failed.'
        ERROR: os-net-config configuration failed.
        + exit 1
    [2019-05-06 18:53:00,089] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6. [1]

Version-Release number of selected component (if applicable):
openstack-tripleo-common.noarch 10.7.1-0.20190504090416.d27b186.el8ost @rhelosp-15.0-trunk
openstack-tripleo-common-containers.noarch 10.7.1-0.20190504090416.d27b186.el8ost @rhelosp-15.0-trunk
openstack-tripleo-heat-templates.noarch 10.5.1-0.20190506170359.f08bfef.el8ost @rhelosp-15.0-trunk
openstack-tripleo-image-elements.noarch 10.4.1-0.20190426080346.7efbd4c.el8ost @rhelosp-15.0-trunk
openstack-tripleo-puppet-elements.noarch 10.3.1-0.20190426070355.a359301.el8ost @rhelosp-15.0-trunk
openstack-tripleo-validations.noarch 10.4.1-0.20190505180357.9a2732d.el8ost @rhelosp-15.0-trunk
puppet-tripleo.noarch 10.4.2-0.20190502220347.02cd12e.el8ost @rhelosp-15.0-trunk
os-net-config.noarch 10.4.1-0.20190423124148.f73fdac.el8ost @rhelosp-15.0-trunk

Snapshot name: RHOS_TRUNK-15.0-RHEL-8-20190506.n.1

How reproducible: always

Steps to Reproduce:
1. openstack undercloud install

Additional info: openvswitch was not running.
Might be; I'm not sure about the OVS service state myself since I don't have a live deployment available. To speed things up, though, note for the record that it looks like the following failed:

# UC /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6
...
set +e
os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes
RETVAL=$?
set -e

if [[ $RETVAL == 2 ]]; then
    ping_metadata_ip

    #NOTE: dprince this udev rule can apparently leak DHCP processes?
    # https://bugs.launchpad.net/tripleo/+bug/1538259
    # until we discover the root cause we can simply disable the
    # rule because networking has already been configured at this point
    if [ -f /etc/udev/rules.d/99-dhcp-all-interfaces.rules ]; then
        rm /etc/udev/rules.d/99-dhcp-all-interfaces.rules
    fi
elif [[ $RETVAL != 0 ]]; then
>>  echo "ERROR: os-net-config configuration failed." >&2
    exit 1
fi
...

UC $ cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "192.168.24.1/24"}], "dns_servers": [], "members": [{"mtu": 1500, "name": "eth0", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}
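The RETVAL branches quoted above can be sketched as a standalone helper. This is a minimal sketch, assuming the exit-code convention the script itself relies on (with --detailed-exit-codes, 2 means os-net-config modified files, 0 means nothing changed, anything else is a failure); classify_os_net_config_rc and run_os_net_config are hypothetical names, not part of TripleO.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TripleO) mirroring the RETVAL branches
# of the heat-config script quoted above.

# classify_os_net_config_rc RC
# Prints a label for an os-net-config --detailed-exit-codes exit status and
# returns non-zero for anything other than 0 or 2.
classify_os_net_config_rc() {
    case "$1" in
        0) echo "unchanged" ;;          # no files were modified
        2) echo "changed" ;;            # files modified; original script calls ping_metadata_ip here
        *) echo "failed"; return 1 ;;   # original script prints the ERROR line and exits 1
    esac
}

# run_os_net_config [CONFIG]
# Captures the exit status with `|| rc=$?` instead of toggling set +e / set -e,
# so the caller's errexit setting is left untouched.
run_os_net_config() {
    local config="${1:-/etc/os-net-config/config.json}"
    local rc=0
    os-net-config -c "$config" -v --detailed-exit-codes || rc=$?
    classify_os_net_config_rc "$rc"
}
```

For example, classify_os_net_config_rc 1 prints "failed" and returns 1, which corresponds to the "+ [[ 1 != 0 ]]" path in the trace from the original report.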
I've noticed that openvswitch still doesn't start on reboot as well.
Are there any available logs from the openvswitch process that will indicate why it failed to start?
I wonder if this is the same issue as the network service not starting after reboot: https://bugzilla.redhat.com/show_bug.cgi?id=1702685
This is also similar to the undercloud reboot issue https://bugzilla.redhat.com/show_bug.cgi?id=1701866, which has the same fixes as https://bugzilla.redhat.com/show_bug.cgi?id=1702685. From the list of services in Comment 1 we can see:

network.service loaded inactive dead LSB: Bring up/down networking

See Emilien's comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1701866#c7

"I came to the conclusion that the network service needs to be enabled everywhere until we get os-net-config using NetworkManager, otherwise openvswitch-managed interface won't be started after a reboot."

Can you please test with https://review.opendev.org/#/c/656183/ ?
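To check the network.service state without scanning the whole service list by eye, a small helper can pick one unit out of systemctl list-units style output. This is a minimal sketch: unit_state and unit_is_active are hypothetical names, and it assumes the column order UNIT LOAD ACTIVE SUB DESCRIPTION seen in the line quoted from Comment 1. It searches for the unit name instead of assuming it is the first field, because some systemd versions may prefix lines with a status marker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TripleO or systemd).

# unit_state UNIT
# Reads `systemctl list-units` style lines on stdin and prints the
# LOAD, ACTIVE and SUB columns for UNIT (prints nothing if UNIT is absent).
unit_state() {
    awk -v unit="$1" '{
        # Look for the unit name anywhere in the leading fields, since a
        # status marker before it would shift the columns by one.
        for (i = 1; i <= NF - 3; i++) {
            if ($i == unit) { print $(i + 1), $(i + 2), $(i + 3); exit }
        }
    }'
}

# unit_is_active UNIT
# Succeeds only when the ACTIVE column for UNIT is exactly "active".
unit_is_active() {
    local load active sub
    read -r load active sub <<<"$(unit_state "$1")"
    [[ $active == "active" ]]
}
```

On a live undercloud this would be fed with something like systemctl list-units --all --type=service --no-legend | unit_state network.service; for the line from Comment 1 it reports "loaded inactive dead", so unit_is_active fails.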
This seems to be an issue with the start/restart of the network service, not an os-net-config issue. We need to figure out whether the fix https://review.opendev.org/#/c/656183/ for https://bugzilla.redhat.com/show_bug.cgi?id=1702685 is included. If it's included, let's revert it, as it may be the source of the problem given that this failure just started occurring.
So I ran into a bug when we had 'state: started', because the service was already started (and errored). We'd have to back out that patch and the one before it as well.
I did a diff between a previous run and the new one, and we're missing network-scripts-openvswitch2.11, which is likely why br-ctlplane never starts even though os-net-config creates the ifcfg-br-ctlplane file.
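A comparison like the one described above can be reproduced with comm(1). This is a minimal sketch, assuming each run's installed package names were saved one per line (for example with rpm -qa --qf '%{NAME}\n'); missing_packages is a hypothetical helper name. Lines present only in the first list are the packages that disappeared between runs.

```shell
#!/usr/bin/env bash
# missing_packages OLD_LIST NEW_LIST
# Hypothetical helper: prints package names that appear in OLD_LIST but not
# in NEW_LIST. Both files hold one name per line, e.g. produced with
#   rpm -qa --qf '%{NAME}\n' > packages-good.txt
# The body runs in a subshell so the LC_ALL export does not leak to the
# caller; sort and comm must use the same collation order.
missing_packages() (
    export LC_ALL=C
    comm -23 <(sort -u "$1") <(sort -u "$2")
)
```

Run against a good and a failing undercloud, an entry such as network-scripts-openvswitch2.11 is the kind of difference this would surface.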
Thanks for finding this, Alex! It's here in this compose from a couple of weeks ago:
http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/15.0-RHEL-8/RHOS_TRUNK-15.0-RHEL-8-20190423.n.1/compose/metadata/rpms.json

"openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.src": {
    "network-scripts-openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.ppc64le": {
        "category": "binary",
        "path": "OpenStack/ppc64le/os/Packages/network-scripts-openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.ppc64le.rpm",
        "sigkey": null
    },

However, we don't see openvswitchXXX or network-scripts-openvswitchXXX in the composes from 5/3 or 5/6.
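Checking whether a compose carries the package can be scripted against a local copy of the same compose/metadata/rpms.json file. This is a minimal sketch; compose_has_package is a hypothetical helper, and it does a plain text search for a quoted name prefix rather than parsing JSON, so a jq query would be more robust where jq is available.

```shell
#!/usr/bin/env bash
# compose_has_package RPMS_JSON NAME_PREFIX
# Hypothetical helper: succeeds if RPMS_JSON (a downloaded
# compose/metadata/rpms.json) contains a quoted string starting with
# NAME_PREFIX, e.g. "network-scripts-openvswitch". Text search only.
compose_has_package() {
    # -F treats the prefix literally (package names contain dots); the
    # leading double quote ties the match to the start of a JSON string,
    # so the "path" values (which begin with OpenStack/...) do not match.
    grep -qF -- "\"$2" "$1"
}

# Usage sketch:
#   compose_has_package rpms.json network-scripts-openvswitch && echo present
```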
Changing DFG owner as it looks like a packaging issue.
The packaging issue is resolved; however, the network-scripts-openvswitch2.11 package is not in the current Fast Datapath builds, so we can't resolve this completely until it is. For OSP16, we should try to stop using network-scripts.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811