Bug 1707268 - Failed to configure interfaces (ovs is not running)
Summary: Failed to configure interfaces (ovs is not running)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-openvswitch
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Lon Hohberger
QA Contact: RHOS Maint
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-07 07:40 UTC by Attila Fazekas
Modified: 2021-02-08 20:19 UTC (History)
CC List: 13 users

Fixed In Version: rhosp-openvswitch-2.11-0.3.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:45 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:21:56 UTC

Description Attila Fazekas 2019-05-07 07:40:15 UTC
Description of problem:
Undercloud install fails with:

"[2019-05-06 18:53:00,103] (heat-config) [DEBUG] b'[2019-05-06 18:52:58,769] (heat-config) [INFO] disable_configure_safe_defaults=True\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] bridge_name=br-ex\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] interface_name=eth0\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_server_id=3347bb35-69e8-4bdb-937c-ed398ad10dcb\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_action=CREATE\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_stack_id=undercloud-Undercloud-7ri43isnhqer-0-tontryhweod3-NetworkDeployment-us634aqrfdho-TripleOSoftwareDeployment-pyyrni7hf6iz/c7f8b98f-bae0-48dd-b9ad-e40871b8ff84\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_resource_name=TripleOSoftwareDeployment\\n[2019-05-06 18:52:58,769] (heat-config) [INFO] deploy_signal_transport=NO_SIGNAL\\n[2019-05-06 18:52:58,769] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6\\n[2019-05-06 18:53:00,089] (heat-config) [INFO] b\\'\\'\\n[2019-05-06 18:53:00,089] (heat-config) [DEBUG] b\\'+ \\\\\\'[\\\\\\' -n \\\\\\'{\"network_config\": [{\"addresses\": [{\"ip_netmask\": \"192.168.24.1/24\"}], \"dns_servers\": [], \"members\": [{\"mtu\": 1500, \"name\": \"interface_name\", \"primary\": true, \"type\": \"interface\"}], \"name\": \"br-ctlplane\", \"ovs_extra\": [\"br-set-external-id br-ctlplane bridge-id br-ctlplane\"], \"routes\": [], \"type\": \"ovs_bridge\", \"use_dhcp\": false}]}\\\\\\' \\\\\\']\\\\\\'\\\\n+ \\\\\\'[\\\\\\' -z True \\\\\\']\\\\\\'\\\\n++ date +%Y-%m-%dT%H:%M:%S\\\\n+ DATETIME=2019-05-06T18:52:58\\\\n+ \\\\\\'[\\\\\\' -f /etc/os-net-config/config.json \\\\\\']\\\\\\'\\\\n+ mkdir -p /etc/os-net-config\\\\n+ echo \\\\\\'{\"network_config\": [{\"addresses\": [{\"ip_netmask\": \"192.168.24.1/24\"}], \"dns_servers\": [], \"members\": [{\"mtu\": 1500, \"name\": \"interface_name\", \"primary\": true, \"type\": \"interface\"}], \"name\": \"br-ctlplane\", \"ovs_extra\": 
[\"br-set-external-id br-ctlplane bridge-id br-ctlplane\"], \"routes\": [], \"type\": \"ovs_bridge\", \"use_dhcp\": false}]}\\\\\\'\\\\n++ type -t network_config_hook\\\\n+ \\\\\\'[\\\\\\' \\\\\\'\\\\\\' = function \\\\\\']\\\\\\'\\\\n+ sed -i s/bridge_name/br-ex/ /etc/os-net-config/config.json\\\\n+ sed -i s/interface_name/eth0/ /etc/os-net-config/config.json\\\\n+ set +e\\\\n+ os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes\\\\n[2019/05/06 06:52:59 PM] [INFO] Using config file at: /etc/os-net-config/config.json\\\\n[2019/05/06 06:52:59 PM] [INFO] Ifcfg net config provider created.\\\\n[2019/05/06 06:52:59 PM] [INFO] Not using any mapping file.\\\\n[2019/05/06 06:52:59 PM] [INFO] Finding active nics\\\\n[2019/05/06 06:52:59 PM] [INFO] eth1 is an embedded active nic\\\\n[2019/05/06 06:52:59 PM] [INFO] eth2 is an embedded active nic\\\\n[2019/05/06 06:52:59 PM] [INFO] eth0 is an embedded active nic\\\\n[2019/05/06 06:52:59 PM] [INFO] lo is not an active nic\\\\n[2019/05/06 06:52:59 PM] [INFO] No DPDK mapping available in path (/var/lib/os-net-config/dpdk_mapping.yaml)\\\\n[2019/05/06 06:52:59 PM] [INFO] Active nics are [\\\\\\'eth0\\\\\\', \\\\\\'eth1\\\\\\', \\\\\\'eth2\\\\\\']\\\\n[2019/05/06 06:52:59 PM] [INFO] nic1 mapped to: eth0\\\\n[2019/05/06 06:52:59 PM] [INFO] nic2 mapped to: eth1\\\\n[2019/05/06 06:52:59 PM] [INFO] nic3 mapped to: eth2\\\\n[2019/05/06 06:52:59 PM] [INFO] adding bridge: br-ctlplane\\\\n[2019/05/06 06:52:59 PM] [INFO] adding interface: eth0\\\\n[2019/05/06 06:52:59 PM] [INFO] applying network configs...\\\\n[2019/05/06 06:52:59 PM] [INFO] running ifdown on interface: eth0\\\\n[2019/05/06 06:52:59 PM] [INFO] running ifdown on bridge: br-ctlplane\\\\n[2019/05/06 06:52:59 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-eth0\\\\n[2019/05/06 06:52:59 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ctlplane\\\\n[2019/05/06 06:52:59 PM] [INFO] running ifup on bridge: 
br-ctlplane\\\\n[2019/05/06 06:52:59 PM] [INFO] running ifup on interface: eth0\\\\n[2019/05/06 06:53:00 PM] [ERROR] Failure(s) occurred when applying configuration\\\\n[2019/05/06 06:53:00 PM] [ERROR] stdout: WARN      : [ifup] You are using \\\\\\'ifup\\\\\\' script provided by \\\\\\'network-scripts\\\\\\', which are now deprecated.\\\\nWARN      : [ifup] \\\\\\'network-scripts\\\\\\' will be removed in one of the next major releases of RHEL.\\\\nWARN      : [ifup] It is advised to switch to \\\\\\'NetworkManager\\\\\\' instead - it provides \\\\\\'ifup/ifdown\\\\\\' scripts as well.\\\\nERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device br-ctlplane does not seem to be present, delaying initialization.\\\\n, stderr: \\\\nTraceback (most recent call last):\\\\n  File \"/bin/os-net-config\", line 10, in <module>\\\\n    sys.exit(main())\\\\n  File \"/usr/lib/python3.6/site-packages/os_net_config/cli.py\", line 309, in main\\\\n    activate=not opts.no_activate)\\\\n  File \"/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py\", line 1704, in apply\\\\n    raise os_net_config.ConfigurationError(message)\\\\nos_net_config.ConfigurationError: Failure(s) occurred when applying configuration\\\\n+ RETVAL=1\\\\n+ set -e\\\\n+ [[ 1 == 2 ]]\\\\n+ [[ 1 != 0 ]]\\\\n+ echo \\\\\\'ERROR: os-net-config configuration failed.\\\\\\'\\\\nERROR: os-net-config configuration failed.\\\\n+ exit 1\\\\n\\'\\n[2019-05-06 18:53:00,089] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6. [1]\\n\\n'"


Version-Release number of selected component (if applicable):
openstack-tripleo-common.noarch               10.7.1-0.20190504090416.d27b186.el8ost               @rhelosp-15.0-trunk      
openstack-tripleo-common-containers.noarch    10.7.1-0.20190504090416.d27b186.el8ost               @rhelosp-15.0-trunk      
openstack-tripleo-heat-templates.noarch       10.5.1-0.20190506170359.f08bfef.el8ost               @rhelosp-15.0-trunk      
openstack-tripleo-image-elements.noarch       10.4.1-0.20190426080346.7efbd4c.el8ost               @rhelosp-15.0-trunk      
openstack-tripleo-puppet-elements.noarch      10.3.1-0.20190426070355.a359301.el8ost               @rhelosp-15.0-trunk      
openstack-tripleo-validations.noarch          10.4.1-0.20190505180357.9a2732d.el8ost               @rhelosp-15.0-trunk   
puppet-tripleo.noarch                         10.4.2-0.20190502220347.02cd12e.el8ost               @rhelosp-15.0-trunk      
os-net-config.noarch                          10.4.1-0.20190423124148.f73fdac.el8ost               @rhelosp-15.0-trunk  

Snapshot name: RHOS_TRUNK-15.0-RHEL-8-20190506.n.1
    

How reproducible:
always

Steps to Reproduce:
1. openstack undercloud install 


Additional info:
openvswitch was not running.

Comment 3 Filip Hubík 2019-05-07 09:37:58 UTC
Possibly. I am not sure about the OVS service state myself, since I do not have a live deployment available, but for the record, and to speed things up, it looks like the following failed:

# UC /var/lib/heat-config/heat-config-script/350ce41a-ee83-4e35-a0f7-f7288c25eba6
...
    set +e
    os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes
    RETVAL=$?
    set -e

    if [[ $RETVAL == 2 ]]; then
        ping_metadata_ip

        #NOTE: dprince this udev rule can apparently leak DHCP processes?
        # https://bugs.launchpad.net/tripleo/+bug/1538259
        # until we discover the root cause we can simply disable the
        # rule because networking has already been configured at this point
        if [ -f /etc/udev/rules.d/99-dhcp-all-interfaces.rules ]; then
            rm /etc/udev/rules.d/99-dhcp-all-interfaces.rules
        fi

    elif [[ $RETVAL != 0 ]]; then
>>      echo "ERROR: os-net-config configuration failed." >&2
        exit 1
    fi
...
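
The excerpt above relies on `--detailed-exit-codes`: the script treats a return value of 2 as "configuration changed" (it then runs `ping_metadata_ip` and the udev cleanup), 0 as success with nothing further to do, and any other value as a failure. A minimal sketch of that dispatch follows; `classify_onc_exit` is a hypothetical helper name used only for illustration and is not part of the heat-config script.

```shell
# Hypothetical helper (not in the heat-config script): map an
# os-net-config return value to the branch the excerpt above takes.
classify_onc_exit() {
    case "$1" in
        0) echo "ok" ;;        # neither branch taken, script continues
        2) echo "changed" ;;   # ping_metadata_ip and udev rule cleanup run
        *) echo "failed" ;;    # ERROR line is printed and the script exits 1
    esac
}

# The run in this report logged RETVAL=1.
classify_onc_exit 1   # prints "failed"
```

This only mirrors the if/elif structure; it says nothing about why os-net-config returned 1 in the first place.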

UC $ cat /etc/os-net-config/config.json
{"network_config": [{"addresses": [{"ip_netmask": "192.168.24.1/24"}], "dns_servers": [], "members": [{"mtu": 1500, "name": "eth0", "primary": true, "type": "interface"}], "name": "br-ctlplane", "ovs_extra": ["br-set-external-id br-ctlplane bridge-id br-ctlplane"], "routes": [], "type": "ovs_bridge", "use_dhcp": false}]}

Comment 4 Alex Schultz 2019-05-07 15:26:19 UTC
I've noticed that openvswitch still doesn't start on reboot, either.

Comment 5 Nate Johnston 2019-05-07 17:48:28 UTC
Are there any available logs from the openvswitch process that will indicate why it failed to start?

Comment 6 Bob Fournier 2019-05-07 18:18:31 UTC
I wonder if this is the same issue as the network service not starting after reboot: https://bugzilla.redhat.com/show_bug.cgi?id=1702685

Comment 7 Bob Fournier 2019-05-07 18:32:03 UTC
Also similar to undercloud reboot issue https://bugzilla.redhat.com/show_bug.cgi?id=1701866 which has the same fixes as https://bugzilla.redhat.com/show_bug.cgi?id=1702685.

From the list of services in Comment 1 we can see:
network.service                                                             loaded    inactive dead      LSB: Bring up/down networking 

See Emilien's comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1701866#c7
I came to the conclusion that the network service needs to be enabled everywhere until we get os-net-config using NetworkManager, otherwise openvswitch-managed interface won't be started after a reboot.

Can you please test with https://review.opendev.org/#/c/656183/ ?

Comment 8 Bob Fournier 2019-05-07 20:03:36 UTC
This seems to be an issue with start/restart of the network service, not an os-net-config issue. We need to figure out whether the fix https://review.opendev.org/#/c/656183/ for https://bugzilla.redhat.com/show_bug.cgi?id=1702685 is included. If it's included, let's revert it, as it may be the source of the problem, since this failure only just started occurring.

Comment 9 Alex Schultz 2019-05-07 20:42:18 UTC
So I ran into a bug when we had state: started, because the service was already started (and errored). We'd have to back out that patch and the one before it as well.

Comment 10 Alex Schultz 2019-05-07 21:57:03 UTC
I did a diff between a previous run and the new one and we're missing network-scripts-openvswitch2.11 which is likely why br-ctlplane never starts even though os-net-config creates the ifcfg-br-ctlplane file.
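
A comparison of this kind can be sketched as a set difference over two package-name lists, one per run. This is an illustration under assumptions, not the exact procedure used here: `missing_packages` and the sample lists are hypothetical, and each list is assumed to hold one package name per line (for example, as produced by `rpm -qa --qf '%{NAME}\n'`).

```shell
# Hypothetical helper: print names that appear in the OLD list but
# not in the NEW list. Both files hold one package name per line.
missing_packages() {
    old_list=$1
    new_list=$2
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$new_list" "$old_list"
}

# Illustrative data only; real lists would come from the two runs.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' os-net-config network-scripts-openvswitch2.11 > "$old"
printf '%s\n' os-net-config > "$new"
missing_packages "$old" "$new"   # prints network-scripts-openvswitch2.11
rm -f "$old" "$new"
```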

Comment 11 Bob Fournier 2019-05-07 22:18:51 UTC
Thanks for finding this Alex!

It's here in this compose from a couple of weeks ago:
http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/15.0-RHEL-8/RHOS_TRUNK-15.0-RHEL-8-20190423.n.1/compose/metadata/rpms.json

"openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.src": {
"network-scripts-openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.ppc64le": {
"category": "binary",
"path": "OpenStack/ppc64le/os/Packages/network-scripts-openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.ppc64le.rpm",
"sigkey": null
},

However, we don't see openvswitchXXX or network-scripts-openvswitchXXX in the composes from 5/3 or 5/6.
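
A quick way to repeat this check against a downloaded rpms.json is a plain text search for keys that start with the package name. The sketch below is hedged: `compose_has_package` is a hypothetical helper, it does a text match rather than parsing the JSON, and it assumes package entries appear as quoted keys that begin with the package name, as in the fragment above.

```shell
# Hypothetical helper: succeed if FILE contains a quoted string that
# starts with FRAGMENT (a text match, not a real JSON parse).
compose_has_package() {
    fragment=$1
    rpms_json=$2
    grep -q -F -e "\"$fragment" "$rpms_json"
}

# Sample trimmed from the 20190423 fragment quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.src": {
 "network-scripts-openvswitch2.11-0:2.11.0-0.20190129gitd3a10db.el8fdb.ppc64le": {
  "category": "binary"}}}
EOF
if compose_has_package network-scripts-openvswitch "$sample"; then
    echo "present"
else
    echo "missing"
fi
rm -f "$sample"
```

Because the leading quote is part of the pattern, path values such as `OpenStack/.../network-scripts-openvswitch2.11-...rpm` do not count as matches.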

Comment 12 Bob Fournier 2019-05-07 22:20:59 UTC
Changing DFG owner as it looks like a packaging issue.

Comment 14 Lon Hohberger 2019-05-08 20:17:04 UTC
The packaging issue is resolved; however, the package network-scripts-openvswitch2.11 is not in current Fast Datapath builds, so we can't resolve this completely until it is.

For OSP16, we should try to stop using network-scripts.

Comment 25 errata-xmlrpc 2019-09-21 11:21:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

