Bug 1774581 - OSP16 update fails to get br-int and br-ctlplane back after undercloud update.
Summary: OSP16 update fails to get br-int and br-ctlplane back after undercloud update.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Cédric Jeanneret
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-20 13:43 UTC by Sofer Athlan-Guyot
Modified: 2020-02-06 14:43 UTC
CC List: 10 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191126041653.414d4d9.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:42:51 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary                              Last Updated
OpenStack gerrit        695650          0        None      MERGED  Ensure "network" service is enabled  2020-02-05 12:21:10 UTC
OpenStack gerrit        695869          0        None      MERGED  Ensure "network" service is enabled  2020-02-05 12:21:10 UTC
Red Hat Product Errata  RHEA-2020:0283  0        None      None    None                                 2020-02-06 14:43:22 UTC

Description Sofer Athlan-Guyot 2019-11-20 13:43:51 UTC
Description of problem:

While running a test OSP16 update from passed_phase1 to passed_phase1, I hit an issue after the undercloud update: the br-int and br-ctlplane bridges are down, which causes the rest of the test to fail.


5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:20:7e:5d:f0:d9 brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:c6:4c:c7:63:40 brd ff:ff:ff:ff:ff:ff
8: br-ctlplane: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:4d:6f:bd brd ff:ff:ff:ff:ff:ff
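To spot which interfaces came back DOWN after the reboot without scanning the whole listing, output like the one above can be filtered. The following is a minimal sketch, not part of any TripleO tooling; the `down_links` function name is hypothetical, and it assumes the `ip link` / `ip address` text format shown above (one numbered header line per interface containing `state <STATE>`).

```shell
#!/usr/bin/env bash
# down_links (hypothetical helper): read `ip link` / `ip address` style output
# on stdin and print the name of every interface whose state is DOWN.
down_links() {
  awk '
    # Header lines look like: "6: br-int: <BROADCAST,MULTICAST> ... state DOWN ..."
    /^[0-9]+: / {
      name = $2
      sub(/:$/, "", name)   # drop the trailing colon after the name
      sub(/@.*/, "", name)  # drop an "@parent" suffix (VLAN/veth links)
      for (i = 3; i < NF; i++) {
        if ($i == "state" && $(i + 1) == "DOWN") {
          print name
        }
      }
    }
  '
}

# Usage: ip link show | down_links
```

Fed the listing from this report, it would print `ovs-system`, `br-int` and `br-ctlplane`, one per line.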

The undercloud logs can be found here, for instance: http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/20/artifact/undercloud-0.tar.gz. The jobs are here:

http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/


Version-Release number of selected component (if applicable): See the associated Jenkins job for the exact puddle used.


How reproducible: Seems to happen every time.

Comment 2 Sofer Athlan-Guyot 2019-11-21 14:31:18 UTC
Looks like the sidecar container fails to start or restart (there is a reboot at the end of the undercloud procedure).

[stack@undercloud-0 ~]$ sudo podman ps --all | grep -E 'neutron-dnsmasq-qdhcp-ff859b54-ef63-43a3-b104-dd035fc08d11|ironic_neutron_agent|nova_conductor'
e775b23d63e1  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-neutron-dhcp-agent:20191115.1         dumb-init --singl...  About an hour ago  Exited (0) About an hour ago         neutron-dnsmasq-qdhcp-ff859b54-ef63-43a3-b104-dd035fc08d11
4aa767445381  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-nova-conductor:20191115.1             dumb-init --singl...  About an hour ago  Exited (0) About an hour ago         nova_conductor_init_log
fb4ca7e20fa4  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-ironic-neutron-agent:20191115.1       dumb-init --singl...  5 hours ago        Up 5 seconds ago                     ironic_neutron_agent
829f6f3c1964  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-nova-conductor:20191115.1             dumb-init --singl...  5 hours ago        Up 5 seconds ago                     nova_conductor

I'm starting to think that the container doesn't survive the reboot.

Comment 4 Brent Eagles 2019-11-21 16:45:32 UTC
This looks like a repeat or variation of https://bugzilla.redhat.com/show_bug.cgi?id=1666387.

Comment 5 Sofer Athlan-Guyot 2019-11-22 09:27:36 UTC
I have confirmation that the problem is that networking doesn't come back after the undercloud reboots. Given the resolution of the previous instance of this bug, I'm passing it on to DFG:DF. It currently prevents OSP16 update testing. Adding the blocker flag as well.

Comment 6 Cédric Jeanneret 2019-11-22 09:33:45 UTC
Wondering if this might be linked to the "new" way of starting/managing sidecars using systemd, introduced by Alex. Adding him as needinfo(); not 100% sure whether it made it into osp-16 yet.

Comment 7 Cédric Jeanneret 2019-11-22 09:37:43 UTC
Duh, nah, not related to the sidecars; sorry, misread the whole thing.

It would be interesting to know whether the "network" service is present and enabled. The patches linked in Brent's BZ seem to point to that.

Comment 8 Cédric Jeanneret 2019-11-22 10:07:16 UTC
I was able to look at a live env: while the "network" service is present, it's not enabled.

Comment 11 Alex Schultz 2019-11-22 14:23:01 UTC
This is likely because the network service is not enabled to start on reboot, which is what we already had to do for the overcloud-full images.
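On an affected undercloud, the state described in comments 8 and 11 can be checked, and corrected by hand, with `systemctl`. This is a minimal manual sketch, not the content of the merged patches; it assumes the legacy service name `network` used in this report and that it runs as root, and the `ensure_network_enabled` function name is hypothetical.

```shell
#!/usr/bin/env bash
# ensure_network_enabled (hypothetical helper): make sure the legacy "network"
# service is enabled so it is started again on the next boot.
# Assumes it runs as root on the undercloud.
ensure_network_enabled() {
  if systemctl is-enabled --quiet network; then
    echo "network service already enabled"
  else
    # Enable the service for subsequent boots; without --now this does not
    # start it in the current session.
    systemctl enable network
  fi
}

# Usage: ensure_network_enabled
```

Because `enable` only affects later boots, the check is most useful right before the reboot at the end of the undercloud update.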

Comment 12 Cédric Jeanneret 2019-11-25 11:43:37 UTC
Adding upstream backport.

Comment 13 Emilien Macchi 2019-11-26 11:56:11 UTC
Moving this one to POST, both master & stable/train backports are merged now.

Comment 21 errata-xmlrpc 2020-02-06 14:42:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

