Description of problem:

Running a test OSP16 update from passed_phase1 to passed_phase1, I hit an issue after the undercloud update: br-int and br-ctlplane are down, causing the rest of the test to fail.

5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:20:7e:5d:f0:d9 brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:c6:4c:c7:63:40 brd ff:ff:ff:ff:ff:ff
8: br-ctlplane: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:4d:6f:bd brd ff:ff:ff:ff:ff:ff

The undercloud logs can be seen here, for instance:
http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/20/artifact/undercloud-0.tar.gz

The jobs are here:
http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/

Version-Release number of selected component (if applicable):
See the associated Jenkins job for the exact puddle used.

How reproducible:
Seems to happen every time.
Looks like the sidecar container fails to start or restart (we have a reboot at the end of the undercloud procedure):

[stack@undercloud-0 ~]$ sudo podman ps --all | grep -E 'neutron-dnsmasq-qdhcp-ff859b54-ef63-43a3-b104-dd035fc08d11|ironic_neutron_agent|nova_conductor'
e775b23d63e1  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-neutron-dhcp-agent:20191115.1    dumb-init --singl...  About an hour ago  Exited (0) About an hour ago  neutron-dnsmasq-qdhcp-ff859b54-ef63-43a3-b104-dd035fc08d11
4aa767445381  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-nova-conductor:20191115.1        dumb-init --singl...  About an hour ago  Exited (0) About an hour ago  nova_conductor_init_log
fb4ca7e20fa4  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-ironic-neutron-agent:20191115.1  dumb-init --singl...  5 hours ago        Up 5 seconds ago              ironic_neutron_agent
829f6f3c1964  undercloud-0.ctlplane.localdomain:8787/rh-osbs/rhosp16-openstack-nova-conductor:20191115.1        dumb-init --singl...  5 hours ago        Up 5 seconds ago              nova_conductor

I'm starting to think that the container doesn't survive the reboot.
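To spot which containers did not come back after the reboot without grepping the full listing, one can filter on exited containers. A small diagnostic sketch (standard podman options; the guard and fallback message are my additions so the snippet degrades gracefully on hosts without podman):

```shell
# Diagnostic sketch: list only containers that have exited, showing
# name and status, to spot sidecars that did not survive the reboot.
if command -v podman >/dev/null 2>&1; then
  result=$(sudo podman ps --all --filter status=exited \
           --format '{{.Names}} {{.Status}}')
  result="${result:-no exited containers}"
else
  # Fallback so the sketch still produces output on hosts without podman.
  result="podman not installed"
fi
echo "$result"
```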
This looks like a repeat or variation of https://bugzilla.redhat.com/show_bug.cgi?id=1666387.
So I had confirmation that the problem is that networking doesn't come back after a reboot of the undercloud. Given the resolution of the previous instance of this bug, I'm passing it on to DFG:DF. It currently prevents OSP16 update testing. Adding the blocker flag as well.
Wondering if this might be linked to the "new" way to start/manage sidecars using systemd, introduced by Alex. Adding him as needinfo() - not 100% sure it made it into OSP 16 yet.
Duh, nah, not related to the sidecars - sorry, misread the whole thing. Would be interesting to know if the "network" service is present and enabled. The patches linked in Brent's BZ seem to point to that.
Had a look at a live env: while the "network" service is present, it's not enabled.
This is likely because the networking service is not enabled to start on reboot, similar to what we had to fix for the overcloud-full images.
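The check-and-enable sequence implied by the comments above can be sketched as follows. This is a hedged sketch, not the merged fix (the actual fix landed in the backports below); the `network` service name comes from the comments in this thread, and the guard/fallback are my additions so the snippet is safe on hosts without systemd:

```shell
# Sketch: verify whether the legacy "network" service is enabled, so
# interfaces like br-int/br-ctlplane come back after a reboot.
if command -v systemctl >/dev/null 2>&1; then
  # is-enabled exits non-zero when the unit is disabled or missing.
  state=$(systemctl is-enabled network 2>/dev/null || echo "absent-or-disabled")
  echo "network service: $state"
  # If disabled, enabling it (as root) is the manual workaround:
  #   sudo systemctl enable network
else
  state="no systemd on this host"
  echo "$state"
fi
```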
Adding upstream backport.
Moving this one to POST; both the master and stable/train backports are merged now.
UC and OC update passed: http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283