Bug 1801763
Summary: | Network unreachable after minor update: unable to pull rhosp16-openstack-neutron-metadata-agent-ovn image | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Lon Hohberger <lhh> |
Component: | openstack-tripleo-heat-templates | Assignee: | Brent Eagles <beagles> |
Status: | CLOSED ERRATA | QA Contact: | Eduardo Olivares <eolivare> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 16.0 (Train) | CC: | amuller, apevec, bdobreli, beagles, dalvarez, ekuris, eolivare, gregraka, jlibosva, ksambor, lhh, lpeer, majopela, mbaldessari, mburns, mjozefcz, njohnston, pgrist, rsafrono, sathlang, sclewis, scohen, shrjoshi |
Target Milestone: | z1 | Keywords: | Triaged |
Target Release: | 16.0 (Train on RHEL 8.1) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-0.20200218132857.ab1079e.el8ost | Doc Type: | No Doc Update |
Doc Text: | Story Points: | --- | |
Clone Of: | 1790467 | Environment: | |
Last Closed: | 2020-03-03 09:45:05 UTC | Type: | --- |
Bug Depends On: | 1790467 | ||
Bug Blocks: |
Description
Lon Hohberger
2020-02-11 15:33:20 UTC
*** Bug 1790467 has been marked as a duplicate of this bug. ***

Hi,

the attached review should regenerate the hiera data earlier in the update process. Currently this is only done during the "converge" phase of the update. The patch adds a specific hiera regeneration as the very first step of the update. The attached patch still needs testing and needs to be merged on master.

Thanks.

Hi,

so I've tested the hiera regeneration based on https://review.opendev.org/#/c/705642/:

- start with osp16 GA

- regenerate all images using:

    sudo openstack tripleo container image prepare -e /home/stack/update-container-workaround.yaml --output-env-file /home/stack/update-container-params.yaml

  with a modified yum_install_buildah.yml, so that the images are squashed:

    awk '{if($0 ~/format docker/){print $0 " --squash"}else{print $0}}' \
      /usr/share/ansible/roles/tripleo-modify-image/tasks/yum_install_buildah.yml > /tmp/yum_install_buildah.yml

- start the update process, logging the wrapper scripts at various points in time[1]

[root@compute-0 ~]# grep agent-ovn /var/log/extra/*_update_ovn_metadata_haproxy_wrapper_in_container
/var/log/extra/after_update_ovn_metadata_haproxy_wrapper_in_container: undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix \
/var/log/extra/before_update_ovn_metadata_haproxy_wrapper_in_container: undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1 \

[root@compute-0 ~]# grep agent-ovn /var/log/extra/*_converge_ovn_metadata_haproxy_wrapper_in_container
/var/log/extra/after_converge_ovn_metadata_haproxy_wrapper_in_container: undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix \
/var/log/extra/before_converge_ovn_metadata_haproxy_wrapper_in_container: undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix \

We can see that the wrapper, as seen in the container[2], is now correctly changed after the update round, and not only after the converge. So this patch should resolve the issue. As far as I can see, it's good to go.

[1] The full workaround script used can be found at http://file.rdu.redhat.com/~sathlang/tripleo/osp16/workaround_new_tag_logs.yaml

[2] Captured with:

    if sudo podman inspect ovn_metadata_agent; then sudo podman exec -ti ovn_metadata_agent cat /usr/local/bin/haproxy | sudo tee /var/log/extra/before_update_ovn_metadata_haproxy_wrapper_in_container; fi

One note: I'm not able to reproduce exactly that issue, so if someone from networking QE is willing to try the patch, that would be appreciated.

Here is my sequence of podman containers in the "standard" update testing workflow:

[root@compute-0 ~]# cat /var/log/extra/podman-before-overcloud-update-container.log | grep -v Exited
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
1bd12ae362a4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1  /bin/bash -c HAPR...  About an hour ago  Up About an hour ago  neutron-haproxy-ovnmeta-93cf7f45-92b2-41fd-a149-3a5864a233d1
5897024baaf2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  nova_compute
868d6d72b577  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  ovn_metadata_agent
020067ef909f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  ovn_controller
2563f91084fb  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  nova_migration_target
a45a6937d4cc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  logrotate_crond
b1485f601600  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  iscsid
447a4b8069f5  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  nova_libvirt
fe7f04034024  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1  kolla_start  3 hours ago  Up 3 hours ago  nova_virtlogd

[root@compute-0 ~]# cat /var/log/extra/podman-after-overcloud-update-container.log | grep -v Exited
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
25acd990a6e9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_compute
279a2f14fee6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  ovn_metadata_agent
82bc7afff97f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  ovn_controller
f93171151401  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_migration_target
40ac9123bcff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  logrotate_crond
d264ddf9de87  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  iscsid
6300388c0cba  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_libvirt
29553ea63768  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_virtlogd
1bd12ae362a4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1  /bin/bash -c HAPR...  3 hours ago  Up 3 hours ago  neutron-haproxy-ovnmeta-93cf7f45-92b2-41fd-a149-3a5864a233d1

[root@compute-0 ~]# cat /var/log/extra/podman-after-overcloud-converge-container.log | grep -v Exited
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
25acd990a6e9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_compute
279a2f14fee6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  ovn_metadata_agent
82bc7afff97f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  ovn_controller
f93171151401  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_migration_target
40ac9123bcff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  logrotate_crond
d264ddf9de87  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  iscsid
6300388c0cba  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_libvirt
29553ea63768  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  2 hours ago  Up 2 hours ago  nova_virtlogd
1bd12ae362a4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1  /bin/bash -c HAPR...  4 hours ago  Up 4 hours ago  neutron-haproxy-ovnmeta-93cf7f45-92b2-41fd-a149-3a5864a233d1

### AFTER REBOOT

[root@compute-0 ~]# podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
925f4e3446b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix  /bin/bash -c HAPR...  15 hours ago  Up 15 hours ago  neutron-haproxy-ovnmeta-164cff3f-069d-40d3-ab8a-b19580aca6f9
25acd990a6e9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  nova_compute
279a2f14fee6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  ovn_metadata_agent
82bc7afff97f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  ovn_controller
f93171151401  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  nova_migration_target
40ac9123bcff  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cron:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  logrotate_crond
d264ddf9de87  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  iscsid
6300388c0cba  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  nova_libvirt
29553ea63768  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-libvirt:20200130.1-hotfix  kolla_start  18 hours ago  Up 15 hours ago  nova_virtlogd

We can see that the sidecar container is restarted correctly with the new image after the reboot.

I think what is missing here to reproduce the issue is the creation of a VM before the converge step. But again, the wrapper script is now properly updated, so this doesn't conflict with the fact that the fix is working. It would just be good to have the exact reproducer.

Thanks,

Hi Sofer,

First, maybe you did not reproduce the issue because you were using only one compute node? When we reproduced this issue with the OVN CI update job, we used two computes and, for some reason, the issue only occurred on one of them.

Regarding the reproduction/verification using the upstream commit: in the past Brent asked for something similar. I checked with other colleagues from QE and was told we need the change merged downstream in order to test it. I totally agree with your suggestion (testing the change before porting it downstream), but I don't know how to do it.

Thanks!

Hi,

hmm, something else must be at work, as production CI uses 2 computes and we have a VM running on it. Could it be some Tempest testing that you trigger on the environment that gives rise to that error?

Note: patch merged upstream, moving to POST.

Thanks,

Verified on RHOS_TRUNK-16.0-RHEL-8-20200226.n.1

This puddle includes openstack-tripleo-heat-templates-11.3.2-0.20200211065546.d3d6dc3.el8ost.noarch.rpm

NOTE: The "Fixed In Version" field in this bug is wrong, because the fix is already included in an earlier RPM (see https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1102176).

Verified with CI job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-update-16_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/68/

The second and third Tempest test executions, run after the update stages, pass. This means port creation and attachment to instances are done successfully.
We can see in the logs that the neutron-metadata-agent-ovn containers used before the update are rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1 (puddle RHOS_TRUNK-16.0-RHEL-8-20200204.n.1), while the containers used after the update are rhosp16-openstack-neutron-metadata-agent-ovn:20200226.1 (puddle RHOS_TRUNK-16.0-RHEL-8-20200226.n.1).

No crashes were found in the ovn-metadata-agent logs, neither on compute-0 nor on compute-1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0655
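
A minimal sketch, not part of the fix: the stale sidecar visible in the container listings above (neutron-haproxy-ovnmeta-* still on 20200130.1 while every other container runs 20200130.1-hotfix) could be detected mechanically. The Python below is only an illustration. The function names split_image and stale_containers are hypothetical, and the input is assumed to be one "image name" pair per line, for example as printed by podman ps --format '{{.Image}} {{.Names}}'. The sample rows are copied from the post-update listing in this bug.

```python
from typing import Iterable, List, Tuple


def split_image(ref: str) -> Tuple[str, str]:
    """Split 'registry:port/namespace/name:tag' into (repository, tag).

    Only a colon in the last path component is treated as the tag
    separator, so the registry port (':8787') is left alone.
    """
    last_component = ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, tag = ref.rsplit(":", 1)
        return repository, tag
    return ref, ""


def stale_containers(lines: Iterable[str], expected_tag: str) -> List[Tuple[str, str]]:
    """Return (container name, tag) for every container not on expected_tag.

    Each input line is assumed to hold exactly two whitespace-separated
    fields: the image reference and the container name.
    """
    stale = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        image, name = fields
        _, tag = split_image(image)
        if tag != expected_tag:
            stale.append((name, tag))
    return stale


# Sample rows taken from the post-update listing in this bug report.
REGISTRY = "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/"
SAMPLE = [
    REGISTRY + "rhosp16-openstack-nova-compute:20200130.1-hotfix nova_compute",
    REGISTRY + "rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1-hotfix ovn_metadata_agent",
    REGISTRY + "rhosp16-openstack-neutron-metadata-agent-ovn:20200130.1 "
    "neutron-haproxy-ovnmeta-93cf7f45-92b2-41fd-a149-3a5864a233d1",
]

if __name__ == "__main__":
    for name, tag in stale_containers(SAMPLE, "20200130.1-hotfix"):
        print(f"{name} is still on tag {tag}")
```

Run against the post-reboot listing instead, the same check would be expected to report nothing, since the sidecar there already carries the -hotfix tag.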