Bug 1873742 - Unhealthy ovn_metadata_agent during 13-16.1 FFU
Summary: Unhealthy ovn_metadata_agent during 13-16.1 FFU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z3
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: RHOS Maint
QA Contact: Eduardo Olivares
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-29 19:38 UTC by Eduardo Olivares
Modified: 2020-12-15 18:37 UTC

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20200914170157.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-15 18:36:32 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 752230 0 None MERGED [train-only] Fix unhealthy ovn_metadata_agent during hybrid state 2020-11-16 10:48:57 UTC
Red Hat Product Errata RHEA-2020:5413 0 None None None 2020-12-15 18:37:03 UTC

Description Eduardo Olivares 2020-08-29 19:38:18 UTC
Description of problem:
The issue occurs during the FFU procedure, while the Overcloud FFU Upgrade stage is running. The topology is HA-no-ceph (3 controllers and 2 computes).

The script overcloud_upgrade_run-controller-0.sh upgrades the first controller node to osp16.1, disables controller-1 and controller-2, and creates a single-node pacemaker cluster with controller-0. Meanwhile, both compute nodes are still running osp13.

A workload was created before starting the upgrade procedure. The corresponding VM instance can still be reached successfully.

At this point (with only controller-0 upgraded), a new workload is created successfully, but its floating IP (FIP) is unreachable.

Findings:
1) ovn_metadata_agent is unhealthy
38b5aa21f39c        192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-metadata-agent-ovn:20200730.1                "dumb-init --singl..."   23 hours ago        Up 7 minutes (unhealthy)                       ovn_metadata_agent                                      
2) neutron-haproxy-ovnmeta sidecar container is not created for the new workload (only a sidecar container exists for the old workload)
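
The unhealthy state in finding 1 can be picked out of `docker ps` output with a small filter. This is only a sketch: the function name unhealthy_containers is hypothetical, and it assumes the container name is the last column, as in the line quoted above (whitespace condensed here).

```shell
# Print the names of containers whose status contains "(unhealthy)".
# Reads `docker ps` style output on stdin; assumes NAMES is the last column.
unhealthy_containers() {
    awk '/\(unhealthy\)/ { print $NF }'
}

# Sample line taken from finding 1 in this report.
docker_ps_line='38b5aa21f39c  192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-metadata-agent-ovn:20200730.1  "dumb-init --singl..."  23 hours ago  Up 7 minutes (unhealthy)  ovn_metadata_agent'

printf '%s\n' "$docker_ps_line" | unhealthy_containers
# prints: ovn_metadata_agent
```

On a live osp13 compute node, `docker ps --filter health=unhealthy` should give a comparable list without any text parsing.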


The issue was reproduced on a BM server: panther23.lab.eng.tlv2.redhat.com
It can be reproduced from the undercloud node at panther23 by running /home/stack/workload_launch.sh (the newly created workload can be removed by running /home/stack/workload_launch.sh cleanup).




Version-Release number of selected component (if applicable):
FFU upgrade performed from osp13: 2020-08-05.1
to osp16.1: RHOS-16.1-RHEL-8-20200821.n.0

How reproducible:
100% - several OVN FFU jobs reproduced it, both with and without DVR enabled.
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn-dvr/
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn/


Steps to Reproduce:
1. Run an OVN FFU job like the ones linked above. The issue appears after the first controller node is upgraded, during the Overcloud FFU Upgrade Run stage.

Comment 1 Jakub Libosvar 2020-09-08 12:54:38 UTC
This has the same root cause as bug 1871834: the metadata agent config still uses the old VIP, while the OVN SB DB is listening on a new one.

[root@controller-0 ~]# ss -putnal | grep 6642
tcp   LISTEN 0      10          [fd00:fd00:fd00:2000::136]:6642            [::]:*                                                                                users:(("ovsdb-server",pid=672283,fd=14))

[root@compute-1 ~]# grep ovn_sb_connection /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/networking-ovn/networking-ovn-metadata-agent.ini
ovn_sb_connection=tcp:[fd00:fd00:fd00:2000::ef]:6642
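
The comparison made by hand in the two commands above can be sketched as a small check. The function names (strip_port, sb_host, check_sb_vip) are hypothetical, and the sketch assumes ovn_sb_connection holds a single endpoint of the form proto:host:port rather than a comma-separated list.

```shell
# Drop the trailing ":port" and any surrounding [] from "host:port".
strip_port() (
    v=${1%:*}
    v=${v#\[}
    v=${v%\]}
    printf '%s\n' "$v"
)

# Extract the host from an ovn_sb_connection value such as
# "tcp:[fd00:fd00:fd00:2000::ef]:6642" (assumed: one endpoint only).
sb_host() (
    strip_port "${1#*:}"
)

# $1: ovn_sb_connection value from the agent config
# $2: local address:port that ovsdb-server listens on (from ss)
# Returns non-zero when the two addresses differ.
check_sb_vip() (
    configured=$(sb_host "$1")
    listening=$(strip_port "$2")
    if [ "$configured" = "$listening" ]; then
        echo "match: $configured"
    else
        echo "mismatch: agent is configured for $configured, OVN SB DB listens on $listening"
        return 1
    fi
)

# Values taken from the command output in this comment.
check_sb_vip 'tcp:[fd00:fd00:fd00:2000::ef]:6642' '[fd00:fd00:fd00:2000::136]:6642' || true
# prints: mismatch: agent is configured for fd00:fd00:fd00:2000::ef, OVN SB DB listens on fd00:fd00:fd00:2000::136
```

The first argument corresponds to the value grepped from networking-ovn-metadata-agent.ini on the compute node, and the second to the local address column of the ss line on controller-0.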

Comment 20 errata-xmlrpc 2020-12-15 18:36:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5413

