Bug 1997351

Summary: [13->16.1] Instances are inaccessible after bootstrap controller upgrade
Product: Red Hat OpenStack
Reporter: Khomesh Thakre <kthakre>
Component: openstack-tripleo-heat-templates
Assignee: Cédric Jeanneret <cjeanner>
Status: CLOSED ERRATA
QA Contact: Jason Grosso <jgrosso>
Severity: high
Docs Contact: James Smith <jamsmith>
Priority: urgent
Version: 16.1 (Train)
CC: apevec, aschultz, astupnik, bcafarel, bdobreli, cjeanner, dalvarez, dmaley, jamsmith, jfrancoa, jgrosso, jhardee, jpichon, jpretori, kurathod, lhh, majopela, mburns, schhabdi, scohen, sgolovat, spower, sputhenp, tbonds
Target Milestone: z7
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210720153311.el8ost
Doc Type: Bug Fix
Doc Text:
Before this update, upgrading a Red Hat OpenStack Platform (RHOSP) 13 environment deployed with ML2-OVN to RHOSP 16.1 caused the upgrade process to fail on the Controller nodes because of an SELinux denial. With this update, the correct SELinux label is applied to OVN, which resolves the issue. For more information, see the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/6305361[OVN fails to configure after reboot during OSP-13 -> OSP-16.1 FFU].
Last Closed: 2021-12-09 20:20:46 UTC
Type: Bug

Description Khomesh Thakre 2021-08-25 03:37:22 UTC
Description of problem:

During the upgrade to 16.1 with ML2-OVN, instances lose connectivity because Neutron and OVN are not in sync.

To recover, we need to manually run neutron-ovn-db-sync-util to sync the OVN database with Neutron.

~~~
# podman exec -it neutron_api /bin/bash
# neutron-ovn-db-sync-util --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server --ovn-neutron_sync_mode=repair
~~~

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 16.1


Actual results:
Dataplane is impacted during control plane upgrade

Expected results:
Dataplane should not be impacted during control plane upgrade

Additional info:

Comment 42 Cédric Jeanneret 2021-09-14 14:55:35 UTC
After some discussions, here's the plan:

Adding the "z" flag will create an outage during day-2 operations, and we don't want that.

So we target https://review.opendev.org/c/openstack/puppet-tripleo/+/808774 for wallaby/osp-17 only, and get a train-only patch that will actually change the SELinux context during upgrade_tasks (linked patch here).

This same train-only patch will also be added to 16.2.x (see bug 2003732 for more details).
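
For readers following along, a one-time relabel can be sketched roughly as below. This is only an illustration, not the linked patch: the helper name `relabel_cmd` is hypothetical, and the default directory and SELinux type are assumptions taken from the verification output later in this bug. The helper only prints the command, so it can be reviewed before being run by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a recursive relabel command.
# Assumption: the directory and type defaults come from the verification
# in this bug, not from the actual train-only patch.
relabel_cmd() {
    local dir="${1:-/var/lib/openvswitch/ovn}"
    local type="${2:-container_file_t}"
    # chcon -R -t TYPE PATH changes only the SELinux type, recursively.
    printf 'chcon -R -t %s %s\n' "$type" "$dir"
}
```

Calling `relabel_cmd` with no arguments prints the command for the default directory; unlike a mount flag, nothing is relabelled again on later container restarts unless the command is re-run.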

Comment 43 Cédric Jeanneret 2021-09-15 12:39:17 UTC
Since we have a "simple" patch, requesting blocker for that one against 16.1.7.

Comment 53 Jose Luis Franco 2021-10-26 07:15:49 UTC
Verified the BZ manually.

Steps:
 1. Deploy an OSP13 environment with OVN.
 2. Upgrade the Undercloud to OSP16.1:
(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-11.3.2-1.20210720153312.el8ost.noarch
 3. Upgrade Controller-0 to OSP16.1.
 4. Check the label applied to the /var/lib/openvswitch/ovn directory:
[heat-admin@controller-0 ~]$ ls -ldZ  /var/lib/openvswitch/ovn
drwxr-xr-x. 2 root root system_u:object_r:container_file_t:s0 4096 Oct 26 06:58 /var/lib/openvswitch/ovn
[heat-admin@controller-0 ~]$ ls -lZ  /var/lib/openvswitch/ovn
total 207268
srwxr-x---. 1 root root system_u:object_r:container_file_t:s0         0 Oct 20 16:21 ovn-controller.7.ctl
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0         2 Oct 20 16:21 ovn-controller.pid
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0        22 Oct 25 16:13 ovnnb-active.conf
srwxr-x---. 1 root root system_u:object_r:container_file_t:s0         0 Oct 25 16:13 ovnnb_db.ctl
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0   3828370 Oct 26 07:10 ovnnb_db.db
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0   1980794 Oct 18 21:02 ovnnb_db.db.backup5.10.1-64444197
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0     63508 Oct 20 16:18 ovnnb_db.db.backup5.18.0-2806349485
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0         4 Oct 25 16:13 ovnnb_db.pid
srwxr-x---. 1 root root system_u:object_r:container_file_t:s0         0 Oct 25 16:13 ovnnb_db.sock
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0        22 Oct 25 16:13 ovnsb-active.conf
srwxr-x---. 1 root root system_u:object_r:container_file_t:s0         0 Oct 25 16:13 ovnsb_db.ctl
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0 177085733 Oct 26 07:10 ovnsb_db.db
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0   6876367 Oct 18 21:02 ovnsb_db.db.backup1.15.1-1164519396
-rw-r-----. 1 root root system_u:object_r:container_file_t:s0    131259 Oct 20 16:18 ovnsb_db.db.backup2.6.0-4271405686
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0         4 Oct 25 16:13 ovnsb_db.pid
srwxr-x---. 1 root root system_u:object_r:container_file_t:s0         0 Oct 25 16:13 ovnsb_db.sock
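
The manual check above could be scripted along these lines. This is a minimal sketch, not part of the fix: `find_mislabeled` is a hypothetical helper that assumes the `ls -lZ` column layout shown above (context in the fifth column, file names without spaces).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ls -lZ` output on stdin and print every entry
# whose SELinux type differs from the expected one (name, then actual type).
# Returns 0 when all entries match, 1 otherwise.
find_mislabeled() {
    local expected_type="${1:-container_file_t}"
    awk -v want="$expected_type" '
        NF == 0        { next }   # skip blank lines
        $1 == "total"  { next }   # skip the "total N" summary line
        {
            # Column 5 is the context: user:role:type:level
            split($5, ctx, ":")
            if (ctx[3] != want) {
                print $NF " " ctx[3]
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `ls -lZ /var/lib/openvswitch/ovn | find_mislabeled container_file_t` prints nothing and returns 0 for a listing like the one above.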

Comment 54 Jose Luis Franco 2021-10-26 07:23:07 UTC
Also, no denials found in audit.log from any of the upgraded controllers.
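
A check of this kind can be sketched as follows. The helper name `count_avc_denials` is hypothetical, and the default log path assumes the standard auditd location; it simply counts AVC denial records in the given file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SELinux AVC denial records in an audit log.
# Assumption: the default path is the usual auditd log location.
count_avc_denials() {
    local log_file="${1:-/var/log/audit/audit.log}"
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is zero, so mask that to keep the function's status at 0.
    grep -c -E 'type=AVC .*denied' "$log_file" || true
}
```

A result of `0` on each upgraded controller matches the observation above. On hosts with the audit tools installed, `ausearch -m avc` gives a similar view.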

Comment 64 errata-xmlrpc 2021-12-09 20:20:46 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762