Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2111871

Summary: overcloud external-update run --stack overcloud --tags ovn caused a network outage
Product: Red Hat OpenStack
Reporter: Eduard Barrera <ebarrera>
Component: python-networking-ovn
Assignee: Terry Wilson <twilson>
Status: CLOSED ERRATA
QA Contact: Maor <mblue>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: alisci, apevec, astupnik, bcafarel, dalvarez, dcbw, dhill, dhruv, dmaley, jamsmith, jjoyce, jlibosva, jpretori, jveiraca, kthakre, lhh, ltamagno, majopela, mariel, mblue, mburns, mciecier, mmichels, mtomaska, nbourgeo, pgrist, pratshar, ravsingh, rbruzzon, rcernin, rkhan, sathlang, scohen, shtiwari, spower, tvignaud, twilson, twilson, yoliynyk
Target Milestone: z4
Keywords: TestOnly, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This update fixes a bug that causes connectivity loss after certain updates to RHOSP 16.2.2 and 16.2.3. If you are planning to update to a RHOSP 16.2 release, update to RHOSP 16.2.4 to avoid connectivity loss.

The bug is triggered by a database schema change in OVN 21.12, which is introduced in RHOSP 16.2.2 and 16.2.3. OVN 21.12 contains a new column that is not present in earlier versions. OVN database schema changes should not cause a problem in OpenStack, but this particular change is affected by a bug.

In particular, instance connectivity is lost for a variable amount of time (from 20 seconds to 3 minutes) when you run the following command:

    $ openstack overcloud external-update run --stack overcloud --tags ovn

To avoid the bug, do not update to RHOSP 16.2.2 or 16.2.3. Update to RHOSP 16.2.4 instead.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-12-07 19:24:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2089416
Bug Blocks:

Description Eduard Barrera 2022-07-28 11:18:53 UTC
Description of problem:

While updating to 16.2z3, we experienced a 20-second blackout of the OVN dataplane during the OVN update step (openstack overcloud external-update run --stack overcloud --tags ovn).

The outage happened on all compute nodes at the same time, breaking clusters hosted on the overcloud.

Documentation:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index

"""
3.3. Optional: Updating the ovn-controller container on all overcloud servers

If you deployed your overcloud with the Modular Layer 2 Open Virtual Network mechanism driver (ML2/OVN), update the ovn-controller container to the latest RHOSP 16.2 version. The update occurs on every overcloud server that runs the ovn-controller container.
Important: The following procedure updates the ovn-controller containers on servers that are assigned the Compute role before it updates the ovn-northd service on servers that are assigned the Controller role.
"""

However, it seems that the ovn-controller containers running on the Controllers were updated too, and that the ovsdb-server was affected as well.


Version-Release number of selected component (if applicable):
OSP 16.2

How reproducible:
Unsure 

Steps to Reproduce:
1. # openstack overcloud external-update run --stack overcloud --tags ovn

Actual results:
Outage for 20 seconds

Expected results:
no outage
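
For anyone trying to reproduce this, the following is a minimal sketch of how the length of the outage could be measured from a series of connectivity probes (for example, one ping per second to an instance while the update step runs). It is not part of the original report: the `Probe` type and the `outage_windows` and `longest_outage` helpers are hypothetical names, and the sample data is synthetic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Probe:
    """One connectivity probe (hypothetical type, not from the report)."""
    timestamp: float  # seconds since the start of the measurement
    ok: bool          # True if the instance answered the probe


def outage_windows(probes: Iterable[Probe]) -> List[Tuple[float, float]]:
    """Return (start, end) for each run of consecutive failed probes.

    start is the timestamp of the first failed probe in the run; end is the
    timestamp of the first successful probe after it. A run that is still
    failing when the data ends is closed at the last failed probe.
    """
    windows: List[Tuple[float, float]] = []
    start = None
    last_ts = None
    for probe in sorted(probes, key=lambda p: p.timestamp):
        if not probe.ok and start is None:
            start = probe.timestamp          # outage begins
        elif probe.ok and start is not None:
            windows.append((start, probe.timestamp))  # outage ends
            start = None
        last_ts = probe.timestamp
    if start is not None:
        windows.append((start, last_ts))
    return windows


def longest_outage(probes: Iterable[Probe]) -> float:
    """Duration in seconds of the longest outage window, or 0.0 if none."""
    return max((end - start for start, end in outage_windows(probes)), default=0.0)


# Synthetic example: one probe per second for a minute, failing from t=20 to t=39.
sample = [Probe(float(t), not (20 <= t < 40)) for t in range(60)]
print(outage_windows(sample))   # [(20.0, 40.0)]
print(longest_outage(sample))   # 20.0
```

The resolution of the result is limited by the probe interval, so a one-second interval cannot distinguish a 20-second outage from a 20.9-second one.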

Comment 72 Paul Grist 2022-11-02 12:50:05 UTC
Updating status here: based on yesterday's information, this BZ should include https://review.opendev.org/c/openstack/tripleo-heat-templates/+/860473/ to allow for an automatic update; otherwise, manual intervention is needed.

Comment 73 Terry Wilson 2022-11-03 13:25:16 UTC
I've replaced that patch with 3 others that are in the review process (https://review.opendev.org/q/topic:ovn-ofctrl-wait-before-clear). Only the THT and puppet-ovn patches are technically required for now. The ansible one is for keeping feature parity.

Comment 74 Bernard Cafarelli 2022-11-03 13:37:18 UTC
Thanks for the update, Terry. It is good to have the topic to track all 3 of them (and backports).

Comment 90 errata-xmlrpc 2022-12-07 19:24:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794