Bug 2111871 - overcloud external-update run --stack overcloud --tags ovn caused a network outage
Summary: overcloud external-update run --stack overcloud --tags ovn caused a network o...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Terry Wilson
QA Contact: Maor
URL:
Whiteboard:
Depends On: 2089416
Blocks:
Reported: 2022-07-28 11:18 UTC by Eduard Barrera
Modified: 2023-08-21 16:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This update fixes a bug that causes connectivity loss after certain updates to RHOSP 16.2.2 and 16.2.3. If you are planning to update to a RHOSP 16.2 release, update to RHOSP 16.2.4 to avoid connectivity loss.

The bug is triggered by a database schema change in OVN 21.12, which is introduced in RHOSP 16.2.2 and 16.2.3. OVN 21.12 contains a new column that is not present in earlier versions. OVN database schema changes should not cause a problem in OpenStack, but this particular change is affected by a bug.

In particular, instance connectivity is lost for a variable amount of time (from 20 seconds to 3 minutes) when you run the following command:

    $ openstack overcloud external-update run --stack overcloud --tags ovn

To avoid the bug, do not update to RHOSP 16.2.2 or 16.2.3. Update to RHOSP 16.2.4 instead.
Clone Of:
Environment:
Last Closed: 2022-12-07 19:24:04 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 860473 | 0 | None | NEW | Set external_ids:ovn-ofctrl-wait-before-clear | 2022-11-10 10:49:16 UTC
Red Hat Issue Tracker | OSP-17895 | 0 | None | None | None | 2022-07-28 11:22:06 UTC
Red Hat Product Errata | RHBA-2022:8794 | 0 | None | None | None | 2022-12-07 19:24:22 UTC

Description Eduard Barrera 2022-07-28 11:18:53 UTC
Description of problem:

While performing an update to 16.2z3, we experienced a 20-second blackout of the OVN data plane during the OVN update step (openstack overcloud external-update run --stack overcloud --tags ovn).

The outage happened on all compute nodes at the same time, breaking clusters hosted on the overcloud.

Documentation:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index

"""
3.3. Optional: Updating the ovn-controller container on all overcloud servers

If you deployed your overcloud with the Modular Layer 2 Open Virtual Network mechanism driver (ML2/OVN), update the ovn-controller container to the latest RHOSP 16.2 version. The update occurs on every overcloud server that runs the ovn-controller container.
Important

The following procedure updates the ovn-controller containers on servers that are assigned the Compute role before it updates the ovn-northd service on servers that are assigned the Controller role.
"""

However, it seems that the ovn-controller containers running on the Controllers were updated too, and the ovsdb-server appears to have been affected as well.


Version-Release number of selected component (if applicable):
OSP 16.2

How reproducible:
Unsure 

Steps to Reproduce:
1. # openstack overcloud external-update run --stack overcloud --tags ovn

Actual results:
Outage for 20 seconds

Expected results:
no outage
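
One way to quantify the outage while reproducing the problem is to ping an instance with timestamps during the update and then look for the longest gap between replies. The sketch below assumes a log produced by `ping -D`, which prefixes each line with a Unix timestamp in square brackets. The function name `longest_gap` and the log file name are hypothetical and are not part of this report.

```shell
# Sketch: report the longest gap, in seconds, between consecutive
# successful ping replies. Assumes input in `ping -D` format, e.g.
#   [1659007133.123456] 64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.4 ms
longest_gap() {
    awk '
        /bytes from/ {
            ts = $1
            gsub(/\[|\]/, "", ts)   # strip the brackets around the timestamp
            ts += 0                 # force numeric conversion
            if (seen && ts - prev > max) max = ts - prev
            prev = ts
            seen = 1
        }
        END { printf "%.1f\n", max + 0 }
    '
}

# Example usage (the instance IP is a placeholder):
#   ping -D <instance-ip> | tee ping.log    # leave running during the update
#   longest_gap < ping.log
```

With the default one-second ping interval, a result close to 1.0 means no replies were lost, while a result of about 20 would match the outage described here.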

Comment 72 Paul Grist 2022-11-02 12:50:05 UTC
Updating the status here: based on yesterday's information, this BZ should include https://review.opendev.org/c/openstack/tripleo-heat-templates/+/860473/ to allow for an automatic update; otherwise, manual intervention is needed.

Comment 73 Terry Wilson 2022-11-03 13:25:16 UTC
I've replaced that patch with 3 others that are in the review process: https://review.opendev.org/q/topic:ovn-ofctrl-wait-before-clear . Only the THT and puppet-ovn patches are technically required for now. The ansible one is for keeping feature parity.
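
For reference, the option named in the topic is an Open vSwitch external ID that ovn-controller reads: it is the time in milliseconds that ovn-controller waits on startup before clearing the existing OpenFlow flows. A minimal sketch of setting it by hand might look like the following. The helper name `set_wait_before_clear`, the `OVS_VSCTL` override, and the value of 8000 ms used in the usage comment are assumptions for illustration, not values taken from this bug or from the patches under review.

```shell
# Sketch: set external_ids:ovn-ofctrl-wait-before-clear on the local
# Open vSwitch instance. OVS_VSCTL can be overridden (for example with
# "echo") to preview the command without changing anything.
set_wait_before_clear() {
    wait_ms="${1:?usage: set_wait_before_clear MILLISECONDS}"
    "${OVS_VSCTL:-ovs-vsctl}" set open_vswitch . \
        "external_ids:ovn-ofctrl-wait-before-clear=${wait_ms}"
}

# Example usage on a node that runs ovn-controller (value is an assumption):
#   set_wait_before_clear 8000
#   ovs-vsctl get open_vswitch . external_ids:ovn-ofctrl-wait-before-clear
```

This would have to be done on every node that runs ovn-controller before the container is updated, which is the kind of manual intervention that the patches are meant to make unnecessary.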

Comment 74 Bernard Cafarelli 2022-11-03 13:37:18 UTC
Thanks for the update Terry - good to have the topic to track all 3 of them (and backports)

Comment 90 errata-xmlrpc 2022-12-07 19:24:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794

