Bug 2229761 - [OSP16.2 -> 17.1] Packet loss during controller upgrade for OVN
Summary: [OSP16.2 -> 17.1] Packet loss during controller upgrade for OVN
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: z1
Target Release: 17.1
Assignee: Lukas Bezdicka
QA Contact: Khomesh Thakre
URL:
Whiteboard:
Depends On:
Blocks: 2243277
Reported: 2023-08-07 15:05 UTC by Khomesh Thakre
Modified: 2023-10-11 14:54 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-1.20230519151024.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, a race condition in the deployment steps for `ovn_controller` and `ovn_dbs` caused `ovn_dbs` to be upgraded before `ovn_controller`. If `ovn_controller` is not upgraded before `ovn_dbs`, an error before the restart to the new version causes packet loss. In RHOSP 17.1.1, this issue has been resolved.
Clone Of:
Clones: 2243277
Environment:
Last Closed: 2023-09-20 00:29:52 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 891670 | 0 | None | MERGED | [ffwd3] Upgrade ovn_controller ahead | 2023-08-25 10:25:41 UTC
OpenStack gerrit | 892493 | 0 | None | MERGED | [ffwd3][update] Run OVN external update only once | 2024-01-02 11:49:51 UTC
Red Hat Issue Tracker | OSP-27266 | 0 | None | None | None | 2023-08-07 15:09:18 UTC
Red Hat Product Errata | RHBA-2023:5138 | 0 | None | None | None | 2023-09-20 00:30:28 UTC

Description Khomesh Thakre 2023-08-07 15:05:14 UTC
Description of problem:

During the controller upgrade, when the OVN services start at step 3, ovn dbs sometimes starts before ovn controller, causing packet loss.

~~~
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Completed Overcloud Major Upgrade Run.
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO osc_lib.shell [-] END return value: None
2023-08-03 16:30:38 | [Thu Aug  3 16:30:38 UTC 2023] Finished major upgrade for computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud hosts
2023-08-03 16:30:38 | 3120 packets transmitted, 3066 received, +15 errors, 1.73077% packet loss, time 3124473ms
2023-08-03 16:30:38 | rtt min/avg/max/mdev = 0.689/2.618/2077.599/41.846 ms, pipe 4
2023-08-03 16:30:38 | Ping loss higher than 1 % detected (2 %) 
~~~
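The loss figure in the ping summary above can be re-derived from the transmitted and received counts. The following is a minimal sketch, not the upgrade job's actual check: the function name `packet_loss_percent` and the constant `THRESHOLD_PERCENT` are hypothetical, and the assumption that the job rounds the loss to a whole percent (which would explain the "2 %" in the last log line) is not confirmed by the log.

```python
def packet_loss_percent(transmitted: int, received: int) -> float:
    """Return packet loss as a percentage of transmitted packets."""
    if transmitted <= 0:
        raise ValueError("transmitted must be a positive packet count")
    return (transmitted - received) / transmitted * 100.0


# Counts taken from the ping summary line in the log.
transmitted, received = 3120, 3066
lost = transmitted - received
loss = packet_loss_percent(transmitted, received)

# Prints: 54 packets lost, 1.73077% packet loss
print(f"{lost} packets lost, {loss:.5f}% packet loss")

# Hypothetical threshold; the 1 % value comes from the log message,
# the comparison and rounding logic are assumptions.
THRESHOLD_PERCENT = 1
if loss > THRESHOLD_PERCENT:
    print(f"Ping loss higher than {THRESHOLD_PERCENT} % detected ({round(loss)} %)")
```

With these counts, 54 of 3120 packets were lost, which matches the 1.73077% that ping reports and exceeds the 1 % limit.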

Version-Release number of selected component (if applicable):
RHOSP 17 on RHEL 8 (Puddle RHOS-17.1-RHEL-8-20230802.n.1)

How reproducible:
Intermittent. The issue occurs whenever ovn dbs starts before ovn controller.

Comment 11 Lukas Bezdicka 2023-08-23 14:53:12 UTC
Failed QA - tasks are triggered n times, where n is the number of nodes in the stack. In a 500-node overcloud, it would restart ovn_controller 500 times on each node.
Testing fix:
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/892493
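
The scaling problem described in this comment can be illustrated with a small model. This is only a sketch of the arithmetic, not the tripleo-heat-templates logic: the functions `restarts_per_node` and `total_restarts` and the `run_once` flag are hypothetical names, chosen to contrast the failed behaviour with the "run only once" behaviour named in the title of the testing fix.

```python
def restarts_per_node(node_count: int, run_once: bool) -> int:
    """Hypothetical model: how often ovn_controller restarts on one node.

    Without run-once semantics the task is triggered once per node in the
    stack, so every node restarts ovn_controller node_count times.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return 1 if run_once else node_count


def total_restarts(node_count: int, run_once: bool) -> int:
    """Total ovn_controller restarts across the whole overcloud."""
    return node_count * restarts_per_node(node_count, run_once)


for nodes in (3, 500):
    repeated = total_restarts(nodes, run_once=False)
    once = total_restarts(nodes, run_once=True)
    print(f"{nodes} nodes: {repeated} restarts when repeated, {once} when run once")
```

In this model a 500-node overcloud performs 250000 restarts in total when the task repeats per node, compared with 500 when it runs once per node.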

Comment 22 errata-xmlrpc 2023-09-20 00:29:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5138

