Bug 2229761 - [OSP16.2 -> 17.1] Packet loss during controller upgrade for OVN
Summary: [OSP16.2 -> 17.1] Packet loss during controller upgrade for OVN
Keywords:
Status: ON_DEV
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
urgent
medium
Target Milestone: z1
: 17.1
Assignee: Lukas Bezdicka
QA Contact: Khomesh Thakre
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-07 15:05 UTC by Khomesh Thakre
Modified: 2023-08-14 14:12 UTC
11 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
There is currently a known issue with a race condition in the deployment steps for `ovn_controller` and `ovn_dbs`, which can cause `ovn_dbs` to be upgraded before `ovn_controller`. If `ovn_controller` is not upgraded before `ovn_dbs`, an error occurs before `ovn_controller` restarts on the new version, and this error causes packet loss. If the race condition occurs during the Open Virtual Network (OVN) upgrade, the estimated network outage is about one minute. A fix is expected in a later RHOSP release.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System: Red Hat Issue Tracker
ID: OSP-27266
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2023-08-07 15:09:18 UTC

Description Khomesh Thakre 2023-08-07 15:05:14 UTC
Description of problem:

During the controller upgrade, when the OVN services start at step 3, `ovn_dbs` sometimes starts before `ovn_controller`, which causes packet loss.

~~~
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Completed Overcloud Major Upgrade Run.
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO osc_lib.shell [-] END return value: None
2023-08-03 16:30:38 | [Thu Aug  3 16:30:38 UTC 2023] Finished major upgrade for computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud hosts
2023-08-03 16:30:38 | 3120 packets transmitted, 3066 received, +15 errors, 1.73077% packet loss, time 3124473ms
2023-08-03 16:30:38 | rtt min/avg/max/mdev = 0.689/2.618/2077.599/41.846 ms, pipe 4
2023-08-03 16:30:38 | Ping loss higher than 1 % detected (2 %) 
~~~
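For reference, the loss figure in the ping summary follows from the transmitted and received counts: (3120 - 3066) / 3120 = 54 / 3120 ≈ 1.73 %. This is above the 1 % threshold the upgrade job checks, and the job's message appears to round it to 2 %. Below is a minimal sketch of such a check in Python. It assumes iputils-style ping summary output, and the function names and the default threshold parameter are hypothetical, not taken from the upgrade tooling:

```python
import re

# Matches the counts in an iputils ping summary line, e.g.
# "3120 packets transmitted, 3066 received, +15 errors, 1.73077% packet loss, time 3124473ms"
SUMMARY_RE = re.compile(r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received")


def packet_loss_percent(summary: str) -> float:
    """Recompute packet loss (in percent) from the transmitted/received counts."""
    m = SUMMARY_RE.search(summary)
    if m is None:
        raise ValueError("no ping summary found")
    tx = int(m.group("tx"))
    rx = int(m.group("rx"))
    if tx == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (tx - rx) / tx


def exceeds_threshold(summary: str, threshold_percent: float = 1.0) -> bool:
    """Hypothetical check: True when loss is strictly above the threshold (1 % by default)."""
    return packet_loss_percent(summary) > threshold_percent


if __name__ == "__main__":
    line = "3120 packets transmitted, 3066 received, +15 errors, 1.73077% packet loss, time 3124473ms"
    print(f"{packet_loss_percent(line):.5f}% loss, over threshold: {exceeds_threshold(line)}")
```

The check compares the recomputed loss against the threshold rather than parsing the `% packet loss` field, so it does not depend on how ping rounds that value.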

Version-Release number of selected component (if applicable):
RHOSP 17.1 on RHEL 8 (puddle RHOS-17.1-RHEL-8-20230802.n.1)

How reproducible:
Intermittent; the issue occurs whenever `ovn_dbs` starts before `ovn_controller`.

