Description of problem:
During the controller upgrade, when the ovn service starts at step 3, ovn dbs sometimes starts before ovn controller, causing packet loss.

~~~
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Completed Overcloud Major Upgrade Run.
2023-08-03 16:30:38 | 2023-08-03 16:30:38.457 340702 INFO osc_lib.shell [-] END return value: None
2023-08-03 16:30:38 | [Thu Aug 3 16:30:38 UTC 2023] Finished major upgrade for computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud hosts
2023-08-03 16:30:38 | 3120 packets transmitted, 3066 received, +15 errors, 1.73077% packet loss, time 3124473ms
2023-08-03 16:30:38 | rtt min/avg/max/mdev = 0.689/2.618/2077.599/41.846 ms, pipe 4
2023-08-03 16:30:38 | Ping loss higher than 1 % detected (2 %)
~~~

Version-Release number of selected component (if applicable):
RHOSP 17 on RHEL 8 (Puddle RHOS-17.1-RHEL-8-20230802.n.1)

How reproducible:
Intermittent. The issue occurs whenever ovn dbs starts before ovn controller.
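For reference, the loss figure in the ping summary above can be reproduced from the packet counters. This is only a sketch of the arithmetic: the function name and the threshold constant are illustrative, not taken from the upgrade tooling, and the assumption that the reported "(2 %)" is the same figure rounded to a whole number is a guess that merely fits the log.

```python
def packet_loss_percent(transmitted: int, received: int) -> float:
    """Return packet loss as a percentage of transmitted packets."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    return (transmitted - received) / transmitted * 100


# Counters copied from the ping summary in the log.
loss = packet_loss_percent(3120, 3066)
print(f"{loss:.5f}% packet loss")  # 1.73077% packet loss

# Hypothetical constant mirroring the "higher than 1 %" message in the log.
THRESHOLD_PERCENT = 1
exceeds_threshold = loss > THRESHOLD_PERCENT
print(exceeds_threshold)
```

With 54 of 3120 packets lost, the run is above the 1 % threshold regardless of how the final figure is rounded.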
Failed QA - the tasks are triggered n times, where n is the number of nodes in the stack. In a 500-node overcloud, this would restart ovn_controller 500 times on each node. Testing fix: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/892493
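To put the scaling in numbers, here is a rough sketch of the restart count. It assumes every node in the stack runs ovn_controller, which may not hold for all roles, and the function name is illustrative only.

```python
def total_restarts(node_count: int) -> int:
    """Total ovn_controller restarts if the tasks run once per node in the stack.

    Assumption: every node runs ovn_controller, so each node is restarted
    node_count times and the total grows quadratically.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    restarts_per_node = node_count
    return restarts_per_node * node_count


print(total_restarts(500))  # 250000
```

Under that assumption the total grows as n squared, so the overhead is negligible on a small stack but significant at 500 nodes.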
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:5138