Bug 2227236

Summary: OVN DB was corrupted during scale-out procedure
Product: Red Hat OpenStack
Component: python-networking-ovn
Version: 16.1 (Train)
Type: Bug
Status: NEW
Severity: medium
Priority: unspecified
Hardware: All
OS: All
Reporter: Alex Stupnikov <astupnik>
Assignee: Jakub Libosvar <jlibosva>
QA Contact: Eran Kuris <ekuris>
CC: anbs, apevec, jlibosva, lhh, majopela, scohen
Flags: jlibosva: needinfo? (anbs)

Description Alex Stupnikov 2023-07-28 13:17:01 UTC
Description of problem:
The scale-out procedure broke communication via floating IPs (FIPs) and DHCP address allocation in an RHOSP 16.1.9 deployment. After the connection to the OVN DB flapped, the overcloud controller sosreports show a flood of the following errors:

2023-07-25 19:54:39.506 26 ERROR ovsdbapp.event Stderr: 'ovsdb-client: failed to connect to "tcp:IP:6642" (Connection timed out)
...
2023-07-25 19:54:40.371 32 ERROR networking_ovn.ovsdb.ovsdb_monitor [-] HashRing is empty, error: Hash Ring returned empty when hashing "b'UUID'". This should never happen in a normal situation, please check the status of your cluster: networking_ovn.common.exceptions.HashRingIsEmpty: Hash Ring returned empty when hashing "b'UUID'". This should never happen in a normal situation, please check the status of your cluster
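According to its error text, networking-ovn uses a hash ring keyed on the row UUID to distribute OVN DB events across API workers. HashRingIsEmpty means the ring had no live members when an event arrived. The sketch below is a hypothetical, simplified model and is not the networking-ovn code. It assumes that ring membership depends on a recent heartbeat. `HeartbeatHashRing`, `node_timeout`, and `replicas` are illustrative names only.

```python
import bisect
import hashlib
import time


class HashRingIsEmpty(Exception):
    """Raised when no live node can own a key (hypothetical sketch)."""


class HeartbeatHashRing:
    """Consistent hash ring whose members are nodes with a recent heartbeat.

    Illustrative model only; not the networking-ovn implementation.
    """

    def __init__(self, node_timeout=60.0, replicas=16):
        self.node_timeout = node_timeout  # seconds a heartbeat stays valid
        self.replicas = replicas          # virtual points per node on the ring
        self._heartbeats = {}             # node_id -> time of last heartbeat

    def heartbeat(self, node_id, now=None):
        """Record that node_id is alive at time `now` (monotonic by default)."""
        self._heartbeats[node_id] = time.monotonic() if now is None else now

    def _active_nodes(self, now):
        # A node counts as a member only if its heartbeat is fresh enough.
        return sorted(node for node, ts in self._heartbeats.items()
                      if now - ts <= self.node_timeout)

    @staticmethod
    def _hash(data):
        return int(hashlib.sha256(data).hexdigest(), 16)

    def get_node(self, key, now=None):
        """Return the node owning `key` (bytes), or raise if none is alive."""
        now = time.monotonic() if now is None else now
        nodes = self._active_nodes(now)
        if not nodes:
            # Formatting bytes with %s yields "b'...'", as in the log line.
            raise HashRingIsEmpty(
                'Hash Ring returned empty when hashing "%s"' % key)
        # Build the ring from active nodes only, each with several points.
        ring = sorted((self._hash(f"{node}-{i}".encode()), node)
                      for node in nodes for i in range(self.replicas))
        points = [point for point, _ in ring]
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(points, self._hash(key)) % len(ring)
        return ring[idx][1]
```

Suppose `node_timeout=60` and two nodes send a heartbeat at t=0. At t=30, lookups succeed. At t=120, every membership has lapsed, and `get_node(b"UUID")` raises an error shaped like the one in the log. The logs above do not establish whether lapsed membership is what emptied the ring in this deployment.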

Something appears to have happened to the OVN DB, but the OVN logs contain no clear pointers. Communication was restored after running the OVN DB sync tool. This suggests either an OVN-related problem in our scale-out workflow or a bug in OVN itself.
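If the connection flaps again, a timed TCP probe of the database endpoint can help separate plain network reachability problems from database-side problems. The log shows `ovsdb-client` timing out against `tcp:IP:6642`, and 6642 is the conventional OVN Southbound DB port. The sketch below is a hypothetical helper, `probe_ovsdb`. A successful TCP connect only shows that the port accepts connections, not that ovsdb-server or its data is healthy.

```python
import socket


def probe_ovsdb(host, port=6642, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical diagnostic helper: it checks reachability only and does not
    speak the OVSDB protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return False
```

Running this periodically from each controller against the SB DB address, and logging the results with timestamps, would show whether the timeouts in the log line up with the network being unreachable.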

The sosreports from the controller nodes are attached to the support case.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.9 (Train)

How reproducible:
We are not sure whether there is a reliable way to reproduce this problem.

Additional info:
Will be provided privately.