Bug 2090537

Summary: failure in ovndb migration when db is not ready in HA mode
Product: OpenShift Container Platform Reporter: zenghui.shi <zshi>
Component: Networking Assignee: zenghui.shi <zshi>
Networking sub component: ovn-kubernetes QA Contact: qiowang
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: qiowang
Version: 4.11 Flags: qiowang: needinfo+
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:14:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description zenghui.shi 2022-05-26 01:52:24 UTC
Description of problem:
We've seen instances where the migration initially failed because the db
wasn't ready yet. This change adds retries to the migration to handle
that situation more gracefully and efficiently, since failing a couple
of times currently puts the pod into a crash loop.

A sample transient error from the initial failures looks like this:

F0511 04:31:57.928725       1 ovndbmanager.go:44] NBDB Upgrade failed: %!w(*fmt.wrapError=&{failed to get schema version for NBDB, stderr: "ovsdb-client: transaction returned error: {\"details\":\"get_schema request specifies database OVN_Northbound which is not yet available because it has not completed joining its cluster\",\"error\":\"database not available\"}\n", error: OVN command '/usr/bin/ovsdb-client -t 10 get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound' failed: exit status 1 0xc00042a160})

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 6 errata-xmlrpc 2022-08-10 11:14:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.