Description of problem:
In raft mode, OVN does not provide a good mechanism for running consistency/health checks. ovsdb-tool currently provides a check-cluster command, but that command only works offline. Any orchestrator that uses OVN is expected to take corrective action if the RAFT DBs get corrupted. This places a requirement on OVN: it needs to provide a mechanism for running consistency checks on the DB while it is running. This RFE requests that OVN provide such a consistency checker/DB health checker.

Version-Release number of selected component (if applicable):
20.03

How reproducible:
Always

Steps to Reproduce:
1. Run an OpenShift cluster with ovn-k as the network plugin.
2. Observe that no tool gives a good health check/consistency check result.

Additional Information:
Calling check-cluster on a live cluster will often show errors like:

I1012 20:30:31.648730 1 ovndbmanager.go:229] check-cluster returned out: "", stderr: ""
W1012 20:30:45.297491 1 ovndbmanager.go:89] Unable to get db server ID for: /etc/ovn/ovnsb_db.db, stderr: ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce8c93f46d5cca5ea0564f28c07ee46 , err: exit status 1
F1012 20:30:59.280415 1 ovndbmanager.go:200] Error occured during checking of clustered db db: /etc/ovn/ovnsb_db.db,stdout: "", stderr: "ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce

This implies that ovsdb-tool check-cluster cannot be used while the raft DB is being continuously written to.
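
Until an online checker exists, an orchestrator could at least tell this particular failure apart from other check-cluster errors. The following is a minimal sketch, not part of OVN or ovn-kubernetes: the regular expression is derived only from the stderr text quoted above, and the names parse_hash_mismatch and is_inconclusive, as well as the policy of treating a hash mismatch on a live DB as inconclusive rather than as corruption, are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the ovsdb-tool stderr quoted in this report.
# It is an assumption that other ovsdb-tool versions word the error identically.
HASH_MISMATCH_RE = re.compile(
    r"(?P<size>\d+) bytes starting at offset (?P<offset>\d+) "
    r"have SHA-1 hash (?P<actual>[0-9a-f]+) "
    r"but should have hash (?P<expected>[0-9a-f]+)"
)


class HashMismatch(NamedTuple):
    size: int       # number of bytes in the record that failed the check
    offset: int     # file offset where that record starts
    actual: str     # SHA-1 computed over the bytes that were read
    expected: str   # SHA-1 stored in the record header


def parse_hash_mismatch(stderr: str) -> Optional[HashMismatch]:
    """Return the hash-mismatch details found in stderr, or None if absent."""
    m = HASH_MISMATCH_RE.search(stderr)
    if m is None:
        return None
    return HashMismatch(
        size=int(m["size"]),
        offset=int(m["offset"]),
        actual=m["actual"],
        expected=m["expected"],
    )


def is_inconclusive(stderr: str, db_is_live: bool) -> bool:
    """Hypothetical policy: a hash mismatch seen while the DB server is
    running may come from a concurrent write, so it is reported as
    inconclusive rather than as proof of corruption."""
    return db_is_live and parse_hash_mismatch(stderr) is not None
```

A caller would pass the stderr captured from ovsdb-tool and skip destructive recovery when is_inconclusive returns True. This only avoids acting on a possibly spurious error; it does not replace the online consistency check requested here.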
Re-prioritizing this to "low" since OCP does not use RAFT by default anymore.
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.