Description of problem:
In raft mode, OVN doesn't provide a good mechanism to run consistency/health checks.
Currently, ovsdb-tool provides a check-cluster command, but that command only works offline. Any orchestrator that uses OVN is expected to take corrective action if the RAFT DBs get corrupted. This places a requirement on OVN to provide a mechanism for running consistency checks on the DB while it is running.
This RFE is requesting that OVN provide such a consistency checker/db health checker.
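For context, the closest thing available on a running server today is the cluster/status command of ovsdb-server, reached through ovs-appctl. It reports RAFT membership and role, not data consistency, so it does not satisfy this RFE. A minimal sketch of how an orchestrator might query it follows. The control socket path passed to queryStatus, the "Key: value" output layout, and the "cluster member" status string are assumptions here, and the sample text in main is illustrative input, not captured output.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// queryStatus runs "ovs-appctl -t <ctl> cluster/status <db>" and returns its
// stdout. ctl is the ovsdb-server control socket; its location is deployment
// specific and is an assumption of this sketch.
func queryStatus(ctl, db string) (string, error) {
	out, err := exec.Command("ovs-appctl", "-t", ctl, "cluster/status", db).Output()
	return string(out), err
}

// parseStatus collects "Key: value" lines into a map and ignores lines
// without a colon. Only the first colon splits, so values such as
// "tcp:10.0.0.1:6644" stay intact.
func parseStatus(out string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return fields
}

// isMember reports whether the parsed status says this server is part of the
// cluster. The exact status string is an assumption.
func isMember(fields map[string]string) bool {
	return fields["Status"] == "cluster member"
}

func main() {
	// Illustrative input in the assumed layout, not real command output.
	sample := "Status: cluster member\nRole: follower\n"
	fields := parseStatus(sample)
	fmt.Println("member:", isMember(fields), "role:", fields["Role"])
}
```

In a real deployment, main would call queryStatus with the southbound control socket and OVN_Southbound, then pass the result to parseStatus. A healthy membership report says nothing about whether the on-disk log is intact, which is the gap this RFE describes.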
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run an OpenShift cluster with ovn-k as the network plugin.
2. There is no tool that can give a good health check/consistency check result.
Calling check-cluster on a live cluster will often show errors like:
I1012 20:30:31.648730 1 ovndbmanager.go:229] check-cluster returned out: "", stderr: ""
W1012 20:30:45.297491 1 ovndbmanager.go:89] Unable to get db server ID for: /etc/ovn/ovnsb_db.db, stderr: ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce8c93f46d5cca5ea0564f28c07ee46
, err: exit status 1
F1012 20:30:59.280415 1 ovndbmanager.go:200] Error occured during checking of clustered db db: /etc/ovn/ovnsb_db.db,stdout: "", stderr: "ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce
This implies that ovsdb-tool check-cluster cannot be used reliably while the RAFT DB is being continuously written to.
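Until an online checker exists, a caller could at least avoid treating a single hash mismatch on a live DB as fatal. The sketch below is one possible stopgap and not a fix: it retries check-cluster when stderr contains the SHA-1 mismatch text seen in the logs above, and fails fast on any other error. The function names, the attempt count, and the delay are all hypothetical, and a mismatch that persists across retries could still be either real corruption or sustained write load.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// errExitStatus stands in for a non-zero exit in the demonstration below.
var errExitStatus = errors.New("exit status 1")

// runCheckCluster invokes "ovsdb-tool check-cluster <dbPath>" and returns
// whatever the tool wrote to stderr.
func runCheckCluster(dbPath string) (string, error) {
	var stderr bytes.Buffer
	cmd := exec.Command("ovsdb-tool", "check-cluster", dbPath)
	cmd.Stderr = &stderr
	err := cmd.Run()
	return stderr.String(), err
}

// isHashMismatch reports whether stderr matches the SHA-1 mismatch message
// quoted in the logs above.
func isHashMismatch(stderr string) bool {
	return strings.Contains(stderr, "have SHA-1 hash") &&
		strings.Contains(stderr, "but should have hash")
}

// checkWithRetry calls check up to attempts times (at least once). It returns
// nil on the first success, returns immediately on an error that is not a
// hash mismatch, and otherwise waits delay before trying again.
func checkWithRetry(check func() (string, error), attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		stderr, err := check()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("check-cluster failed: %w: %s", err, stderr)
		if !isHashMismatch(stderr) {
			return lastErr
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return lastErr
}

func main() {
	// Fake check for demonstration: reports a mismatch twice, then passes.
	calls := 0
	fake := func() (string, error) {
		calls++
		if calls < 3 {
			return "have SHA-1 hash aaaa but should have hash bbbb", errExitStatus
		}
		return "", nil
	}
	err := checkWithRetry(fake, 5, 0)
	fmt.Println("calls:", calls, "err:", err)
}
```

A real caller would pass a closure over runCheckCluster with the DB path instead of the fake check.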