Bug 1889470 - [RFE] OVN should provide a mechanism to check db consistency/health on a live raft cluster
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 20.C
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: OVN Team
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-19 18:04 UTC by Aniket Bhat
Modified: 2024-02-14 21:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-02-14 21:11:31 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1887585 | 0 | high | CLOSED | ovn-masters stuck in crashloop after scale test | 2021-02-24 15:25:56 UTC
Red Hat Issue Tracker | FD-901 | 0 | None | None | None | 2021-11-12 14:59:53 UTC

Description Aniket Bhat 2020-10-19 18:04:33 UTC
Description of problem:

In Raft mode, OVN provides no good mechanism for running consistency or health checks on a live database.
ovsdb-tool currently provides a check-cluster command, but that command only works offline. Any orchestrator that uses OVN is expected to take corrective action if the Raft DBs become corrupted, so OVN needs to provide a way to run consistency checks on a DB while it is running.

This RFE is requesting that OVN provide such a consistency checker/db health checker.
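
For context, the closest live signal available today is probably the cluster/status command that ovsdb-server exposes through ovs-appctl. The sketch below shows how an orchestrator might query it. It is not the checker this RFE asks for: it reports Raft membership state only and does not verify on-disk consistency. The function names are hypothetical, and the control socket path and database name in the usage note are assumptions about a typical deployment.

```python
import subprocess
from typing import Dict, List


def cluster_status_cmd(ctl_socket: str, db_name: str) -> List[str]:
    """Build the ovs-appctl command line that asks a running ovsdb-server
    for the Raft status of one database."""
    return ["ovs-appctl", "-t", ctl_socket, "cluster/status", db_name]


def parse_key_values(text: str) -> Dict[str, str]:
    """Collect top-level 'Key: value' lines from the command output.

    Indented lines and keys with an empty value are skipped, so this is a
    deliberately loose parser rather than a full model of the output.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line[:1].isspace():
            continue  # nested detail lines are ignored in this sketch
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def query_cluster_status(ctl_socket: str, db_name: str,
                         timeout: float = 5.0) -> Dict[str, str]:
    """Run the command and return the parsed fields.

    Raises subprocess.CalledProcessError if ovs-appctl exits non-zero and
    subprocess.TimeoutExpired if the server does not answer in time.
    """
    proc = subprocess.run(
        cluster_status_cmd(ctl_socket, db_name),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return parse_key_values(proc.stdout)
```

A caller might run query_cluster_status("/var/run/ovn/ovnsb_db.ctl", "OVN_Southbound") and inspect the returned fields; both arguments are assumed values and will differ between deployments.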

Version-Release number of selected component (if applicable):
20.03

How reproducible:
Always

Steps to Reproduce:
1. Run an OpenShift cluster with ovn-k as the network plugin.
2. Observe that no tool is available that gives a reliable health check/consistency check result.

Additional Information:

Calling check-cluster on a live cluster will often show errors like: 


I1012 20:30:31.648730       1 ovndbmanager.go:229] check-cluster returned out: "", stderr: ""
W1012 20:30:45.297491       1 ovndbmanager.go:89] Unable to get db server ID for: /etc/ovn/ovnsb_db.db, stderr: ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce8c93f46d5cca5ea0564f28c07ee46
, err: exit status 1
F1012 20:30:59.280415       1 ovndbmanager.go:200] Error occured during checking of clustered db db: /etc/ovn/ovnsb_db.db,stdout: "", stderr: "ovsdb-tool: syntax error: /etc/ovn/ovnsb_db.db: 625335735 bytes starting at offset 65 have SHA-1 hash ff4ef44ee1817d3482803f9cec049584f1db7a32 but should have hash 2b7967802ce


This implies that ovsdb-tool's check-cluster command cannot be used while the Raft DB is being continuously written to.
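
Until a live checker exists, a caller could at least avoid treating a single hash mismatch as proof of corruption. The sketch below parses the SHA-1 mismatch message shown in the log above and retries ovsdb-tool check-cluster a few times before reporting a persistent mismatch. It assumes that such mismatches can be caused by concurrent writes, which the log suggests but does not prove, and retrying does not turn check-cluster into a valid live check. All function names, the retry count, and the delay are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable, NamedTuple, Optional, Tuple

# Matches the mismatch text quoted in the log above.
HASH_MISMATCH = re.compile(
    r"(?P<size>\d+) bytes starting at offset (?P<offset>\d+) "
    r"have SHA-1 hash (?P<actual>[0-9a-f]{40}) "
    r"but should have hash (?P<expected>[0-9a-f]{40})"
)


class HashMismatch(NamedTuple):
    size: int
    offset: int
    actual: str
    expected: str


def parse_hash_mismatch(stderr: str) -> Optional[HashMismatch]:
    """Return the mismatch details, or None if stderr reports something else."""
    m = HASH_MISMATCH.search(stderr)
    if m is None:
        return None
    return HashMismatch(int(m["size"]), int(m["offset"]),
                        m["actual"], m["expected"])


def run_check_cluster(db_path: str) -> Tuple[int, str]:
    """Run 'ovsdb-tool check-cluster' once; return (exit code, stderr)."""
    proc = subprocess.run(["ovsdb-tool", "check-cluster", db_path],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr


def check_with_retries(
    db_path: str,
    run: Callable[[str], Tuple[int, str]] = run_check_cluster,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "ok", "mismatch" (seen on every attempt), or "error"."""
    result = "error"
    for attempt in range(attempts):
        code, stderr = run(db_path)
        if code == 0:
            return "ok"
        if parse_hash_mismatch(stderr) is None:
            return "error"  # some other failure: do not retry
        result = "mismatch"
        if attempt + 1 < attempts:
            sleep(delay)  # give a concurrent writer time to finish
    return result
```

A result of "mismatch" after every attempt is still only a hint; confirming corruption would require running check-cluster with the database server stopped.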

Comment 2 Mark Michelson 2024-01-19 15:09:40 UTC
Re-prioritizing this to "low" since OCP does not use RAFT by default anymore.

Comment 3 OVN Bot 2024-02-14 21:11:30 UTC
This issue is being closed as an automatic process due to the issue's age. If you wish to re-open this issue, please do so in Jira (https://issues.redhat.com) in the 'FDP' project. Please be sure to set the component to the latest OVN version where this issue is known to occur. If this is a feature request or improvement, please set the component to 'OVN'.

