Bug 1953688 - Make cluster status user-friendly no matter which node it is run on
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb2.15
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ilya Maximets
QA Contact: qding
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-26 16:23 UTC by Carlos Goncalves
Modified: 2023-07-13 07:34 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1278 0 None None None 2021-11-12 14:49:49 UTC

Description Carlos Goncalves 2021-04-26 16:23:59 UTC
This is a follow-up to BZ #1929690 to request improvements to the CLI output of "ovs-appctl cluster/status" command for a better user experience.

Whenever a clustered node goes offline or the cluster ends up in a split-brain situation, the cluster status output reflects that only through a combination of server IDs, parentheses, and arrows. The output is far from intuitive for mortal/non-developer OVS users, and is thus likely to mislead them into thinking the cluster is in a healthy state. The impact of not clearly indicating a known network issue is arguably even more critical when troubleshooting production environments.

Below is an example copy-pasted from Michele's comment #10 in BZ #1929690.

We have the following three nodes (running in VMs):
controller-0 172.16.2.241
controller-1 172.16.2.79
controller-2 172.16.2.57

ovn-dbs is clustered across those three nodes and we virsh destroy controller-0.
Then on controller-1 we see the following:
[root@controller-1 ovs-2.13.0]# podman exec -ti ovn_cluster_north_db_server sh
sh-4.4# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
45fb
Name: OVN_Northbound
Cluster ID: 613c (613c0b6e-65af-4810-bb48-c9cbea43d442)
Server ID: 45fb (45fba88d-5980-49ab-b562-1ea6e0db266c)
Address: ssl:172.16.2.79:6643
Status: cluster member
Role: follower
Term: 2
Leader: c7dd
Vote: c7dd

Election timer: 1000
Log: [2, 118]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->c7dd (->9287) <-c7dd
Servers:
    45fb (45fb at ssl:172.16.2.79:6643) (self)
    c7dd (c7dd at ssl:172.16.2.57:6643)
    9287 (9287 at ssl:172.16.2.241:6643)

And above I have no indication that controller-0 (aka 172.16.2.241) is really gone when we query from the surviving, quorate partition.
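Until the output itself is made clearer, the gap can at least be worked around by scripting against the current format. The following is a rough, hypothetical helper (not part of OVS) that parses the status text and flags servers to which the local node shows no established connection. It assumes, based only on the sample output above, that parenthesized entries on the "Connections:" line (e.g. "(->9287)") denote a connection that is not established; that interpretation is an inference, not a documented format guarantee.

```python
def missing_peers(status_text):
    """Flag cluster members absent from the established connections.

    Heuristic sketch based on the sample "cluster/status" output:
    - "Server ID: <id> (...)" identifies the local node;
    - "Connections:" lists tokens like "->c7dd" / "<-c7dd", with
      parenthesized tokens assumed to mean "not established";
    - the "Servers:" block lists one "<id> (...)" entry per line.
    """
    self_id = None
    connected = set()
    servers = []
    in_servers = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Server ID:"):
            self_id = stripped.split()[2]
        elif stripped.startswith("Connections:"):
            # Count only tokens outside parentheses as established.
            for token in stripped.split()[1:]:
                if not token.startswith("("):
                    connected.add(token.lstrip("<->"))
        elif stripped.startswith("Servers:"):
            in_servers = True
        elif in_servers and stripped:
            servers.append(stripped.split()[0])
    return [s for s in servers if s != self_id and s not in connected]
```

Run against the output quoted above, this would report "9287" (controller-0) as the peer with no established connection, which is exactly the signal the human-readable output currently hides.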

