I filed https://bugzilla.redhat.com/show_bug.cgi?id=1779381 against OVN, but the bug occurred with ovn-kubernetes, so I thought it would be worth having a tracker against ovn-kubernetes to follow the issue to resolution.
The question: is this a bug in the shell script, or is this a bug in ovsd argument parsing?
Hi, could QE re-validate whether this bug is still a bug (I am ready to bet an arm and a leg that it is not...)? With all the upstream improvements to OVN, I suspect this issue can be closed. Excuse us for this unconventional way of doing this, but it has slipped everyone's mind this past month. /Alex
Tested on a three-master-node cluster using 4.4.0-0.nightly-2020-01-24-141203; the role information shows one leader and two followers.

[root@dhcp-41-193 FILE]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-24-141203   True        False         31m     Cluster version is 4.4.0-0.nightly-2020-01-24-141203
[root@dhcp-41-193 FILE]# for n in $(oc get pods -n openshift-ovn-kubernetes | grep -v NAME | grep ovnkube-master | cut -f1 -d' ') ; do echo "**** $n ****" ; oc exec -n openshift-ovn-kubernetes -it $n -c nbdb -- ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound ; done
**** ovnkube-master-npr5w ****
b33d
Name: OVN_Northbound
Cluster ID: 5cab (5caba8a9-c72f-4d00-a898-d190012c1b3b)
Server ID: b33d (b33d9d42-e282-4bc1-9867-a56f477ae351)
Address: ssl:10.0.134.27:9643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self
Election timer: 1000
Log: [2, 1599]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-e21e ->e21e <-ff13 ->ff13
Servers:
    e21e (e21e at ssl:10.0.170.46:9643) next_index=1599 match_index=1598
    ff13 (ff13 at ssl:10.0.150.83:9643) next_index=1599 match_index=1598
    b33d (b33d at ssl:10.0.134.27:9643) (self) next_index=2 match_index=1598
**** ovnkube-master-x9hf8 ****
e21e
Name: OVN_Northbound
Cluster ID: 5cab (5caba8a9-c72f-4d00-a898-d190012c1b3b)
Server ID: e21e (e21e5144-6448-4090-a2d2-06b2a5e2014e)
Address: ssl:10.0.170.46:9643
Status: cluster member
Role: follower
Term: 1
Leader: b33d
Vote: unknown
Election timer: 1000
Log: [2, 1599]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 <-b33d <-ff13 ->ff13
Servers:
    e21e (e21e at ssl:10.0.170.46:9643) (self)
    ff13 (ff13 at ssl:10.0.150.83:9643)
    b33d (b33d at ssl:10.0.134.27:9643)
**** ovnkube-master-zjdnn ****
ff13
Name: OVN_Northbound
Cluster ID: 5cab (5caba8a9-c72f-4d00-a898-d190012c1b3b)
Server ID: ff13 (ff1342d9-76a8-4b53-812a-14a00ca5e18b)
Address: ssl:10.0.150.83:9643
Status: cluster member
Role: follower
Term: 1
Leader: b33d
Vote: unknown
Election timer: 1000
Log: [2, 1599]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->e21e <-b33d <-e21e
Servers:
    e21e (e21e at ssl:10.0.170.46:9643)
    ff13 (ff13 at ssl:10.0.150.83:9643) (self)
    b33d (b33d at ssl:10.0.134.27:9643)
[root@dhcp-41-193 FILE]#
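For anyone re-running this check, the "one leader, two followers" reading can be automated rather than eyeballed. The following is only a sketch: `count_raft_roles` is a hypothetical helper name, not part of OVN or ovn-kubernetes, and it assumes the `cluster/status` output keeps the `Role: <role>` line format seen in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN or ovn-kubernetes).
# Reads concatenated `cluster/status` output on stdin, counts the
# "Role:" lines, and prints a one-line summary.
# Exit status is 0 only when exactly one server reports itself as leader.
count_raft_roles() {
    awk '
        { sub(/\r$/, "") }               # drop a trailing CR left by a TTY, if any
        $1 == "Role:" { roles[$2]++ }    # tally each role name
        END {
            printf "leaders=%d followers=%d\n", roles["leader"], roles["follower"]
            exit (roles["leader"] == 1 ? 0 : 1)
        }
    '
}
```

One way to use it is to pipe the output of the `for` loop used in this test into `count_raft_roles`, preferably with `-it` dropped from `oc exec` so that no terminal is allocated. For a healthy three-node cluster it should print `leaders=1 followers=2` and return 0.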
Note that when I was seeing this, it did not happen on every install. It was just one occasional failure mode.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581