+++ This bug was initially created as a clone of Bug #1814099 +++
+++ This bug was initially created as a clone of Bug #1814098 +++

Setting ovn-monitor-all=true in the external-ids of each node's ovsdb causes each ovn-controller to monitor all chassis events. This reduces load on the southbound database at the expense of a bit more CPU and network activity on each node, which improves the cluster's ability to scale. See OVN bug https://bugzilla.redhat.com/1808125 for more details.
Anurag, to verify:
1) oc rsh into one of the ovn-controller containers and run 'ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all'; you should see ovn-monitor-all=true.
2) All ovn-node pods should start, leading to all nodes being Ready.
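The check in step 1 can be repeated across every node rather than a single container. The following is a minimal sketch, not part of the fix itself. The namespace and container name are taken from the commands used elsewhere in this bug; the pod label selector `app=ovnkube-node` and the helper function names are assumptions for illustration and may need adjusting for your cluster.

```shell
#!/bin/sh
# Sketch only. NAMESPACE and the container name come from this bug's transcripts;
# SELECTOR is an assumed label and may differ on your cluster.
NAMESPACE="openshift-ovn-kubernetes"
SELECTOR="app=ovnkube-node"

# ovs-vsctl may print a string value wrapped in double quotes; remove them.
strip_quotes() {
    printf '%s' "$1" | tr -d '"'
}

# Succeeds (returns 0) only when the raw value reads as true.
is_monitor_all_enabled() {
    [ "$(strip_quotes "$1")" = "true" ]
}

# Query ovn-monitor-all in the ovn-controller container of every matching pod.
# Prints one line per pod and returns 1 if any pod lacks the setting.
check_all_nodes() {
    rc=0
    for pod in $(oc get pods -n "$NAMESPACE" -l "$SELECTOR" -o name); do
        name=${pod#pod/}   # "oc get -o name" prints "pod/<name>"
        value=$(oc exec -n "$NAMESPACE" -c ovn-controller "$name" -- \
            ovs-vsctl --if-exists get Open_vSwitch . external-ids:ovn-monitor-all)
        if is_monitor_all_enabled "$value"; then
            echo "$name: ovn-monitor-all=true"
        else
            echo "$name: ovn-monitor-all is not set to true"
            rc=1
        fi
    done
    return $rc
}
```

To use it, source the file and run `check_all_nodes` while logged in to the cluster. With `--if-exists`, a missing key yields an empty value instead of an error, so unset nodes are reported rather than aborting the loop.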
@dcbw, this doesn't seem to be present in the latest nightly or CI build. Can you reference the PR here?

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-02-101459   True        False         33m     Cluster version is 4.5.0-0.nightly-2020-04-02-101459

$ oc rsh -c ovn-controller ovnkube-node-fcs6r
sh-4.2# ovs-vsctl get Open_vSwitch . external-ids
{hostname="ip-10-0-174-13.ap-northeast-1.compute.internal", ovn-bridge-mappings="physnet:br-local", ovn-encap-ip="10.0.174.13", ovn-encap-type=geneve, ovn-nb="ssl:10.0.133.150:9641,ssl:10.0.156.112:9641,ssl:10.0.169.235:9641", ovn-openflow-probe-interval="180", ovn-remote="ssl:10.0.133.150:9642,ssl:10.0.156.112:9642,ssl:10.0.169.235:9642", ovn-remote-probe-interval="100000", rundir="/var/run/openvswitch", system-id="72e75ee7-269c-43a5-b64f-02652e46bc9d"}
sh-4.2# ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all
sh-4.2# exit
exit
command terminated with exit code 1

# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.ci-2020-04-02-142911   True        False         20m     Cluster version is 4.5.0-0.ci-2020-04-02-142911

# oc exec -c ovn-controller ovnkube-node-mm9st -n openshift-ovn-kubernetes -- ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all
#
Sorry, this one got convoluted. The original PR this bug was filed for was reverted, but we now have a *new* PR merged for release-4.5 that re-implements this in conjunction with OVN changes: https://github.com/openshift/ovn-kubernetes/pull/126

If you retest with tomorrow's image, the validation instructions should still be correct and you should see monitor-all in the ovs-vsctl output. You just happened to test this bug during the revert window, before PR #126 landed, and we forgot to update the bug with that status. Sorry!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409