Bug 1814100 - [scale] enable monitor-all to reduce load on southbound database
Summary: [scale] enable monitor-all to reduce load on southbound database
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ben Bennett
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks: 1814099
 
Reported: 2020-03-17 03:00 UTC by Dan Williams
Modified: 2020-08-04 18:05 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1814099
Environment:
Last Closed: 2020-08-04 18:05:38 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-08-04 18:05:43 UTC)

Description Dan Williams 2020-03-17 03:00:10 UTC
+++ This bug was initially created as a clone of Bug #1814099 +++

+++ This bug was initially created as a clone of Bug #1814098 +++

Setting monitor-all=true in each node's ovsdb causes each ovn-controller to monitor all chassis events, which reduces load on the southbound database at the expense of slightly more CPU and network activity on each node. This improves scalability.

See OVN bug https://bugzilla.redhat.com/1808125 for more details.
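
For illustration only, the knob involved is the ovn-monitor-all key in the local Open_vSwitch table's external-ids column, which ovn-controller reads its configuration from. A minimal sketch of setting and reading it by hand (in the cluster, ovn-kubernetes is expected to manage this key itself):

# illustrative; ovn-kubernetes normally sets this key
ovs-vsctl set Open_vSwitch . external-ids:ovn-monitor-all=true
ovs-vsctl get Open_vSwitch . external-ids:ovn-monitor-all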

Comment 4 Dan Williams 2020-03-23 18:57:08 UTC
Anurag, to verify:

1) oc rsh into one of the ovn-controller containers and run 'ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all'; you should see ovn-monitor-all=true
2) all ovn-node pods should start, leading to all nodes being Ready
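
For example, once the change is in the image, the step-1 check should look roughly like this (pod name illustrative, other external-ids keys elided):

$ oc rsh -c ovn-controller -n openshift-ovn-kubernetes ovnkube-node-xxxxx
sh-4.2# ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all
{hostname=..., ovn-monitor-all="true", ...}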

Comment 5 Anurag saxena 2020-04-02 18:53:44 UTC
@dcbw, this doesn't seem to be present in the latest nightly or CI builds. Can you reference the PR here?

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-02-101459   True        False         33m     Cluster version is 4.5.0-0.nightly-2020-04-02-101459

$ oc rsh -c ovn-controller ovnkube-node-fcs6r
sh-4.2# ovs-vsctl get Open_vSwitch .  external-ids
{hostname="ip-10-0-174-13.ap-northeast-1.compute.internal", ovn-bridge-mappings="physnet:br-local", ovn-encap-ip="10.0.174.13", ovn-encap-type=geneve, ovn-nb="ssl:10.0.133.150:9641,ssl:10.0.156.112:9641,ssl:10.0.169.235:9641", ovn-openflow-probe-interval="180", ovn-remote="ssl:10.0.133.150:9642,ssl:10.0.156.112:9642,ssl:10.0.169.235:9642", ovn-remote-probe-interval="100000", rundir="/var/run/openvswitch", system-id="72e75ee7-269c-43a5-b64f-02652e46bc9d"}
sh-4.2# ovs-vsctl get Open_vSwitch .  external-ids | grep monitor-all
sh-4.2# exit
exit
command terminated with exit code 1


# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.ci-2020-04-02-142911   True        False         20m     Cluster version is 4.5.0-0.ci-2020-04-02-142911

# oc exec -c ovn-controller ovnkube-node-mm9st -n openshift-ovn-kubernetes -- ovs-vsctl get Open_vSwitch .  external-ids | grep monitor-all
#

Comment 6 Dan Williams 2020-04-02 21:08:40 UTC
Sorry this one got convoluted.

The original PR this bug was filed for was reverted. But we now have a *new* PR merged for release-4.5 that re-implements this in conjunction with OVN changes.

https://github.com/openshift/ovn-kubernetes/pull/126

So if you retest with tomorrow's image, the validation instructions should still be correct and you should see monitor-all in the ovs-vsctl output.

You just happened to test this bug during the revert window and before PR #126 landed. And we forgot to update the bug with that status. Sorry!
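
In other words, with an image that includes PR #126, the same one-liner from comment 5 should exit 0 and the external-ids map should contain the key (pod name illustrative, other keys elided):

# oc exec -c ovn-controller ovnkube-node-xxxxx -n openshift-ovn-kubernetes -- ovs-vsctl get Open_vSwitch . external-ids | grep monitor-all
{hostname=..., ovn-monitor-all="true", ...}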

Comment 9 errata-xmlrpc 2020-08-04 18:05:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

