Bug 1795198 - [ovn] Too frequent health-checks that causes stress on ovsdb-servers
Summary: [ovn] Too frequent health-checks that causes stress on ovsdb-servers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: beta
: 16.1 (Train on RHEL 8.2)
Assignee: Lucas Alvares Gomes
QA Contact: Roman Safronov
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-27 12:30 UTC by Daniel Alvarez Sanchez
Modified: 2020-11-09 06:09 UTC (History)
9 users (show)

Fixed In Version: python3-networking-ovn-7.1.1-0.20200507153427.fd1c0c3.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:50:33 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1861092 0 None None None 2020-01-28 10:45:50 UTC
Red Hat Product Errata RHBA-2020:3148 0 None None None 2020-07-29 07:50:53 UTC

Description Daniel Alvarez Sanchez 2020-01-27 12:30:30 UTC
Looks like neutron-server is pinging agents too frequently as per what's observed in the logs. nb-cfg being bumped at a non-fixed rate:


For example, in this part of the log I could find 11 updates in less than 2 minutes:

2020-01-27 12:23:04.247 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49008, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49007) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:23:05.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49009, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49008) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:23:32.216 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49010, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49009) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:23:41.248 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49011, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49010) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:23:42.183 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49012, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49011) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:09.210 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49013, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49012) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:18.252 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49014, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49013) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:19.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49015, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49014) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:46.205 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49016, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49015) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:55.254 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49017, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49016) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
2020-01-27 12:24:56.177 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49018, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49017) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44


This is triggering too frequent writes from *all* metadata-agents and ovn-controllers in the cloud which creates a lot of traffic. At scale, this can be a problem.

Imagine a 500 node deployment, with one update per 10 seconds as in the example above. That will translate into 1K (1 metadata agent + 1 ovn-controller per node) write transactions into the SB database every 10 seconds so 100 transactions per second that trigger a JSON RPC command update to every single client into the cloud.

Comment 7 Roman Safronov 2020-06-09 15:30:54 UTC
Verified on puddle 16.1-RHEL-8/RHOS-16.1-RHEL-8-20200604.n.1  which uses python3-networking-ovn-7.2.1-0.20200603143459.4dfc438.el8ost.noarch.
Created networks, routers and vm instances.
Verified that ovnsb.db  changes every 30 secs only. 
Verified also that running "openstack network agent list"  is not triggering new nb_cfg changes.

Comment 8 Alex McLeod 2020-06-16 12:29:41 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 10 errata-xmlrpc 2020-07-29 07:50:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148


Note You need to log in before you can comment on or make changes to this bug.