Created attachment 1679269
ovn-controller.log with debug info

Description of problem:

While we were running Rally scenario tests on OSP16 with the OVN driver, the system was loaded with Neutron resources. The networking-ovn driver runs liveness checks frequently (every 5 seconds), and each check updates the "nb_cfg" and "external_ids" columns of the SBDB Chassis table: it increments nb_cfg and adds a new timestamp to external_ids.

_uuid                : 925f3247-7132-48a9-8634-16f90d33f043
encaps               : [82d43060-1071-4c06-8c74-1b8424205719]
external_ids         : {datapath-type="", iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", "neutron:liveness_check_at"="2020-04-14T18:48:21.122794+00:00", ovn-bridge-mappings="datacentre:br-ex", ovn-chassis-mac-mappings="", ovn-cms-options=enable-chassis-as-gw}
hostname             : "controller-2.redhat.local"
name                 : "059c6ee3-4700-43a0-a652-227d30fbcd1c"
nb_cfg               : 147126
transport_zones      : []
vtep_logical_switches: []

On the compute nodes, it also updates external_ids with the "neutron-metadata-proxy-networks" and "neutron:ovn-metadata-sb-cfg" keys. These updates trigger flow recomputation in the ovn-controllers, which slows down flow programming for the VMs.
I have attached ovn-controller.log, which shows the same:

2020-04-16T07:03:05.468Z|00011|poll_loop(stopwatch2)|DBG|wakeup due to [POLLIN] on fd 17 (FIFO pipe:[1086516]) at lib/stopwatch.c:458 (0% CPU usage)
2020-04-16T07:03:05.477Z|00275|poll_loop|DBG|wakeup due to [POLLIN] on fd 19 (172.17.1.54:49426<->172.17.1.54:6642) at lib/stream-fd.c:157 (0% CPU usage)
2020-04-16T07:03:05.477Z|00276|jsonrpc|DBG|tcp:172.17.1.54:6642: received notification, method="update2", params=[["monid","OVN_Southbound"],{"Chassis":{"10d6b5e2-99c7-4a8c-a0be-b8a090b6e1cc":{"modify":{"external_ids":["map",[["neutron:metadata_liveness_check_at","2020-04-16T07:03:05.472814+00:00"]]]}}}}]
2020-04-16T07:03:05.477Z|00286|inc_proc_eng|DBG|node: SB_chassis, changed: 1
2020-04-16T07:03:05.477Z|00289|inc_proc_eng|DBG|node: runtime_data, recompute (triggered)
2020-04-16T07:03:05.478Z|00296|inc_proc_eng|DBG|node: runtime_data, changed: 1
2020-04-16T07:03:05.478Z|00305|inc_proc_eng|DBG|node: flow_output, recompute (triggered)
2020-04-16T07:03:05.478Z|00306|ofctrl|DBG|ofctrl_add_flow flow: sb_uuid=6088cb73-ee83-4275-b7dc-c0eb5fa03fe5, table_id=0, priority=100, in_port=2, actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,33)

This setup has 3 computes and 3 controllers, with HA setup (no DVR).

OSP16 puddle version: RHOS_TRUNK-16.0-RHEL-8-20200226.n.1
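
To make the observation in the log easier to reproduce when reading other captures, here is a minimal Python sketch that takes the params of an "update2" notification, as printed in the jsonrpc debug line, and reports whether each Chassis change touches only nb_cfg and the liveness-related external_ids keys named in this report. The function names and the choice of which keys count as liveness-only are assumptions of this sketch. It is not how ovn-controller decides what to recompute, and it is not a proposed fix.

```python
import json

# external_ids keys named in this report. Treating them as
# "liveness-only" updates is an assumption of this sketch.
LIVENESS_KEYS = {
    "neutron:liveness_check_at",
    "neutron:metadata_liveness_check_at",
    "neutron-metadata-proxy-networks",
    "neutron:ovn-metadata-sb-cfg",
}
# Columns the report says are bumped by every liveness check.
LIVENESS_COLUMNS = {"nb_cfg"}


def is_liveness_only(row_update):
    """Return True if a Chassis row update modifies only nb_cfg
    and/or liveness-related external_ids keys."""
    modify = row_update.get("modify")
    if modify is None:
        # Anything other than "modify" (e.g. insert, delete) is a real change.
        return False
    for column, value in modify.items():
        if column in LIVENESS_COLUMNS:
            continue
        if column != "external_ids":
            return False
        # In the log, the map column appears as ["map", [[key, value], ...]].
        if not (isinstance(value, list) and len(value) == 2 and value[0] == "map"):
            return False
        if any(pair[0] not in LIVENESS_KEYS for pair in value[1]):
            return False
    return True


def classify_update2(params_json):
    """Map each updated Chassis row UUID to True (liveness-only) or False."""
    params = json.loads(params_json)
    table_updates = params[1]  # params[0] is the monitor id
    chassis = table_updates.get("Chassis", {})
    return {uuid: is_liveness_only(row) for uuid, row in chassis.items()}


if __name__ == "__main__":
    # params copied from the jsonrpc debug line in the attached log.
    sample = (
        '[["monid","OVN_Southbound"],{"Chassis":{'
        '"10d6b5e2-99c7-4a8c-a0be-b8a090b6e1cc":{"modify":{"external_ids":'
        '["map",[["neutron:metadata_liveness_check_at",'
        '"2020-04-16T07:03:05.472814+00:00"]]]}}}}]'
    )
    print(classify_update2(sample))
```

For the notification in the attached log, the single Chassis row is classified as liveness-only, which matches the report: a timestamp-only change is followed by a recompute of runtime_data and flow_output.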
Thanks for opening this, Anil. I'm marking this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1824220 because the latter was already triaged by Dumitru.

*** This bug has been marked as a duplicate of bug 1824220 ***