Bug 2007501
| Summary: | [OVN Scale] when OVN scales to more than 500 LRs, `ovn-nbctl --wait=hv sync` hangs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | ying xu <yinxu> |
| Component: | ovn-2021 | Assignee: | Mark Michelson <mmichels> |
| Status: | CLOSED NOTABUG | QA Contact: | ying xu <yinxu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 21.H | CC: | ctrautma, jiji, mmichels |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-28 17:17:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
ying xu
2021-09-24 05:45:42 UTC
Thanks for the explanation and the reproducer. In your reproducer, I see some references to variables and functions that I'm not familiar with:
* the "$nic_test2" variable
* the "$FUNCNAME" variable
* the sync_set function
* the sync_wait function
* the rlRun function

I think that even without knowing what these specific things are, we should be able to adapt the reproducer. When posting reproducers in the future, please try to provide these supplemental variables/functions.

I am setting up a sandbox test to see if I can reproduce this, and I'm going to see if this same delay happens with upstream OVN master. I suspect that when you add the NAT as the final step, it results in a very inefficient ovn-northd loop as it tries to install flows relating to the NAT. There have been many optimizations in northd that will be present in OVN 21.09 when it is released. Specifically, I think this commit from Lorenzo may help fix the problem: https://github.com/ovn-org/ovn/commit/b3af6c8c442d824ad7646350adab40adb2d646f0

I will report back what my initial findings are from my sandbox tests.

> * "$FUNCNAME" variable
Sorry, I didn't realize this was a BASH built-in variable. You can ignore this.
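For readers who run into the same confusion: `FUNCNAME` is a bash builtin array whose element 0 is the name of the currently executing function, with callers at higher indices. A minimal illustration (the `report` function name here is just for the demo):

```shell
# FUNCNAME is maintained by bash itself: FUNCNAME[0] is the function
# currently executing, FUNCNAME[1] its caller ("main" at script top level).
report() {
    echo "running ${FUNCNAME[0]}, called from ${FUNCNAME[1]:-top-level}"
}
report
```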
So far, I have been unable to reproduce the issue. I took the reproducer that you provided and transformed it into the following:
$ cat ./slow.sh
#!/bin/bash
ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
ovn-nbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
ovn-sbctl set connection . inactivity_probe=180000
ovn-nbctl ls-add public
# r1
i=1
for m in `seq 0 9`;do
for n in `seq 1 99`;do
ovn-nbctl lr-add r${i}
ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
# s1
ovn-nbctl ls-add s${i}
# s1 - r1
ovn-nbctl lsp-add s${i} s${i}_r${i}
ovn-nbctl lsp-set-type s${i}_r${i} router
ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
# s1 - vm1
ovn-nbctl lsp-add s$i vm$i
ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
ovn-nbctl lsp-add public public_r${i}
ovn-nbctl lsp-set-type public_r${i} router
ovn-nbctl lsp-set-addresses public_r${i} router
ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public nat-addresses=router
let i++
if [ $i -gt 600 ];then
break;
fi
done
if [ $i -gt 600 ];then
break;
fi
done
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=nattest
ovn-nbctl show
ovn-sbctl show
#add host vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3
ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
ovn-nbctl lr-nat-list r1
--------------------------------
The key differences:
1) This does not create any network namespaces or ip links. Simply adding a port to br-int and setting external_ids:iface-id to the logical switch port is enough to get flows to install.
2) I removed the sync_set and sync_wait calls since I don't know what those do.
3) I changed the "rlRun" invocations to just call ovn-nbctl directly since I do not know what rlRun does.
4) I didn't create a nat_test bridge, since again this does not affect how flows get installed. [1]
5) I removed a line that was trying to create the r${i}_public logical router port a second time.
--------------------------------
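As a sanity check on the scale this script creates: the nested loop runs m from 0 to 9 and n from 1 to 99, but breaks out once the counter passes 600, so exactly 600 logical routers (each with a switch, a NAT entry, and the associated ports) end up in the northbound DB. The break logic can be verified in isolation by replacing the ovn-nbctl calls with a counter:

```shell
# Same loop skeleton as the reproducer, with the ovn-nbctl calls replaced
# by a counter, to confirm how many routers it creates before breaking out.
i=1
routers=0
for m in $(seq 0 9); do
    for n in $(seq 1 99); do
        routers=$((routers + 1))   # stands in for the ovn-nbctl lr-add call
        i=$((i + 1))
        if [ "$i" -gt 600 ]; then break; fi
    done
    if [ "$i" -gt 600 ]; then break; fi
done
echo "$routers"   # prints 600
```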
I ran the script above in a sandbox environment (`make sandbox` from the OVN source). After the script completes, I immediately run
$ time ovn-nbctl --wait=hv sync
real 0m51.013s
user 0m0.001s
sys 0m0.004s
If I then run the command again, I see
$ time ovn-nbctl --wait=hv sync
real 0m11.409s
user 0m0.003s
sys 0m0.005s
And the times stay consistent after that. This seems to indicate that ovn-northd or ovn-controller is busy when I attempt the first sync. The sync takes longer because I have to wait for OVN to even start processing the sync command. After that, the cluster goes idle, so each additional sync is quicker and more consistent.
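To see whether the latency settles, the sync can be timed repeatedly. A small sketch (the `time_sync` helper name is mine, and it assumes `ovn-nbctl` is on PATH; any command can be substituted for testing the harness itself):

```shell
# Time an arbitrary command in milliseconds; defaults to the OVN sync used above.
time_sync() {
    local cmd="${1:-ovn-nbctl --wait=hv sync}"
    local start end
    start=$(date +%s%N)            # nanoseconds since epoch (GNU date)
    $cmd
    end=$(date +%s%N)
    echo "elapsed: $(( (end - start) / 1000000 )) ms"
}

# e.g. run it a few times in a row and watch the numbers converge:
# for run in 1 2 3; do time_sync; done
```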
For reference, I checked to ensure the OpenFlow flows were being installed as expected:
$ ovs-ofctl dump-flows br-int | wc -l
849586
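For a rough sense of scale, that flow count works out to over a thousand OpenFlow flows per logical router in this setup (integer arithmetic, using the 600 routers the script creates):

```shell
total_flows=849586   # from the ovs-ofctl dump above
routers=600          # created by the reproducer loop
echo "$(( total_flows / routers )) flows per router"   # prints "1415 flows per router"
```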
Let's try to work out the differences in our setups:
1) How many hypervisors are in your cluster? In my sandbox, there is just one.
2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind more during your test?
If you can provide some help, I can try to work on this more.
One final question: is this a regression with FDP 21.H? Did this same test pass properly with FDP 21.G?
[1] I wasn't 100% sure if this would affect things, so I did attempt adding the nat_test bridge, along with an ovn-egress-iface port. This had no effect during my test.
(In reply to Mark Michelson from comment #5)

> 1) How many hypervisors are in your cluster? In my sandbox, there is just
> one.

I am sorry for the confusing configuration; you can skip that part. I have two systems: the server runs ovn-central plus a controller, and the client runs another controller. I think you should add the nat_test bridge, just like in my reproducer.

> 2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind
> more during your test?

Yes, I only bind vm0 to vm3, 4 VMs in total.

I can also see an error in the log; I am not sure whether this is related:

2021-09-29T01:57:10.489Z|01319|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0xeaa31): OFPBFC_TIMEOUT
OFPT_BUNDLE_CONTROL (OF1.5) (xid=0xeaa31): bundle_id=0x4e4 type=OPEN_REQUEST flags=atomic ordered

--------------------------------

> I have two systems, server is for ovn-central and controller, client is for
> another controller.
> I think you should add the nat_test just like my reproducer

If I add a minimal nat_test bridge, then my script now looks like this:

$ cat slow.sh
#!/bin/bash

ovs-vsctl add-br nat_test
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
ovs-vsctl add-port nat_test external_port
ovs-vsctl set interface external_port external-ids:ovn-egress-iface=true

ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
ovn-nbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
ovn-sbctl set connection . inactivity_probe=180000

ovn-nbctl ls-add public

# r1
i=1
for m in `seq 0 9`;do
for n in `seq 1 99`;do
ovn-nbctl lr-add r${i}
ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1

# s1
ovn-nbctl ls-add s${i}

# s1 - r1
ovn-nbctl lsp-add s${i} s${i}_r${i}
ovn-nbctl lsp-set-type s${i}_r${i} router
ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}

# s1 - vm1
ovn-nbctl lsp-add s$i vm$i
ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"

ovn-nbctl lsp-add public public_r${i}
ovn-nbctl lsp-set-type public_r${i} router
ovn-nbctl lsp-set-addresses public_r${i} router
ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public nat-addresses=router
let i++
if [ $i -gt 600 ];then
break;
fi
done
if [ $i -gt 600 ];then
break;
fi
done

ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=nattest

ovn-nbctl show
ovn-sbctl show

#add host vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

##set provider network
ovs-vsctl add-br nat_test
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
ovs-vsctl add-port nat_test vm0 -- set interface vm0 type=internal
ovs-vsctl set Interface vm0 external_ids:iface-id=vm0

ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
ovn-nbctl lr-nat-list r1
--------------------------------------------

With this script, I see the same results I saw when trying to reproduce the issue yesterday.

> > 2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind
> > more during your test?
>
> yes, I only bind vm0 to vm3, 4 vms in total.

You can see in my updated script above I'm binding the same VMs: vm1, vm2, and vm3 are added to br-int, and vm0 is added to nat_test.

> > One final question: is this a regression with FDP 21.H? Did this same test
> > pass properly with FDP 21.G?

Could you please provide an answer to this question? If the behavior was the same in 21.G, then there is no need for us to update the 21.H build to fix this issue. We can just fix this before 21.I is released.

> and I can see error in log, I am not sure if this would affect.
> 2021-09-29T01:57:10.489Z|01319|ofctrl|INFO|OpenFlow error: OFPT_ERROR
> (OF1.5) (xid=0xeaa31): OFPBFC_TIMEOUT
> OFPT_BUNDLE_CONTROL (OF1.5) (xid=0xeaa31):
> bundle_id=0x4e4 type=OPEN_REQUEST flags=atomic ordered

For the record, I do not see these in my test. BUNDLE_CONTROL messages are used to install the flows from ovn-controller into ovs. The fact that it's timing out likely means:

a) The number of flows we're trying to install is VERY high.
b) ovs-vswitchd is too busy to be able to handle the requests from OVN.

When you run this test, what do you see when you run `ovs-ofctl dump-flows br-int | wc -l`? If you run the command multiple times, does it produce different results?

I asked for some help from the rest of the core OVN team.
One suggestion is that ovn-northd might be crashing for some reason. Do you see any core dumps on the system running ovn-northd? If so, getting a core dump (or even just a backtrace) from one of the crashed processes would be helpful.

--------------------------------

(In reply to Mark Michelson from comment #7)

> Could you please provide an answer to this question? If the behavior was the
> same in 21.G, then there is no need for us to update the 21.H build to fix
> this issue. We can just fix this before 21.I is released.

This is fine in 21.G. I think it might be a regression.

> When you run this test, what do you see when you run `ovs-ofctl dump-flows
> br-int | wc -l`? If you run the command multiple times, does it produce
> different results?

# ovs-ofctl dump-flows br-int | wc -l
2296447
# ovs-ofctl dump-flows br-int | wc -l
2296455
# ovs-ofctl dump-flows br-int | wc -l
2021-09-30T02:36:42Z|00001|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00002|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00003|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00004|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00005|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00006|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00007|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00008|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00009|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00010|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:48Z|00011|poll_loop|INFO|Dropped 288 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-09-30T02:36:48Z|00012|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (97% CPU usage)
2296455

I tried one more time in 21.G; it took a long time to sync this time. The flow count is about 2 million. Maybe it is normal for this to take a long time, simply because there are so many flows.

--------------------------------

I'm closing this because:
1) It's not clear if this is actually bad behavior/a regression.
2) In newer versions of OVN, we should see considerably faster operation because:
   a) We've made many performance improvements in ovn-northd.
   b) We've reduced the number of logical flows required for many setups.
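For context on that closing assessment, the flow count reported on the failing setup is roughly 2.7 times what the sandbox reproduction produced, which is consistent with raw flow volume, rather than a specific bug, dominating the sync time:

```shell
failing=2296455   # dump-flows count on the reported two-system setup
sandbox=849586    # dump-flows count in the single-chassis sandbox
# Scale factor in hundredths, via fixed-point integer arithmetic.
echo "$(( failing * 100 / sandbox ))"   # prints 270, i.e. ~2.7x as many flows
```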