Bug 2007501 - [OVN Scale] ovn-nbctl --wait=hv sync hangs when OVN scales to more than 500 LRs
Summary: [OVN Scale] ovn-nbctl --wait=hv sync hangs when OVN scales to more than 500 LRs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: FDP 21.H
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mark Michelson
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-24 05:45 UTC by ying xu
Modified: 2023-07-28 17:17 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-28 17:17:04 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1573 0 None None None 2021-09-24 05:49:14 UTC

Description ying xu 2021-09-24 05:45:42 UTC
Description of problem:
When OVN scales to more than 500 logical routers (LRs), ovn-nbctl --wait=hv sync hangs.
This was found in the test case bz1776712_broadcast_limit,
which is related to bug 1776712.

Version-Release number of selected component (if applicable):
ovn-2021-21.06.0-29.el8fdp.x86_64

How reproducible:
always

Steps to Reproduce:
client:
		ovs-vsctl add-br nat_test
		ip link set nat_test up
		ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
		ovs-vsctl add-port nat_test $nic_test2
		ip link set $nic_test2 up
		ovs-vsctl set interface $nic_test2 external-ids:ovn-egress-iface=true


server:
		ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
		ovn-nbctl set connection . inactivity_probe=180000
		ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
		ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
		ovn-sbctl set connection . inactivity_probe=180000

		ovn-nbctl ls-add public

		# r1
		i=1
	for m in `seq 0 9`;do
		for n in `seq 1 99`;do
		ovn-nbctl lr-add r${i}
		ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
		ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
		ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
		ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1

		# s1
		ovn-nbctl ls-add s${i}

		# s1 - r1
		ovn-nbctl lsp-add s${i} s${i}_r${i}
		ovn-nbctl lsp-set-type s${i}_r${i} router
		ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
		ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}

		# s1 - vm1
		ovn-nbctl lsp-add s$i vm$i
		ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
		ovn-nbctl lrp-add r$i r${i}_public 40:44:00:00:$m:$n 172.16.$m.$n/16

		ovn-nbctl lsp-add public public_r${i}
		ovn-nbctl lsp-set-type public_r${i} router
		ovn-nbctl lsp-set-addresses public_r${i} router
		ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public nat-addresses=router
		let i++
		if [ $i -gt 600 ];then
			break;
		fi
		done
		if [ $i -gt 600 ];then
			break;
		fi
	done
		ovn-nbctl lsp-add public ln_p1
		ovn-nbctl lsp-set-addresses ln_p1 unknown
		ovn-nbctl lsp-set-type ln_p1 localnet
		ovn-nbctl lsp-set-options ln_p1 network_name=nattest

		ovn-nbctl show
		ovn-sbctl show

		sync_set client $FUNCNAME
		sync_wait client $FUNCNAME

		ovs-vsctl show
	
		#add host vm1
		ip netns add vm1
		ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
		ip link set vm1 netns vm1
		ip netns exec vm1 ip link set vm1 address 00:de:ad:01:00:01
		ip netns exec vm1 ip addr add 173.0.1.2/24 dev vm1
		ip netns exec vm1 ip link set vm1 up
		ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
		
		ip netns add vm2
		ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
		ip link set vm2 netns vm2
		ip netns exec vm2 ip link set vm2 address 00:de:ad:01:00:02
		ip netns exec vm2 ip addr add 173.0.2.2/24 dev vm2
		ip netns exec vm2 ip link set vm2 up
		ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

		ip netns add vm3
		ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
		ip link set vm3 netns vm3
		ip netns exec vm3 ip link set vm3 address 00:de:ad:01:00:03
		ip netns exec vm3 ip addr add 173.0.3.2/24 dev vm3
		ip netns exec vm3 ip link set vm3 up
		ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

		#set provide network
		ovs-vsctl add-br nat_test
		ip link set nat_test up
		ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
		#ovs-vsctl add-port nat_test $nic_test2
		#ip link set $nic_test2 up
		ip netns add vm0
		ip link set vm0 netns vm0
		ip netns exec vm0 ip link set vm0 address 00:00:00:00:00:01
		ip netns exec vm0 ip addr add 172.16.0.100/16 dev vm0
		ip netns exec vm0 ip link set vm0 up
		ovs-vsctl add-port nat_test vm0 -- set interface vm0 type=internal
		ip link set vm0 netns vm0
		ip netns exec vm0 ip link set vm0 address 00:00:00:00:00:01
		ip netns exec vm0 ip addr add 172.16.0.100/16 dev vm0
		ip netns exec vm0 ip link set vm0 up
		ovs-vsctl set Interface vm0 external_ids:iface-id=vm0
		ip netns exec vm1 ip route add default via 173.0.1.1
		ip netns exec vm2 ip route add default via 173.0.2.1
		ip netns exec vm3 ip route add default via 173.0.3.1
		rlRun "ovn-nbctl show"
		rlRun "ovs-vsctl show"

		rlRun "ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101"
		rlRun "ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03"
		rlRun "ovn-nbctl lr-nat-list r1"


Then run "ovn-nbctl --wait=sb sync" and "ovn-nbctl --wait=hv sync":

$ ovn-nbctl --wait=hv sync
^C2021-09-24T04:04:01Z|00001|fatal_signal|WARN|terminating with signal 2 (Interrupt)
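As an editorial aside, a hedged sketch (not part of the original test) of how the hang could be bounded with a timeout instead of interrupting the command by hand; the 600-second limit is illustrative:

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a time limit and report
# whether it finished. Only the helper itself is shown here; the
# ovn-nbctl invocation below is the command under investigation.
timed_sync() {
    local limit=${1:-600}   # seconds before giving up (illustrative)
    shift
    if timeout "$limit" "$@"; then
        echo "sync completed within ${limit}s"
    else
        echo "sync did NOT complete within ${limit}s" >&2
        return 1
    fi
}

# On a live system one would run, e.g.:
# timed_sync 600 ovn-nbctl --wait=hv sync
```

This records a definite pass/fail after a fixed interval, which is easier to compare across builds than an interactive Ctrl-C.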

[root@dell-per740-53 bz1776712_broadcast_limit]# top

top - 00:04:12 up 20:47,  1 user,  load average: 1.95, 2.19, 2.10
Tasks: 520 total,   2 running, 518 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  0.0 sy,  0.0 ni, 97.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128155.8 total,  87556.6 free,  39380.2 used,   1219.0 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  87865.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                
  73205 openvsw+  10 -10 1263852   1.2g   3256 R  99.3   0.9  10:26.86 ovn-northd                                                                             
  73138 openvsw+  10 -10 2712836   2.4g  40916 S   0.3   1.9   0:55.47 ovs-vswitchd                                                                           
      1 root      20   0  189256  16644   9752 S   0.0   0.0   0:06.42 systemd                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                 
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                             
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                            
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                           
     10 root      20   0       0      0      0 S   0.0   0.0   0:04.30 ksoftirqd/0                                                                            
     11 root      20   0       0      0      0 I   0.0   0.0   0:08.62 rcu_sched                                                                              
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                            
     13 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 watchdog/0                                                                             
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.01 cpuhp/0                                                                                
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                                                                                
     16 root      rt   0       0      0      0 S   0.0   0.0   0:00.05 watchdog/1                                                                             
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1                                                                            
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1                                                                            
     19 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0                                                                            
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H                                                                           
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2                                                                                
     22 root      rt   0       0      0      0 S   0.0   0.0   0:00.06 watchdog/2                                                                             
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/2                                                                            
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2                                                                            
     25 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0                                                                            
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H                                                                           
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3                                                                                
     28 root      rt   0       0      0      0 S   0.0   0.0   0:00.04 watchdog/3                                                                             
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/3                                                                            
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3                                                                            
     31 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0                                                                            
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H          




Actual results:
The command hangs for a long time (more than 10 minutes, sometimes longer).

Expected results:
The command should complete within a reasonable time.

Additional info:
I tried the case on version ovn-2021-21.06.0-24.el8fdp.x86_64 and it works fine there.

Comment 3 Mark Michelson 2021-09-28 15:11:32 UTC
Thanks for the explanation and the reproducer. In your reproducer, I see some references to variables and functions that I'm not familiar with:
* "$nic_test2" variable
* "$FUNCNAME" variable
* sync_set function
* sync_wait function
* rlRun function

I think that even without knowing what these specific things are, we should be able to adapt the reproducer. When posting reproducers in the future, please try to provide these supplemental variables/functions.

I am setting up a sandbox test to see if I can reproduce this, and I'm going to see if this same delay happens with upstream OVN master. I suspect that when you add the NAT as the final step, it results in a very inefficient ovn-northd loop as it tries to install flows relating to the NAT. There have been many optimizations in northd that will be present in OVN 21.09 when it is released. Specifically, I think this commit from Lorenzo may help fix the problem: https://github.com/ovn-org/ovn/commit/b3af6c8c442d824ad7646350adab40adb2d646f0

I will report back what my initial findings are from my sandbox tests.

Comment 4 Mark Michelson 2021-09-28 15:12:20 UTC
>* "$FUNCNAME" variable

Sorry, I didn't realize this was a BASH built-in variable. You can ignore this.

Comment 5 Mark Michelson 2021-09-28 19:26:50 UTC
So far, I have been unable to reproduce the issue. I took the reproducer that you provided and transformed it into the following:

$ cat ./slow.sh
#!/bin/bash

ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
ovn-nbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
ovn-sbctl set connection . inactivity_probe=180000

ovn-nbctl ls-add public

# r1
i=1
for m in `seq 0 9`;do
	for n in `seq 1 99`;do
		ovn-nbctl lr-add r${i}
		ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
		ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
		ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
		ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
	
		# s1
		ovn-nbctl ls-add s${i}
	
		# s1 - r1
		ovn-nbctl lsp-add s${i} s${i}_r${i}
		ovn-nbctl lsp-set-type s${i}_r${i} router
		ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
		ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
	
		# s1 - vm1
		ovn-nbctl lsp-add s$i vm$i
		ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
	
		ovn-nbctl lsp-add public public_r${i}
		ovn-nbctl lsp-set-type public_r${i} router
		ovn-nbctl lsp-set-addresses public_r${i} router
		ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public nat-addresses=router
		let i++
		if [ $i -gt 600 ];then
			break;
		fi
	done
	if [ $i -gt 600 ];then
		break;
	fi
done

ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=nattest

ovn-nbctl show
ovn-sbctl show

#add host vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
ovn-nbctl lr-nat-list r1
--------------------------------

The key differences:
1) This does not create any network namespaces or ip links. Simply adding a port to br-int and setting external_ids:iface-id to the logical switch port is enough to get flows to install.
2) I removed the sync_set and sync_wait calls since I don't know what those do.
3) I changed the "rlRun" invocations to just call ovn-nbctl directly since I do not know what rlRun does.
4) I didn't create a nat_test bridge, since again this does not affect how flows get installed. [1]
5) I removed a line that was trying to create the r{$i}_public logical router port a second time.

--------------------------------

I ran the script above in a sandbox environment (`make sandbox` from the OVN source). After the script completes, I immediately run 

$ time ovn-nbctl --wait=hv sync

real	0m51.013s
user	0m0.001s
sys	0m0.004s

If I then run the command again, I see

$ time ovn-nbctl --wait=hv sync

real	0m11.409s
user	0m0.003s
sys	0m0.005s

And the times stay consistent after that. This seems to indicate that ovn-northd or ovn-controller is busy when I attempt the first sync. The sync takes longer because I have to wait for OVN to even start processing the sync command. After that, the cluster goes idle, so each additional sync is quicker and more consistent.
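The repeated measurement above can be scripted; this is a hedged sketch (the helper is generic shell, and the commented ovn-nbctl loop is how it would be applied on a live cluster):

```shell
#!/bin/bash
# Time a command in whole seconds; useful for watching successive
# `ovn-nbctl --wait=hv sync` runs settle once the cluster goes idle.
sync_elapsed() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# On a live system:
# for i in 1 2 3; do
#     echo "sync $i took $(sync_elapsed ovn-nbctl --wait=hv sync)s"
# done
```

A first run much slower than later runs points at a busy northd/controller rather than a hang in the sync itself.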

For reference, I checked to ensure the OpenFlow flows were being installed as expected:

$ ovs-ofctl dump-flows br-int | wc -l
849586

Let's try to work out the differences in our setups:
1) How many hypervisors are in your cluster? In my sandbox, there is just one.
2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind more during your test?

If you can provide some help, I can try to work on this more.

One final question: is this a regression with FDP 21.H? Did this same test pass properly with FDP 21.G?

[1] I wasn't 100% sure if this would affect things, so I did attempt adding the nat_test bridge, along with an ovn-egress-iface port. This had no effect during my test.

Comment 6 ying xu 2021-09-29 02:39:41 UTC
(In reply to Mark Michelson from comment #5)
> So far, I have been unable to reproduce the issue. I took the reproducer
> that you provided and transformed it into the following:
> 
> $ cat ./slow.sh
> #!/bin/bash
> 
> ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
> ovn-nbctl set connection . inactivity_probe=180000
> ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
> ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
> ovn-sbctl set connection . inactivity_probe=180000
> 
> ovn-nbctl ls-add public
> 
> # r1
> i=1
> for m in `seq 0 9`;do
> 	for n in `seq 1 99`;do
> 		ovn-nbctl lr-add r${i}
> 		ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
> 		ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
> 		ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
> 		ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
> 	
> 		# s1
> 		ovn-nbctl ls-add s${i}
> 	
> 		# s1 - r1
> 		ovn-nbctl lsp-add s${i} s${i}_r${i}
> 		ovn-nbctl lsp-set-type s${i}_r${i} router
> 		ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
> 		ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
> 	
> 		# s1 - vm1
> 		ovn-nbctl lsp-add s$i vm$i
> 		ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
> 	
> 		ovn-nbctl lsp-add public public_r${i}
> 		ovn-nbctl lsp-set-type public_r${i} router
> 		ovn-nbctl lsp-set-addresses public_r${i} router
> 		ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public
> nat-addresses=router
> 		let i++
> 		if [ $i -gt 600 ];then
> 			break;
> 		fi
> 	done
> 	if [ $i -gt 600 ];then
> 		break;
> 	fi
> done
> 
> ovn-nbctl lsp-add public ln_p1
> ovn-nbctl lsp-set-addresses ln_p1 unknown
> ovn-nbctl lsp-set-type ln_p1 localnet
> ovn-nbctl lsp-set-options ln_p1 network_name=nattest
> 
> ovn-nbctl show
> ovn-sbctl show
> 
> #add host vm1
> ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
> ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
> 
> ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
> ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
> 
> ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
> ovs-vsctl set Interface vm3 external_ids:iface-id=vm3
> 
> ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
> ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1
> 00:00:00:01:02:03
> ovn-nbctl lr-nat-list r1
> --------------------------------
> 
> The key differences:
> 1) This does not create any network namespaces or ip links. Simply adding a
> port to br-int and setting external_ids:iface-id to the logical switch port
> is enough to get flows to install.
> 2) I removed the sync_set and sync_wait calls since I don't know what those
> do.
> 3) I changed the "rlRun" invocations to just call ovn-nbctl directly since I
> do not know what rlRun does.
> 4) I didn't create a nat_test bridge, since again this does not affect how
> flows get installed. [1]
> 5) I removed a line that was trying to create the r{$i}_public logical
> router port a second time.
> 
> --------------------------------
> 
> I ran the script above in a sandbox environment (`make sandbox` from the OVN
> source). After the script completes, I immediately run 
> 
> $ time ovn-nbctl --wait=hv sync
> 
> real	0m51.013s
> user	0m0.001s
> sys	0m0.004s
> 
> If I then run the command again, I see
> 
> $ time ovn-nbctl --wait=hv sync
> 
> real	0m11.409s
> user	0m0.003s
> sys	0m0.005s
> 
> And the times stay consistent after that. This seems to indicate that
> ovn-northd or ovn-controller is busy when I attempt the first sync. The sync
> takes longer because I have to wait for OVN to even start processing the
> sync command. After that, the cluster goes idle, so each additional sync is
> quicker and more consistent.
> 
> For reference, I checked to ensure the OpenFlow flows were being installed
> as expected:
> 
> $ ovs-ofctl dump-flows br-int | wc -l
> 849586
> 
> Let's try to work out the differences in our setups:
> 1) How many hypervisors are in your cluster? In my sandbox, there is just
> one.

I am sorry for the confusing configuration; you can skip those parts.

I have two systems: the server runs ovn-central and a controller, and the client runs another controller.
I think you should add the nat_test bridge just like in my reproducer.



> 2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind
> more during your test?

Yes, I only bind vm0 to vm3, 4 VMs in total.


> 
> If you can provide some help, I can try to work on this more.
> 
> One final question: is this a regression with FDP 21.H? Did this same test
> pass properly with FDP 21.G?
> 
> [1] I wasn't 100% sure if this would affect things, so I did attempt adding
> the nat_test bridge, along with an ovn-egress-iface port. This had no effect
> during my test.


I can also see an error in the log; I am not sure whether it is related:
2021-09-29T01:57:10.489Z|01319|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0xeaa31): OFPBFC_TIMEOUT
OFPT_BUNDLE_CONTROL (OF1.5) (xid=0xeaa31):
 bundle_id=0x4e4 type=OPEN_REQUEST flags=atomic ordered

Comment 7 Mark Michelson 2021-09-29 15:14:31 UTC
> I have two systems, server is for ovn-central and controller,client is for
> another controller.
> I think you should add the nat_test just like my reproducer

If I add a minimal nat_test bridge, then my script now looks like this:

$ cat slow.sh
#!/bin/bash

ovs-vsctl add-br nat_test
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
ovs-vsctl add-port nat_test external_port
ovs-vsctl set interface external_port external-ids:ovn-egress-iface=true

ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
ovn-nbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
ovn-sbctl set connection . inactivity_probe=180000

ovn-nbctl ls-add public

# r1
i=1
for m in `seq 0 9`;do
	for n in `seq 1 99`;do
		ovn-nbctl lr-add r${i}
		ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
		ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
		ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
		ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
	
		# s1
		ovn-nbctl ls-add s${i}
	
		# s1 - r1
		ovn-nbctl lsp-add s${i} s${i}_r${i}
		ovn-nbctl lsp-set-type s${i}_r${i} router
		ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
		ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
	
		# s1 - vm1
		ovn-nbctl lsp-add s$i vm$i
		ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
	
		ovn-nbctl lsp-add public public_r${i}
		ovn-nbctl lsp-set-type public_r${i} router
		ovn-nbctl lsp-set-addresses public_r${i} router
		ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public nat-addresses=router
		let i++
		if [ $i -gt 600 ];then
			break;
		fi
	done
	if [ $i -gt 600 ];then
		break;
	fi
done

ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=nattest

ovn-nbctl show
ovn-sbctl show

#add host vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

##set provide network
ovs-vsctl add-br nat_test
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test
ovs-vsctl add-port nat_test vm0 -- set interface vm0 type=internal
ovs-vsctl set Interface vm0 external_ids:iface-id=vm0

ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
ovn-nbctl lr-nat-list r1
--------------------------------------------

With this script, I see the same results I saw when trying to reproduce the issue yesterday.

> > 2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind
> > more during your test?
> 
> yes,I only bind vm0 to vm3, 4 vms in total.

You can see in my updated script above I'm binding the same VMs. vm1, vm2, and vm3 are added to br-int, and vm0 is added to nat_test.

> 
> 
> > 
> > If you can provide some help, I can try to work on this more.
> > 
> > One final question: is this a regression with FDP 21.H? Did this same test
> > pass properly with FDP 21.G?

Could you please provide an answer to this question? If the behavior was the same in 21.G, then there is no need for us to update the 21.H build to fix this issue. We can just fix this before 21.I is released.

> 
> 
> and I can see error in log,I am not sure if this would affect.
> 2021-09-29T01:57:10.489Z|01319|ofctrl|INFO|OpenFlow error: OFPT_ERROR
> (OF1.5) (xid=0xeaa31): OFPBFC_TIMEOUT
> OFPT_BUNDLE_CONTROL (OF1.5) (xid=0xeaa31):
>  bundle_id=0x4e4 type=OPEN_REQUEST flags=atomic ordered

For the record, I do not see these in my test. BUNDLE_CONTROL messages are used to install the flows from ovn-controller into ovs. The fact that it's timing out likely means:

a) The number of flows we're trying to install is VERY high.
b) ovs-vswitchd is too busy to be able to handle the requests from OVN.

When you run this test, what do you see when you run `ovs-ofctl dump-flows br-int | wc -l` ? If you run the command multiple times, does it produce different results?
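The check suggested above can be put in a small script; a hedged sketch (bridge name and polling interval are illustrative defaults, not from the report):

```shell
#!/bin/bash
# Count OpenFlow flows on a bridge, then sample again after a pause.
# If the two counts differ, ovn-controller is likely still installing
# flows, which would explain a long --wait=hv sync.
count_flows() {
    ovs-ofctl dump-flows "${1:-br-int}" 2>/dev/null | wc -l
}

flows_stable() {
    local bridge=${1:-br-int} interval=${2:-5} a b
    a=$(count_flows "$bridge")
    sleep "$interval"
    b=$(count_flows "$bridge")
    echo "before=$a after=$b"
    [ "$a" -eq "$b" ]
}

# On a live hypervisor:
# flows_stable br-int 10 && echo "flow table has settled"
```

The comparison makes "does it produce different results?" a yes/no answer instead of eyeballing raw counts.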

Comment 8 Mark Michelson 2021-09-29 15:30:23 UTC
I asked for some help from the rest of the core OVN team. One suggestion is that ovn-northd might be crashing for some reason. Do you see any core dumps on the system running ovn-northd? If so, getting a core dump (or even just a backtrace) from one of the crashed processes would be helpful.
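A hedged sketch of the core-dump check (the coredump directory is an assumption; on systems using systemd-coredump, `coredumpctl list ovn-northd` is the more direct command):

```shell
#!/bin/bash
# Count core files in a directory; the default path is a common
# systemd-coredump location and is an assumption, not from the report.
count_cores() {
    local dir=${1:-/var/lib/systemd/coredump}
    find "$dir" -maxdepth 1 -name 'core*' 2>/dev/null | wc -l
}

# On the system running ovn-northd:
# if [ "$(count_cores)" -gt 0 ]; then
#     echo "core dumps found; extract a backtrace with gdb or coredumpctl"
# fi
```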

Comment 9 ying xu 2021-09-30 02:37:33 UTC
(In reply to Mark Michelson from comment #7)
> > I have two systems, server is for ovn-central and controller,client is for
> > another controller.
> > I think you should add the nat_test just like my reproducer
> 
> If I add a minimal nat_test bridge, then my script now looks like this:
> 
> $ cat slow.sh
> #!/bin/bash
> 
> ovs-vsctl add-br nat_test
> ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings=nattest:nat_test
> ovs-vsctl add-port nat_test external_port
> ovs-vsctl set interface external_port external-ids:ovn-egress-iface=true
> 
> ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
> ovn-nbctl set connection . inactivity_probe=180000
> ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
> ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
> ovn-sbctl set connection . inactivity_probe=180000
> 
> ovn-nbctl ls-add public
> 
> # r1
> i=1
> for m in `seq 0 9`;do
> 	for n in `seq 1 99`;do
> 		ovn-nbctl lr-add r${i}
> 		ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
> 		ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
> 		ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
> 		ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
> 	
> 		# s1
> 		ovn-nbctl ls-add s${i}
> 	
> 		# s1 - r1
> 		ovn-nbctl lsp-add s${i} s${i}_r${i}
> 		ovn-nbctl lsp-set-type s${i}_r${i} router
> 		ovn-nbctl lsp-set-addresses s${i}_r${i} "00:de:ad:fe:$m:$n 173.$m.$n.1"
> 		ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
> 	
> 		# s1 - vm1
> 		ovn-nbctl lsp-add s$i vm$i
> 		ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
> 	
> 		ovn-nbctl lsp-add public public_r${i}
> 		ovn-nbctl lsp-set-type public_r${i} router
> 		ovn-nbctl lsp-set-addresses public_r${i} router
> 		ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public
> nat-addresses=router
> 		let i++
> 		if [ $i -gt 600 ];then
> 			break;
> 		fi
> 	done
> 	if [ $i -gt 600 ];then
> 		break;
> 	fi
> done
> 
> ovn-nbctl lsp-add public ln_p1
> ovn-nbctl lsp-set-addresses ln_p1 unknown
> ovn-nbctl lsp-set-type ln_p1 localnet
> ovn-nbctl lsp-set-options ln_p1 network_name=nattest
> 
> ovn-nbctl show
> ovn-sbctl show
> 
> #add host vm1
> ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
> ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
> 
> ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
> ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
> 
> ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
> ovs-vsctl set Interface vm3 external_ids:iface-id=vm3
> 
> ##set provide network
> ovs-vsctl add-br nat_test
> ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings=nattest:nat_test
> ovs-vsctl add-port nat_test vm0 -- set interface vm0 type=internal
> ovs-vsctl set Interface vm0 external_ids:iface-id=vm0
> 
> ovn-nbctl lr-nat-del r1 dnat_and_snat 172.16.0.101
> ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1
> 00:00:00:01:02:03
> ovn-nbctl lr-nat-list r1
> --------------------------------------------
> 
> With this script, I see the same results I saw when trying to reproduce the
> issue yesterday.
> 
> > > 2) Did you actually only bind vm1, vm2, and vm3, or did you actually bind
> > > more during your test?
> > 
> > yes,I only bind vm0 to vm3, 4 vms in total.
> 
> You can see in my updated script above I'm binding the same VMs. vm1, vm2,
> and vm3 are added to br-int, and vm0 is added to nat_test.
> 
> > 
> > 
> > > 
> > > If you can provide some help, I can try to work on this more.
> > > 
> > > One final question: is this a regression with FDP 21.H? Did this same test
> > > pass properly with FDP 21.G?
> 
> Could you please provide an answer to this question? If the behavior was the
> same in 21.G, then there is no need for us to update the 21.H build to fix
> this issue. We can just fix this before 21.I is released.


This is fine in 21.G. I think it might be a regression.

> 
> > 
> > 
> > and I can see error in log,I am not sure if this would affect.
> > 2021-09-29T01:57:10.489Z|01319|ofctrl|INFO|OpenFlow error: OFPT_ERROR
> > (OF1.5) (xid=0xeaa31): OFPBFC_TIMEOUT
> > OFPT_BUNDLE_CONTROL (OF1.5) (xid=0xeaa31):
> >  bundle_id=0x4e4 type=OPEN_REQUEST flags=atomic ordered
> 
> For the record, I do not see these in my test. BUNDLE_CONTROL messages are
> used to install the flows from ovn-controller into ovs. The fact that it's
> timing out likely means:
> 
> a) The number of flows we're trying to install is VERY high.
> b) ovs-vswitchd is too busy to be able to handle the requests from OVN.
> 
> When you run this test, what do you see when you run `ovs-ofctl dump-flows
> br-int | wc -l` ? If you run the command multiple times, does it produce
> different results?

# ovs-ofctl dump-flows br-int | wc -l
2296447
# ovs-ofctl dump-flows br-int | wc -l
2296455
# ovs-ofctl dump-flows br-int | wc -l
2021-09-30T02:36:42Z|00001|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00002|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00003|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00004|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00005|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00006|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00007|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00008|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00009|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:42Z|00010|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (98% CPU usage)
2021-09-30T02:36:48Z|00011|poll_loop|INFO|Dropped 288 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-09-30T02:36:48Z|00012|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (<->/var/run/openvswitch/br-int.mgmt) at ../lib/stream-fd.c:157 (97% CPU usage)
2296455

Comment 10 ying xu 2021-09-30 08:41:38 UTC
I tried one more time on 21.G; it takes a long time to sync this time as well.

The flow count is about 2 million.

Maybe it is normal for it to take a long time, simply because of too many flows.

Comment 11 Mark Michelson 2023-07-28 17:17:04 UTC
I'm closing this because

1) It's not clear if this is actually bad behavior/a regression.
2) In newer versions of OVN, we should see considerably faster operation because
   a) We've made many performance improvements in ovn-northd
   b) We've reduced the number of logical flows required for many setups

