Currently, ovn-controller updates OF rules one by one: if a logical flow update requires removing one OF rule and adding a different one, ovn-controller removes the old rule first and adds the new one later. This leaves a window in which the service provided by the old rule no longer works until the new rule is installed, causing dataplane downtime and packet loss. The effect can be significant in large-scale setups with a high number of OF rules. To avoid this problem, ovn-controller should add all flow modifications to an OF bundle and commit the bundle atomically, so that there is no time period where no relevant OF rules are installed. This may also relieve some pressure on ovs-vswitchd, which would no longer need to create a new version of its OF tables for every individual flow update and trigger revalidation each time.
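As a minimal sketch of the bundle mechanism (the table number, MAC address, and output port below are made up for illustration), the same atomic-commit behavior can be exercised from the command line with ovs-ofctl --bundle, which wraps all flow modifications from a file into a single OpenFlow bundle:

```shell
# Hypothetical flow modifications: with --bundle, each line in the
# file may carry its own command (delete/add), and all of them are
# committed in one atomic transaction, so there is no window where
# neither the old nor the new rule is installed.
cat > /tmp/flow-mods.txt <<'EOF'
delete table=10,priority=100,dl_dst=fa:16:3e:3a:25:31
add table=10,priority=100,dl_dst=fa:16:3e:3a:25:31,actions=output:2
EOF

# Apply atomically (requires a running ovs-vswitchd with a br-int
# bridge; harmless no-op here if none is available):
ovs-ofctl --bundle add-flows br-int /tmp/flow-mods.txt 2>/dev/null || true
```

The proposed ovn-controller change does the same thing over its own OpenFlow connection: it collects the delete and add messages into one bundle and commits them together instead of sending them individually.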
Sent to the mailing list for review: https://patchwork.ozlabs.org/project/ovn/patch/20210408123112.678123-1-i.maximets@ovn.org/
Version 2: https://patchwork.ozlabs.org/project/ovn/patch/20210413082323.2491511-1-i.maximets@ovn.org/
Reproduced with the following script on ovn2.13-20.12.0-135.el7:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 \
    external_ids:ovn-remote=tcp:127.0.0.1:6642 \
    external_ids:ovn-encap-type=geneve \
    external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true

# Start ovn-nbctl daemon mode:
export OVN_NB_DAEMON=$(ovn-nbctl --detach)

# Enable vconn debug logs (ovn-controller to ovs-vswitchd openflow connection)
ovn-appctl -t ovn-controller vlog/disable-rate-limit vconn
ovn-appctl -t ovn-controller vlog/set vconn:dbg

if1=tap30687bca-dd
if2=tapf5637489-e3

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 $if1
ovn-nbctl lsp-set-addresses $if1 "fa:16:3e:3a:25:31 10.0.126.50"
ovn-nbctl lsp-add ls1 $if2
ovn-nbctl lsp-set-addresses $if2 "fa:16:3e:75:69:6e 10.0.126.80"

ovs-vsctl add-port br-int $if1 -- set interface $if1 type=internal external_ids:iface-id=$if1
ovs-vsctl add-port br-int $if2 -- set interface $if2 type=internal external_ids:iface-id=$if2

# Bind two of the OVS ports to OVN:
ip netns add vm1
ip link set $if1 netns vm1
ip netns exec vm1 ip link set $if1 address fa:16:3e:3a:25:31
ip netns exec vm1 ip addr add 10.0.126.50/24 dev $if1
ip netns exec vm1 ip link set $if1 up

ip netns add vm2
ip link set $if2 netns vm2
ip netns exec vm2 ip link set $if2 address fa:16:3e:75:69:6e
ip netns exec vm2 ip addr add 10.0.126.80/24 dev $if2
ip netns exec vm2 ip link set $if2 up

# Start continuous ping from one port to the other, e.g.: vm1 -> vm2
ip netns exec vm1 ping 10.0.126.80 -i 0.1 &> ping.log &
ping_pid=$!
# Add an unrelated logical switch with an internal OVS port attached to it:
ovs-vsctl add-port br-int vm-test -- set interface vm-test type=internal \
    -- set interface vm-test external_ids:iface-id=vm-test

# In a loop, simulate CMS changes to the topology by removing and adding the
# unrelated logical switch:
for i in {1..10}
do
    ovn-nbctl --wait=hv ls-add ls -- lsp-add ls vm-test
    ovn-sbctl list logical_dp_group
    sleep 5
    ovn-nbctl --wait=hv ls-del ls
    ovn-sbctl list logical_dp_group
    sleep 5
done

kill -2 $ping_pid
tail ping.log

[root@dell-per740-12 bz1947398]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-96.el7fdp.x86_64
ovn2.13-host-20.12.0-135.el7fdp.x86_64
ovn2.13-central-20.12.0-135.el7fdp.x86_64
ovn2.13-20.12.0-135.el7fdp.x86_64

+ tail ping.log
64 bytes from 10.0.126.80: icmp_seq=1011 ttl=64 time=0.047 ms
64 bytes from 10.0.126.80: icmp_seq=1012 ttl=64 time=0.048 ms
64 bytes from 10.0.126.80: icmp_seq=1013 ttl=64 time=0.046 ms
64 bytes from 10.0.126.80: icmp_seq=1014 ttl=64 time=0.047 ms
64 bytes from 10.0.126.80: icmp_seq=1015 ttl=64 time=0.048 ms
64 bytes from 10.0.126.80: icmp_seq=1016 ttl=64 time=0.048 ms

--- 10.0.126.80 ping statistics ---
1016 packets transmitted, 1014 received, 0% packet loss, time 101510ms
rtt min/avg/max/mdev = 0.017/0.043/4.073/0.127 ms    <=== 2 packets lost
Verified on ovn-2021-21.06.0-4:

64 bytes from 10.0.126.80: icmp_seq=971 ttl=64 time=0.050 ms
64 bytes from 10.0.126.80: icmp_seq=972 ttl=64 time=0.050 ms
64 bytes from 10.0.126.80: icmp_seq=973 ttl=64 time=0.051 ms
64 bytes from 10.0.126.80: icmp_seq=974 ttl=64 time=0.050 ms
64 bytes from 10.0.126.80: icmp_seq=975 ttl=64 time=0.050 ms
64 bytes from 10.0.126.80: icmp_seq=976 ttl=64 time=0.050 ms

--- 10.0.126.80 ping statistics ---
976 packets transmitted, 976 received, 0% packet loss, time 101394ms
rtt min/avg/max/mdev = 0.038/0.050/1.046/0.032 ms

[root@wsfd-advnetlab16 bz1947398]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-21.06.0-4.el8fdp.x86_64
openvswitch2.15-2.15.0-26.el8fdp.x86_64
ovn-2021-central-21.06.0-4.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-26.el8fdp.x86_64
ovn-2021-host-21.06.0-4.el8fdp.x86_64
Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2969