Bug 1947056
| Summary: | [ovn-controller] Packet drops when using logical_dp_groups when a lflow dp_group is updated. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dumitru Ceara <dceara> |
| Component: | ovn2.13 | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 20.H | CC: | ctrautma, dcbw, ffernand, fhallal, i.maximets, jishi, jmelvin, ptalbert, ralongi, rkhan |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn2.13-20.12.0-108.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-21 14:44:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1946420 | | |
Comment 10
Ilya Maximets
2021-04-08 12:48:28 UTC
Fix posted for review: http://patchwork.ozlabs.org/project/ovn/list/?series=238154&state=*

(In reply to Dumitru Ceara from comment #11)
> Fix posted for review:
> http://patchwork.ozlabs.org/project/ovn/list/?series=238154&state=*

V2 posted: http://patchwork.ozlabs.org/project/ovn/list/?series=238728

Fix accepted upstream; waiting on the downstream backport.

Tested with the following script:
systemctl start openvswitch
# conf.db is in the attachment
# all db files are for openvswitch2.15
cp -f new_version/conf.db /etc/openvswitch/conf.db
systemctl restart openvswitch
systemctl start ovn-northd
# ovnnb_db.db in attachment
cp -f new_version/ovnnb_db.db /var/lib/ovn/ovnnb_db.db
systemctl restart ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
#ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true
# Start ovn-nbctl daemon mode:
export OVN_NB_DAEMON=$(ovn-nbctl --detach)
# Enable vconn debug logs (ovn-controller to ovs-vswitchd openflow connection)
ovn-appctl -t ovn-controller vlog/disable-rate-limit vconn
ovn-appctl -t ovn-controller vlog/set vconn:dbg
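# (Optional sanity checks; not part of the original reproducer.) Confirm the
# chassis registered and that datapath groups are actually in use: with
# use_logical_dp_groups=true the SB database should contain Logical_DP_Group
# rows referenced by the logical flows.
ovn-sbctl show
ovn-sbctl list logical_dp_group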
if1=tap30687bca-dd
if2=tapf5637489-e3
# Bind two of the OVS ports to OVN:
ip netns add vm1
ip link set $if1 netns vm1
ip netns exec vm1 ip link set $if1 address fa:16:3e:3a:25:31
ip netns exec vm1 ip addr add 10.0.126.50/24 dev $if1
ip netns exec vm1 ip link set $if1 up
ip netns add vm2
ip link set $if2 netns vm2
ip netns exec vm2 ip link set $if2 address fa:16:3e:75:69:6e
ip netns exec vm2 ip addr add 10.0.126.80/24 dev $if2
ip netns exec vm2 ip link set $if2 up
while :
do
if ip netns exec vm1 ping 10.0.126.80 -c 1
then
break
else
sleep 1
fi
done
# Start continuous ping from one port to the other, e.g.: vm1 -> vm2
ip netns exec vm1 ping 10.0.126.80 -i 0.1 &> ping.log &
ping_pid=$!
# Add an unrelated logical switch with an internal OVS port attached to it:
ovs-vsctl add-port br-int vm-test -- set interface vm-test type=internal -- set interface vm-test external_ids:iface-id=vm-test
# In a loop, simulate CMS changes to the topology by removing and adding the
# unrelated logical switch:
for i in {1..3}
do
ovn-nbctl ls-add ls -- lsp-add ls vm-test
#ovn-sbctl list logical_dp_group
sleep 10
ovn-nbctl ls-del ls
#ovn-sbctl list logical_dp_group
sleep 10
done
kill -2 $ping_pid
tail ping.log
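The vconn debug logging enabled above makes the ovn-controller/ovs-vswitchd OpenFlow traffic visible. As a rough sketch (the log path is an assumption; older packages log to /var/log/openvswitch/ovn-controller.log instead), the OpenFlow churn and the installed flow count can be checked while the loop runs:
# Count flow_mod messages ovn-controller sent to ovs-vswitchd; on an affected
# build the ls-add/ls-del loop causes flows belonging to the datapath carrying
# the ping traffic to be removed and re-added.
grep -ci 'flow_mod' /var/log/ovn/ovn-controller.log
# The number of flows installed on br-int should stay stable across the loop.
ovs-ofctl -O OpenFlow15 dump-flows br-int | wc -l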
reproduced on ovn2.13-20.12.0-104.el8:
--- 10.0.126.80 ping statistics ---
414 packets transmitted, 207 received, 50% packet loss, time 60076ms
rtt min/avg/max/mdev = 0.044/0.066/0.853/0.056 ms
Verified on ovn2.13-20.12.0-135.el8:
--- 10.0.126.80 ping statistics ---
579 packets transmitted, 579 received, 0% packet loss, time 60054ms
rtt min/avg/max/mdev = 0.016/0.051/0.664/0.033 ms
[root@dell-per730-03 bz1947056]# rpm -qa | grep -E "openvswitch2.15|ovn2.13"
openvswitch2.15-2.15.0-23.el8fdp.x86_64
ovn2.13-20.12.0-135.el8fdp.x86_64
ovn2.13-central-20.12.0-135.el8fdp.x86_64
ovn2.13-host-20.12.0-135.el8fdp.x86_64
also verified on ovn-2021-21.03.0-40.el8fdp.x86_64:
[root@dell-per730-03 bz1947056]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
openvswitch2.15-2.15.0-23.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
ovn-2021-host-21.03.0-40.el8fdp.x86_64
--- 10.0.126.80 ping statistics ---
579 packets transmitted, 579 received, 0% packet loss, time 60053ms
rtt min/avg/max/mdev = 0.017/0.053/0.645/0.028 ms
For RHEL 7, the attached db files should be converted with "ovsdb-tool compact $db_file".
With the db files converted, reproduced on ovn2.13-20.12.0-104.el7:
--- 10.0.126.80 ping statistics ---
1007 packets transmitted, 1007 received, 0% packet loss, time 100600ms
rtt min/avg/max/mdev = 0.033/0.073/0.619/0.031 ms
Verified on ovn2.13-20.12.0-135.el7:
[root@dell-per740-12 bz1947056]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-96.el7fdp.x86_64
ovn2.13-host-20.12.0-135.el7fdp.x86_64
ovn2.13-central-20.12.0-135.el7fdp.x86_64
ovn2.13-20.12.0-135.el7fdp.x86_64
--- 10.0.126.80 ping statistics ---
1007 packets transmitted, 1007 received, 0% packet loss, time 100600ms
rtt min/avg/max/mdev = 0.033/0.073/0.619/0.031 ms
Reproduced without the nb files with the following script:
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1
systemctl restart ovn-controller
ovn-nbctl --wait=hv set NB_Global . options:use_logical_dp_groups=true
# Start ovn-nbctl daemon mode:
export OVN_NB_DAEMON=$(ovn-nbctl --detach)
# Enable vconn debug logs (ovn-controller to ovs-vswitchd openflow connection)
ovn-appctl -t ovn-controller vlog/disable-rate-limit vconn
ovn-appctl -t ovn-controller vlog/set vconn:dbg
for i in {1..99}
do
ovn-nbctl ls-add lstest$i
ovn-nbctl ls-add lstest${i}_p
ovn-nbctl lr-add lrtest$i
ovn-nbctl lrp-add lrtest$i lrt${i}-ls$i 00:00:00:00:00:$i 10.0.$i.1/24
ovn-nbctl lrp-add lrtest$i lrt${i}-ls$i-p fa:00:00:00:00:$i 1.1.$i.1/24
ovn-nbctl lsp-add lstest$i ls${i}-lrt$i
ovn-nbctl lsp-set-type ls${i}-lrt$i router
ovn-nbctl lsp-set-addresses ls${i}-lrt$i router
ovn-nbctl lsp-set-options ls${i}-lrt$i router-port=lrt${i}-ls$i
ovn-nbctl lsp-add lstest${i}_p ls${i}-p-lrt$i
ovn-nbctl lsp-set-type ls${i}-p-lrt$i router
ovn-nbctl lsp-set-addresses ls${i}-p-lrt$i router
ovn-nbctl lsp-set-options ls${i}-p-lrt$i router-port=lrt${i}-ls$i-p
ovn-nbctl lr-nat-add lrtest$i snat 1.1.$i.11 10.0.$i.11
pg_scale=""
for j in {1..10}
do
ovn-nbctl lsp-add lstest$i lstest${i}p$j
ovn-nbctl lsp-set-addresses lstest${i}p$j "00:$j:00:00:00:$i"
pg_scale="$pg_scale lstest${i}p$j"
ovs-vsctl add-port br-int lstest${i}p$j -- set interface lstest${i}p$j type=internal external_ids:iface-id=lstest${i}p$j
done
for j in {1..10}
do
ovn-nbctl lsp-add lstest${i}_p lstest${i}_p-p$j
ovn-nbctl lsp-set-addresses lstest${i}_p-p$j "fa:$j:00:00:00:$i"
pg_scale="$pg_scale lstest${i}_p-p$j"
ovs-vsctl add-port br-int lstest${i}_p-p$j -- set interface lstest${i}_p-p$j type=internal external_ids:iface-id=lstest${i}_p-p$j
done
ovn-nbctl pg-add pg1t$i
ovn-nbctl pg-set-ports pg1t$i $pg_scale
ovn-nbctl --type=port-group acl-add pg1t$i from-lport 1001 "inport == @pg1t$i" allow-related
done
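# (Optional, not part of the original reproducer.) At this scale the SB DB holds
# many datapaths; with dp groups enabled, Logical_DP_Group rows should be shared
# by flows of multiple datapaths. Rough record counts:
ovn-sbctl --columns=_uuid list datapath_binding | grep -c _uuid
ovn-sbctl --columns=_uuid list logical_dp_group | grep -c _uuid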
ovn-nbctl ls-add ls1
for i in {1..99}
do
ovn-nbctl lsp-add ls1 ls1p$i
ovn-nbctl lsp-set-addresses ls1p$i "fa:16:3e:3a:26:$i"
ovs-vsctl add-port br-int ls1p$i -- set interface ls1p$i type=internal external_ids:iface-id=ls1p$i
ovn-nbctl acl-add ls1 from-lport 1000 "inport==\"ls1p$i\" && ip" allow-related
done
ovn-nbctl ls-add ls2
for i in {1..99}
do
ovn-nbctl lsp-add ls2 ls2p$i
ovn-nbctl lsp-set-addresses ls2p$i "fa:16:3e:3a:27:$i"
ovs-vsctl add-port br-int ls2p$i -- set interface ls2p$i type=internal external_ids:iface-id=ls2p$i
ovn-nbctl acl-add ls2 from-lport 1000 "inport==\"ls2p$i\" && ip" allow-related
done
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:01:01 10.0.126.1/24
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
ovn-nbctl lsp-set-addresses ls1-lr1 router
ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:01:02 1.1.1.1/24
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-set-addresses ls2-lr1 router
ovn-nbctl set logical_router lr1 options:chassis=hv1
for i in {2..200}
do
ovn-nbctl lr-nat-add lr1 snat 1.1.1.$i 10.0.126.$i
done
if1=tap30687bca-dd
if2=tapf5637489-e3
ovn-nbctl lsp-add ls1 $if1
ovn-nbctl lsp-set-addresses $if1 "fa:16:3e:3a:25:31 10.0.126.50"
ovn-nbctl lsp-add ls1 $if2
ovn-nbctl lsp-set-addresses $if2 "fa:16:3e:75:69:6e 10.0.126.80"
ovs-vsctl add-port br-int $if1 -- set interface $if1 type=internal external_ids:iface-id=$if1
ovs-vsctl add-port br-int $if2 -- set interface $if2 type=internal external_ids:iface-id=$if2
# Bind two of the OVS ports to OVN:
ip netns add vm1
ip link set $if1 netns vm1
ip netns exec vm1 ip link set $if1 address fa:16:3e:3a:25:31
ip netns exec vm1 ip addr add 10.0.126.50/24 dev $if1
ip netns exec vm1 ip link set $if1 up
ip netns add vm2
ip link set $if2 netns vm2
ip netns exec vm2 ip link set $if2 address fa:16:3e:75:69:6e
ip netns exec vm2 ip addr add 10.0.126.80/24 dev $if2
ip netns exec vm2 ip link set $if2 up
while :
do
if ip netns exec vm1 ping 10.0.126.80 -c 1 -w 1 -W 1
then
break
else
sleep 1
fi
done
# Start continuous ping from one port to the other, e.g.: vm1 -> vm2
ip netns exec vm1 ping 10.0.126.80 -i 0.1 &> ping.log &
ping_pid=$!
# Add an unrelated logical switch with an internal OVS port attached to it:
ovs-vsctl add-port br-int vm-test -- set interface vm-test type=internal -- set interface vm-test external_ids:iface-id=vm-test
# In a loop, simulate CMS changes to the topology by removing and adding the
# unrelated logical switch:
for i in {1..10}
do
ovn-nbctl --wait=hv ls-add ls -- lsp-add ls vm-test
sleep 5
ovn-nbctl --wait=hv ls-del ls
sleep 5
done
kill -2 $ping_pid
tail ping.log
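For reference, a minimal sketch of what can be watched while the ls-add/ls-del loop runs (assuming use_logical_dp_groups=true as set above): the datapath group membership in the SB database changes, but flows for the datapaths already carrying traffic should not be removed and re-installed.
# Dump each datapath group and the datapaths it covers; only the group
# membership should change when the unrelated logical switch is added/removed.
ovn-sbctl --columns=_uuid,datapaths list logical_dp_group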
reproduced on ovn2.13-20.12.0-104.el7:
[root@dell-per740-12 bz1947056]# tail ping_104.log
64 bytes from 10.0.126.80: icmp_seq=1397 ttl=64 time=0.033 ms
64 bytes from 10.0.126.80: icmp_seq=1398 ttl=64 time=0.035 ms
64 bytes from 10.0.126.80: icmp_seq=1399 ttl=64 time=0.035 ms
64 bytes from 10.0.126.80: icmp_seq=1400 ttl=64 time=0.035 ms
64 bytes from 10.0.126.80: icmp_seq=1401 ttl=64 time=0.045 ms
64 bytes from 10.0.126.80: icmp_seq=1402 ttl=64 time=0.036 ms
--- 10.0.126.80 ping statistics ---
1402 packets transmitted, 1168 received, 16% packet loss, time 141682ms
rtt min/avg/max/mdev = 0.031/0.062/0.707/0.025 ms
Verified on ovn2.13-20.12.0-135.el7:
[root@dell-per740-12 bz1947056]# tail ping.log
64 bytes from 10.0.126.80: icmp_seq=1155 ttl=64 time=0.064 ms
64 bytes from 10.0.126.80: icmp_seq=1156 ttl=64 time=0.063 ms
64 bytes from 10.0.126.80: icmp_seq=1157 ttl=64 time=0.064 ms
64 bytes from 10.0.126.80: icmp_seq=1158 ttl=64 time=0.064 ms
64 bytes from 10.0.126.80: icmp_seq=1159 ttl=64 time=0.065 ms
64 bytes from 10.0.126.80: icmp_seq=1160 ttl=64 time=0.063 ms
--- 10.0.126.80 ping statistics ---
1160 packets transmitted, 1160 received, 0% packet loss, time 115901ms
rtt min/avg/max/mdev = 0.031/0.063/0.622/0.024 ms
also verified on ovn-2021-21.03.0-40.el8fdp.x86_64:
+ tail ping.log
64 bytes from 10.0.126.80: icmp_seq=1065 ttl=64 time=0.036 ms
64 bytes from 10.0.126.80: icmp_seq=1066 ttl=64 time=0.018 ms
64 bytes from 10.0.126.80: icmp_seq=1067 ttl=64 time=0.034 ms
64 bytes from 10.0.126.80: icmp_seq=1068 ttl=64 time=0.039 ms
64 bytes from 10.0.126.80: icmp_seq=1069 ttl=64 time=0.040 ms
64 bytes from 10.0.126.80: icmp_seq=1070 ttl=64 time=0.017 ms
--- 10.0.126.80 ping statistics ---
1070 packets transmitted, 1070 received, 0% packet loss, time 111133ms
rtt min/avg/max/mdev = 0.017/0.036/0.648/0.027 ms
[root@dell-per730-03 bz1947056]# rpm -qa | grep -E "openvswitch2.15|ovn-2021"
ovn-2021-host-21.03.0-40.el8fdp.x86_64
openvswitch2.15-2.15.0-23.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64
python3-openvswitch2.15-2.15.0-23.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2507