Description of problem:
gateway_mtu does not take effect after it is changed from a smaller value to a larger value.

Version-Release number of selected component (if applicable):
[root@dell-per730-57 ovn]# rpm -qa | grep ovn
ovn2.11-central-2.11.0-16.el8fdp.x86_64
ovn2.11-2.11.0-16.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
ovn2.11-host-2.11.0-16.el8fdp.x86_64
[root@dell-per730-57 ovn]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
openvswitch-selinux-extra-policy-1.0-11.el8fdp.noarch
openvswitch2.11-2.11.0-9.el8fdp.x86_64
[root@dell-per730-57 ovn]#

How reproducible:
Every time

Steps to Reproduce:
1. Set gateway_mtu to 1000; packets are fragmented to 1000.
2. Change gateway_mtu to 1500; packets are still fragmented to 1000.

[root@dell-per730-57 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=1000
[root@dell-per730-57 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 9000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 9000(9028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 982)
9008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=1.45 ms
9008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.991 ms
9008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.975 ms

--- 172.16.103.11 ping statistics ---
4 packets transmitted, 3 received, +1 errors, 25% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.975/1.141/1.458/0.225 ms

Packets captured on the peer:
[root@localhost ~]# tcpdump -ei eth1 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:14:18.053864 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ICMP echo request, id 11611, seq 2, length 960
04:14:18.053888 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:14:18.053923 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1

Change gateway_mtu to 1500:
[root@dell-per730-57 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=1500
[root@dell-per730-57 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 9000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 9000(9028) bytes of data.
9008 bytes from 172.16.103.11: icmp_seq=1 ttl=63 time=2.01 ms
9008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=0.824 ms
9008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.917 ms
9008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.901 ms
9008 bytes from 172.16.103.11: icmp_seq=5 ttl=63 time=0.989 ms
9008 bytes from 172.16.103.11: icmp_seq=6 ttl=63 time=0.801 ms

--- 172.16.103.11 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 0.801/1.074/2.014/0.425 ms
[root@localhost ~]#

Packets captured on the peer:
[root@localhost ~]# tcpdump -ei eth1 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:17:24.450836 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ICMP echo request, id 11614, seq 1, length 960
04:17:24.450851 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:17:24.450855 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:17:24.450858 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1

Expected results:
The fragment size changes according to the gateway_mtu setting.

Additional info:
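The 994-byte frames in both captures follow from IPv4 fragmentation arithmetic applied to the MTU reported in the first "Frag needed" message (982). A minimal bash sketch, assuming a 20-byte IPv4 header without options, an 8-byte ICMP echo header, and a 14-byte untagged Ethernet header; the function name frag_plan is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: IPv4 fragmentation arithmetic for `ping -s SIZE` over a given path MTU.
# Assumptions: 20-byte IPv4 header (no options), 8-byte ICMP echo header,
# 14-byte untagged Ethernet header, as in the tcpdump output above.

IP_HDR=20
ICMP_HDR=8
ETH_HDR=14

# frag_plan PMTU PING_SIZE
# prints: <fragment count> <payload per full fragment> <payload of last fragment> <full-fragment frame length>
frag_plan() {
    local pmtu=$1 size=$2
    local payload=$(( size + ICMP_HDR ))           # IP payload: ICMP header + ping data
    local per=$(( (pmtu - IP_HDR) / 8 * 8 ))        # non-final fragments carry a multiple of 8 bytes
    local count=$(( (payload + per - 1) / per ))    # ceiling division
    local last=$(( payload - (count - 1) * per ))   # whatever is left goes in the last fragment
    local frame=$(( per + IP_HDR + ETH_HDR ))       # on-wire length of a full fragment
    echo "$count $per $last $frame"
}

frag_plan 982 9000
```

For a PMTU of 982 this prints "10 960 368 994": each full fragment carries 960 bytes of IP payload (the "length 960" echo request in the capture) inside a 994-byte frame. With a PMTU of 1482 the same call would print "7 1456 272 1490", so the unchanged 994-byte frames after gateway_mtu=1500 suggest the guest was still fragmenting for 982.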
Hi,

The default behavior on hv1_vm00 is to learn the path MTU from the "ICMP unreachable - need to frag" packets sent by the gateway. The expiry timeout of the learned value is controlled via /proc/sys/net/ipv4/route/mtu_expires. On my system:

# cat /proc/sys/net/ipv4/route/mtu_expires
600

To disable this behavior, run in the VM:

# echo 0 > /proc/sys/net/ipv4/route/mtu_expires

After MTU learning is disabled, ping always tries to send packets as large as the locally configured MTU. In my case, with MTU 1500 in the VM:

On northd:
# ovn-nbctl set logical_router_port rtr-ls1 options:gateway_mtu=1000

In the VM:
# ping 10.0.0.1 -s 9000
PING 10.0.0.1 (10.0.0.1) 9000(9028) bytes of data.
From 20.0.0.254 icmp_seq=1 Frag needed and DF set (mtu = 982)
From 20.0.0.254 icmp_seq=2 Frag needed and DF set (mtu = 982)

On northd, increase gateway_mtu to 1500:
# ovn-nbctl set logical_router_port rtr-ls1 options:gateway_mtu=1500

In the VM:
# ip netns exec vm2 ping 10.0.0.1 -s 9000
PING 10.0.0.1 (10.0.0.1) 9000(9028) bytes of data.
From 20.0.0.254 icmp_seq=1 Frag needed and DF set (mtu = 1482)
From 20.0.0.254 icmp_seq=2 Frag needed and DF set (mtu = 1482)
From 20.0.0.254 icmp_seq=3 Frag needed and DF set (mtu = 1482)

# ip netns exec vm2 ping 10.0.0.1 -s 1454
PING 10.0.0.1 (10.0.0.1) 1454(1482) bytes of data.
1462 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=0.762 ms
1462 bytes from 10.0.0.1: icmp_seq=2 ttl=63 time=0.639 ms

So the functionality appears to work as expected.
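The learn-and-expire behavior described above can be modeled in a few lines of bash. This is a hypothetical, simplified model (on_frag_needed and effective_mtu are made-up names), not the kernel's implementation; it assumes a guest interface MTU of 1500 and the 600-second mtu_expires value shown above. The two captures in the original report (04:14:18 and 04:17:24) are only about three minutes apart, well inside that window:

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of a guest caching the MTU learned from
# ICMP "Frag needed" messages. It mirrors the behavior described in the
# comment above; it is NOT the Linux kernel's code.

IFACE_MTU=1500     # MTU configured on the guest interface (assumed)
MTU_EXPIRES=600    # value of /proc/sys/net/ipv4/route/mtu_expires, in seconds

learned_mtu=""     # MTU learned from the last "Frag needed" message, if any
learned_at=0       # time (seconds) at which it was learned

# Record "Frag needed and DF set (mtu = M)" received at time T: on_frag_needed M T
on_frag_needed() {
    learned_mtu=$1
    learned_at=$2
}

# Print the MTU the guest would use toward this destination at time NOW.
effective_mtu() {
    local now=$1
    if [[ -n $learned_mtu ]] && (( now - learned_at < MTU_EXPIRES )); then
        echo "$learned_mtu"
    else
        echo "$IFACE_MTU"
    fi
}

# Timeline from the original report: 982 learned at t=0, second ping ~186 s later.
on_frag_needed 982 0
effective_mtu 186    # still the cached 982, so fragments stay in 994-byte frames
effective_mtu 600    # entry expired: back to the interface MTU
```

This prints 982 and then 1500. Setting MTU_EXPIRES=0 in the model makes every learned value stale immediately, so effective_mtu always returns the interface MTU, which matches the disabled-learning behavior described in the comment.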
Hi,

I have another question about fragmentation. If I set gateway_mtu to less than 568, ping to the remote host fails and only "Frag needed and DF set (mtu = 542)" messages are displayed, like this:

[root@dell-per730-19 ovn]# ovn-nbctl get logical_router_port r1_s3 options:gateway_mtu
"560"
[root@dell-per730-19 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 1000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 1000(1028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=2 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=3 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=4 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=5 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=6 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=7 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=8 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=9 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=10 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=11 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=12 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=13 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=14 Frag needed and DF set (mtu = 542)

--- 172.16.103.11 ping statistics ---
14 packets transmitted, 0 received, +14 errors, 100% packet loss, time 13017ms

But ping succeeds if I change gateway_mtu to 562:

[root@dell-per730-19 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=562
[root@dell-per730-19 ovn]# ovn-nbctl get logical_router_port r1_s3 options:gateway_mtu
"562"
[root@localhost ~]# ping -s 1000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 1000(1028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 544)
1008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=1.01 ms
1008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.785 ms
1008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.632 ms
1008 bytes from 172.16.103.11: icmp_seq=5 ttl=63 time=0.590 ms

--- 172.16.103.11 ping statistics ---
5 packets transmitted, 4 received, +1 errors, 20% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.590/0.755/1.013/0.165 ms
[root@localhost ~]#

Is this expected? Thanks.
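In every measurement in this bug, the MTU reported in the "Frag needed" messages is exactly 18 bytes below the configured gateway_mtu. A small bash sketch that checks this from the logged pairs and derives the largest `ping -s` value whose packet fits the reported MTU (reported MTU minus a 20-byte IPv4 header and an 8-byte ICMP header). describe_pair is a hypothetical helper, and the sketch only restates the logged numbers; it does not explain why the 18-byte offset exists:

```shell
#!/usr/bin/env bash
# gateway_mtu:reported_mtu pairs copied from the ping output in this bug.
observed=(1000:982 1500:1482 560:542 562:544)

IP_HDR=20     # IPv4 header without options
ICMP_HDR=8    # ICMP echo header

# Print the offset between gateway_mtu and the reported MTU, plus the largest
# `ping -s` payload whose IP packet still fits the reported MTU unfragmented.
describe_pair() {
    local gw=${1%%:*} reported=${1##*:}
    echo "gateway_mtu=$gw reported=$reported offset=$(( gw - reported ))" \
         "max_ping_s=$(( reported - IP_HDR - ICMP_HDR ))"
}

for pair in "${observed[@]}"; do
    describe_pair "$pair"
done
```

For the 1500 pair this gives max_ping_s=1454, the size used in the successful `-s 1454` ping earlier in the thread. For 560 and 562 the limits are 514 and 516, so the 1028-byte packets sent with `-s 1000` exceed the reported MTU in both cases.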
Correction to the value in comment 2: the VM can ping successfully when gateway_mtu is 562 or larger.
Hi,

Did hv1_vm00 get rebooted between the gateway_mtu changes on r1_s3? Or did the /proc/sys/net/ipv4/route/mtu_expires setting on hv1_vm00 change? If not, can you please share a tcpdump capture from hv1_vm00 taken while gateway_mtu is 562 and ping succeeds?

tcpdump -n -i <interface> -v

Thanks