Bug 1717793 - [OVN]gateway_mtu is not effective after change from a small value to a larger value
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.11
Version: FDP 19.C
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dumitru Ceara
QA Contact: haidong li
Reported: 2019-06-06 08:19 UTC by haidong li
Modified: 2019-06-25 13:22 UTC (History)

Last Closed: 2019-06-24 19:48:11 UTC

Description haidong li 2019-06-06 08:19:40 UTC
Description of problem:
gateway_mtu does not take effect after it is changed from a small value to a larger value.

Version-Release number of selected component (if applicable):
[root@dell-per730-57 ovn]# rpm -qa | grep ovn
ovn2.11-central-2.11.0-16.el8fdp.x86_64
ovn2.11-2.11.0-16.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
ovn2.11-host-2.11.0-16.el8fdp.x86_64
[root@dell-per730-57 ovn]# rpm -qa | grep openvswitch
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
openvswitch-selinux-extra-policy-1.0-11.el8fdp.noarch
openvswitch2.11-2.11.0-9.el8fdp.x86_64
[root@dell-per730-57 ovn]# 


How reproducible:
Every time

Steps to Reproduce:
1. Set gateway_mtu to 1000. Packets are fragmented to fit an MTU of 1000.
2. Change gateway_mtu to 1500. Packets are still fragmented as if gateway_mtu were 1000.

[root@dell-per730-57 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=1000
[root@dell-per730-57 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 9000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 9000(9028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 982)
9008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=1.45 ms
9008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.991 ms
9008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.975 ms

--- 172.16.103.11 ping statistics ---
4 packets transmitted, 3 received, +1 errors, 25% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.975/1.141/1.458/0.225 ms

Packets captured on the peer:

[root@localhost ~]# tcpdump -ei eth1 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:14:18.053864 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ICMP echo request, id 11611, seq 2, length 960
04:14:18.053888 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:14:18.053923 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1

Change gateway_mtu to 1500:
[root@dell-per730-57 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=1500
[root@dell-per730-57 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 9000 172.16.103.11
PING 172.16.103.11 (172.16.103.11) 9000(9028) bytes of data.
9008 bytes from 172.16.103.11: icmp_seq=1 ttl=63 time=2.01 ms
9008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=0.824 ms
9008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.917 ms
9008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.901 ms
9008 bytes from 172.16.103.11: icmp_seq=5 ttl=63 time=0.989 ms
9008 bytes from 172.16.103.11: icmp_seq=6 ttl=63 time=0.801 ms

--- 172.16.103.11 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 0.801/1.074/2.014/0.425 ms
[root@localhost ~]# 

Packets captured on the peer:
[root@localhost ~]# tcpdump -ei eth1 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:17:24.450836 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ICMP echo request, id 11614, seq 1, length 960
04:17:24.450851 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:17:24.450855 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
04:17:24.450858 00:de:ad:ff:01:03 > 00:de:ad:00:00:01, ethertype IPv4 (0x0800), length 994: 172.16.102.11 > 172.16.103.11: ip-proto-1
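
Both captures show 994-byte frames carrying 960 bytes of ICMP data, which is what a sender fragmenting to the advertised MTU of 982 would produce. The sketch below reproduces that arithmetic. It assumes untagged Ethernet (14-byte header) and an IPv4 header without options (20 bytes); the function names are illustrative only.

```shell
#!/bin/sh
ETH_HEADER=14    # assumption: untagged Ethernet header
IPV4_HEADER=20   # assumption: IPv4 header without options

# Payload of a non-final IPv4 fragment: the space left after the IP header,
# rounded down to a multiple of 8 because fragment offsets count 8-byte units.
fragment_payload() {
    echo $(( ($1 - IPV4_HEADER) / 8 * 8 ))
}

# On-wire frame length of such a fragment, as tcpdump -e reports it.
fragment_frame_len() {
    echo $(( $(fragment_payload "$1") + IPV4_HEADER + ETH_HEADER ))
}

echo "payload per fragment: $(fragment_payload 982)"
echo "frame length:         $(fragment_frame_len 982)"
```

For an MTU of 982 this gives a payload of 960 and a frame length of 994, matching the capture lines above, so the sender was still using the smaller MTU after gateway_mtu was raised.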



Expected results:
The fragment size should change to match the new setting.

Additional info:

Comment 1 Dumitru Ceara 2019-06-24 19:48:11 UTC
Hi,

The default behavior on hv1_vm00 is to learn the path MTU from ICMP "destination unreachable, fragmentation needed" packets sent by the gateway. The learned value is cached, and its expiry timeout (in seconds) is controlled via /proc/sys/net/ipv4/route/mtu_expires.

On my system:
# cat /proc/sys/net/ipv4/route/mtu_expires
600
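
As a rough check of this explanation against the captures in the description, the sketch below compares the time between the two captures with the timeout. It assumes hv1_vm00 uses the same 600-second value as shown above, which the report does not confirm; `to_seconds` is a helper written for this example.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split into hours, minutes, seconds
    IFS=$old_ifs
    # Strip one leading zero so values such as "08" are not parsed as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

MTU_EXPIRES=600   # assumption: hv1_vm00 uses the value shown above

first=$(to_seconds 04:14:18)    # capture taken with gateway_mtu=1000
second=$(to_seconds 04:17:24)   # capture taken with gateway_mtu=1500
elapsed=$(( second - first ))

echo "elapsed between captures: ${elapsed}s"
if [ "$elapsed" -lt "$MTU_EXPIRES" ]; then
    echo "learned MTU would still be cached"
else
    echo "learned MTU would have expired"
fi
```

The two captures are 186 seconds apart, well inside a 600-second timeout, so under that assumption the VM would still have been using the MTU of 982 it learned during the first test.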

To disable this behavior, in the VM:
# echo 0 > /proc/sys/net/ipv4/route/mtu_expires

After disabling MTU size learning, ping will always try to send packets as large as the locally configured MTU allows. In my case, with an MTU of 1500 in the VM:

On northd:
# ovn-nbctl set logical_router_port rtr-ls1 options:gateway_mtu=1000

In VM:
# ping 10.0.0.1 -s 9000
PING 10.0.0.1 (10.0.0.1) 9000(9028) bytes of data.
From 20.0.0.254 icmp_seq=1 Frag needed and DF set (mtu = 982)
From 20.0.0.254 icmp_seq=2 Frag needed and DF set (mtu = 982)

On northd, increase gateway_mtu to 1500:
# ovn-nbctl set logical_router_port rtr-ls1 options:gateway_mtu=1500

In VM:
# ip netns exec vm2 ping 10.0.0.1 -s 9000
PING 10.0.0.1 (10.0.0.1) 9000(9028) bytes of data.
From 20.0.0.254 icmp_seq=1 Frag needed and DF set (mtu = 1482)
From 20.0.0.254 icmp_seq=2 Frag needed and DF set (mtu = 1482)
From 20.0.0.254 icmp_seq=3 Frag needed and DF set (mtu = 1482)

# ip netns exec vm2 ping 10.0.0.1 -s 1454
PING 10.0.0.1 (10.0.0.1) 1454(1482) bytes of data.
1462 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=0.762 ms
1462 bytes from 10.0.0.1: icmp_seq=2 ttl=63 time=0.639 ms

So it seems that the functionality works as expected.
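
The numbers in these tests can be cross-checked with a little arithmetic. In every log in this report the MTU in the ICMP error is 18 bytes below the configured gateway_mtu (1000 gives 982, 1500 gives 1482). The sketch below treats that offset as an empirical constant taken from the logs, not as a documented OVN value, and assumes a 20-byte IPv4 header plus an 8-byte ICMP header; the function names are illustrative.

```shell
#!/bin/sh
OBSERVED_OFFSET=18      # gateway_mtu minus advertised MTU, as observed in this report's logs
IPV4_ICMP_OVERHEAD=28   # assumption: 20-byte IPv4 header (no options) + 8-byte ICMP header

# MTU expected in the "Frag needed" error for a given gateway_mtu.
advertised_mtu() {
    echo $(( $1 - OBSERVED_OFFSET ))
}

# Largest "ping -s" payload that fits in one packet at that MTU.
max_ping_payload() {
    echo $(( $(advertised_mtu "$1") - IPV4_ICMP_OVERHEAD ))
}

for gw in 1000 1500; do
    echo "gateway_mtu=$gw advertised_mtu=$(advertised_mtu "$gw") max_ping_payload=$(max_ping_payload "$gw")"
done
```

For gateway_mtu=1500 this yields an advertised MTU of 1482 and a maximum payload of 1454, which is the size used in the successful ping above.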

Comment 2 haidong li 2019-06-25 05:53:26 UTC
Hi,
I have another question about fragmentation. If I set gateway_mtu to less than 568, ping to the remote host does not succeed; only (mtu = 542) is displayed, like this:

[root@dell-per730-19 ovn]# ovn-nbctl get logical_router_port r1_s3 options:gateway_mtu
"560"
[root@dell-per730-19 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]

[root@localhost ~]# ping -s 1000  172.16.103.11
PING 172.16.103.11 (172.16.103.11) 1000(1028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=2 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=3 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=4 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=5 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=6 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=7 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=8 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=9 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=10 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=11 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=12 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=13 Frag needed and DF set (mtu = 542)
From 172.16.102.1 icmp_seq=14 Frag needed and DF set (mtu = 542)

--- 172.16.103.11 ping statistics ---
14 packets transmitted, 0 received, +14 errors, 100% packet loss, time 13017ms

But the ping succeeds if I change gateway_mtu to 562:

[root@dell-per730-19 ovn]# ovn-nbctl set logical_router_port r1_s3 options:gateway_mtu=562
[root@dell-per730-19 ovn]# ovn-nbctl get logical_router_port r1_s3 options:gateway_mtu
"562"

[root@localhost ~]# ping -s 1000  172.16.103.11
PING 172.16.103.11 (172.16.103.11) 1000(1028) bytes of data.
From 172.16.102.1 icmp_seq=1 Frag needed and DF set (mtu = 544)
1008 bytes from 172.16.103.11: icmp_seq=2 ttl=63 time=1.01 ms
1008 bytes from 172.16.103.11: icmp_seq=3 ttl=63 time=0.785 ms
1008 bytes from 172.16.103.11: icmp_seq=4 ttl=63 time=0.632 ms
1008 bytes from 172.16.103.11: icmp_seq=5 ttl=63 time=0.590 ms

--- 172.16.103.11 ping statistics ---
5 packets transmitted, 4 received, +1 errors, 20% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.590/0.755/1.013/0.165 ms
[root@localhost ~]# 


Is this expected? Thanks.

Comment 3 haidong li 2019-06-25 05:55:19 UTC
The VM can ping successfully with gateway_mtu larger than 562; this corrects the value in comment 2.

Comment 4 Dumitru Ceara 2019-06-25 07:48:09 UTC
Hi,

Did hv1_vm00 get rebooted in between the gateway_mtu change on r1_s3? Or did the configuration for /proc/sys/net/ipv4/route/mtu_expires on hv1_vm00 change?

If not, can you please share a tcpdump from hv1_vm00 taken when gateway_mtu is 562 and the ping is successful?
tcpdump -n -i <interface> -v

Thanks

