Bug 1412234 - extend virtio-net to expose host MTU to guest
Summary: extend virtio-net to expose host MTU to guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Aaron Conole
QA Contact: xiywang
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On: 1366919 1452756
Blocks: 1411862 1408701 1429163 1450162 1451342
Reported: 2017-01-11 15:20 UTC by Maxime Coquelin
Modified: 2017-08-02 05:03 UTC
CC List: 29 users

Fixed In Version: kernel-3.10.0-613.el7
Doc Type: Enhancement
Doc Text:
Clone Of: 1366919
Environment:
Last Closed: 2017-08-02 05:03:21 UTC
Target Upstream Version:


Links
Red Hat Bugzilla 1460760 (priority: medium; status: CLOSED): "Virtio-net interface MTU overwritten to 1500 bytes", last updated 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2017:1842 (priority: normal; status: SHIPPED_LIVE): "Important: kernel security, bug fix, and enhancement update", last updated 2017-08-01 18:22:09 UTC

Internal Links: 1460760

Description Maxime Coquelin 2017-01-11 15:20:56 UTC
+++ This bug was initially created as a clone of Bug #1366919 +++

Description of problem:

It is useful to expose the host MTU to the guest. In particular, the guest can then disable guest offloads and be sure that packets will fit in MTU-sized receive buffers, and/or disable host offloads and still be sure that packets can be transmitted.

This will need to be cloned for Linux and DPDK down the road.

--- Additional comment from Maxime Coquelin on 2016-12-05 08:52:19 EST ---

QEMU RFC v3 sent to the mailing list:
https://mail-archive.com/qemu-devel@nongnu.org/msg416009.html
Message-Id: <20161130101017.13382-1-maxime.coquelin@redhat.com>

In this version, the MTU value is specified by the user (or management tool) on the QEMU command line via the host_mtu virtio-net parameter.
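As an illustration of that command-line interface, here is a minimal Python sketch that assembles the `-device` value. `virtio_net_device_arg` is a hypothetical helper, and the 68 to 65535 bounds are an assumption (the virtio `mtu` config field is 16 bits wide; 68 is the usual Linux minimum Ethernet MTU); QEMU itself may validate the value differently.

```python
# Hypothetical helper: builds the virtio-net -device value carrying host_mtu.
# Property names are copied from the QEMU command lines in this bug; the
# range check below is an assumption, not QEMU's own validation logic.
MIN_HOST_MTU = 68      # assumed lower bound (Linux ETH_MIN_MTU)
MAX_HOST_MTU = 65535   # the virtio-net config mtu field is 16 bits wide


def virtio_net_device_arg(netdev: str, dev_id: str, host_mtu: int) -> str:
    """Return a value for QEMU's -device option with host_mtu set."""
    if not MIN_HOST_MTU <= host_mtu <= MAX_HOST_MTU:
        raise ValueError(
            f"host_mtu {host_mtu} outside {MIN_HOST_MTU}..{MAX_HOST_MTU}")
    return f"virtio-net-pci,netdev={netdev},id={dev_id},host_mtu={host_mtu}"


# Argument list for a tap backend plus the device (other options omitted).
qemu_args = [
    "-netdev", "tap,id=hostnet0,vhost=on",
    "-device", virtio_net_device_arg("hostnet0", "net0", 9000),
]
print(" ".join(qemu_args))
```

Validating in the helper means a bad value fails before QEMU is ever started, which is easier to diagnose than a guest that silently ignores the advice.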

Comment 1 Maxime Coquelin 2017-01-11 15:31:55 UTC
This ticket covers backporting the kernel patches that add MTU feature support, which comprise at least:
93a205e virtio-net: Update the mtu code to match virtio spec
d0c2c99 net: use core MTU range checking in virt drivers
14de9d1 virtio-net: Add initial MTU advice feature
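To make the guest-side effect of these patches concrete, the following Python model sketches the probe-time logic as we understand upstream commit 93a205e: when VIRTIO_NET_F_MTU is negotiated and the advertised value is at least the minimum MTU, it becomes both the initial MTU and the upper bound for later changes; otherwise the feature is dropped. This is a simplified, hypothetical model (`NetDev`, `probe` and `change_mtu` are illustrative names), not the kernel code, and the RHEL 7 backport may be structured differently.

```python
from dataclasses import dataclass

ETH_MIN_MTU = 68
ETH_MAX_MTU = 65535  # assumed default upper bound without MTU advice


@dataclass
class NetDev:
    """Hypothetical stand-in for the guest net_device MTU fields."""
    mtu: int = 1500            # Ethernet default before any advice
    min_mtu: int = ETH_MIN_MTU
    max_mtu: int = ETH_MAX_MTU
    mtu_feature: bool = False  # VIRTIO_NET_F_MTU negotiated?


def probe(has_mtu_feature: bool, advertised_mtu: int) -> NetDev:
    """Apply the host's MTU advice once, at initialization time."""
    dev = NetDev(mtu_feature=has_mtu_feature)
    if dev.mtu_feature:
        if advertised_mtu < dev.min_mtu:
            # Advice below the minimum is unusable: drop the feature.
            dev.mtu_feature = False
        else:
            # Advice becomes both the starting MTU and the ceiling.
            dev.mtu = advertised_mtu
            dev.max_mtu = advertised_mtu
    return dev


def change_mtu(dev: NetDev, new_mtu: int) -> None:
    """Core-style range check; out-of-range values fail like EINVAL."""
    if not dev.min_mtu <= new_mtu <= dev.max_mtu:
        raise ValueError(f"mtu {new_mtu} outside {dev.min_mtu}..{dev.max_mtu}")
    dev.mtu = new_mtu


dev = probe(has_mtu_feature=True, advertised_mtu=9000)
print(dev.mtu, dev.max_mtu)  # starts at the advised value
```

Under this model, a guest booted with `host_mtu=9000` would come up at MTU 9000 and refuse `ifconfig eth0 mtu 65535` with "Invalid argument"; because the advice is read only at probe time, changing the host-side MTU after boot has no effect on it.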

Comment 2 Maxime Coquelin 2017-01-19 08:52:58 UTC
Sorry, reverting to the "New" state; wrong BZ.

Comment 7 Rafael Aquini 2017-03-17 02:42:45 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 9 Rafael Aquini 2017-03-17 18:05:28 UTC
Patch(es) available on kernel-3.10.0-613.el7

Comment 11 xiywang 2017-04-05 06:22:19 UTC
Hi Aaron,

Could you provide some test steps? For example, what tool should I use to set the MTU, and what results do you expect?

My understanding of this bug is that these patches only provide an MTU change feature in the host, not in the guest. Is that right?

I looked at your patches: vi->max_mtu is initialized to 65535. If virtio_has_feature() reports VIRTIO_NET_F_MTU and virtnet_change_mtu() fails, vi->max_mtu is then set to the mtu field of struct virtio_net_config, read with virtio_cread16(). Is that right?

Will this feature eventually be added to the guest? The title of this bug says to expose the host MTU to the "guest"...

Thank you.

Best Regards,
Xiyue

Comment 12 Aaron Conole 2017-04-07 10:38:04 UTC
So, you'll need a qemu which supports the host_mtu property for network adapters.  I believe this is part of qemu 2.8, but I'm not 100% sure on that.

You'll need to launch QEMU manually, specifying the host_mtu property on the command line, and set the MTU in the guest.

Does this answer your question about test steps? Sorry they are a bit vague.

These patches provide MTU change in the guest. The reason is that the host should inform the guest of the maximum MTU supported (on the bridge, for example), so that there is no need for PMTU discovery across the network.

The patches are in the kernel. The guest kernel uses the virtio-net driver, which is where these patches live.

Comment 13 Maxime Coquelin 2017-04-07 10:39:41 UTC
(In reply to Aaron Conole from comment #12)
> So, you'll need a qemu which supports the host_mtu property for network
> adapters.  I believe this is part of qemu 2.8, but I'm not 100% sure on that.
No, this is in the upcoming QEMU 2.9.

Comment 14 yalzhang@redhat.com 2017-05-04 13:27:03 UTC
Hi Aaron, I have tried different model types in the VM, and the MTU limit is different for each. Could you please confirm whether this is expected? It is driver-specific, right?

For <model type='rtl8139'/>, the maximum value I can set in the VM is 4096:
# ifconfig ens3 mtu 9000
SIOCSIFMTU: Invalid argument

# ifconfig ens3 mtu 4097
SIOCSIFMTU: Invalid argument

# ifconfig ens3 mtu 4096

For e1000, the maximum MTU we can set is 16110;
For e1000e, it is 9212;
For virtio, it is 65535.
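The observations above can be captured in a small lookup. The limits are the empirical values reported in this comment (the virtio figure is without any host MTU advice), not values read from driver source, and the 68-byte minimum is an assumption.

```python
# Empirical maximum MTUs from comment 14, keyed by libvirt <model type=...>.
OBSERVED_MAX_MTU = {
    "rtl8139": 4096,
    "e1000": 16110,
    "e1000e": 9212,
    "virtio": 65535,  # no host MTU advice in effect
}
MIN_MTU = 68  # assumed lower bound


def mtu_accepted(model: str, mtu: int) -> bool:
    """Predict whether `ifconfig <dev> mtu <mtu>` succeeds for this model."""
    return MIN_MTU <= mtu <= OBSERVED_MAX_MTU[model]


for value in (4096, 4097, 9000):
    # False corresponds to "SIOCSIFMTU: Invalid argument" in the transcript.
    print("rtl8139", value, mtu_accepted("rtl8139", value))
```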

Comment 15 Aaron Conole 2017-05-04 13:42:37 UTC
This bug is only for the virtio model.  The mtu limitation is (indeed) per-device type.  Only the virtio guest device will read the max-mtu passed in from the host by qemu.

Comment 16 yalzhang@redhat.com 2017-05-05 06:22:04 UTC
Hi Aaron, so there is no need to test the e1000, e1000e and rtl8139 model types, right?
The difference between rtl8139 and virtio is that with the virtio model type, the host informs the guest of the maximum MTU supported (by the virtio-net driver in the host), whereas with rtl8139, PMTU discovery across the network is needed, right? Even though they work in different ways, the result is the same, right?

I have tested it as below; could you please confirm whether these test steps make sense? Thank you very much!

kernel-3.10.0-640.el7.x86_64
qemu-kvm-rhev-2.9.0-2.el7.x86_64
libvirt-3.2.0-4.el7.x86_64

1. Start a network with mtu=9000.

2. Start 2 guests connected to this network, each with mtu=7000 set on the interface and model type=virtio.

3. The bridge MTU then changes to the minimum value, 7000; the 2 tap devices have mtu=7000, and the network MTU is 7000:
# ifconfig | grep mtu
......
virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 7000
vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 7000
vnet1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 7000

4. On guest1 and guest2, set the MTU to 9000:
# ifconfig eth0 | grep mtu
eth0: flags=4098<BROADCAST,MULTICAST>  mtu 1500

# ifconfig eth0 mtu 9000

# ifconfig eth0 | grep mtu
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000

5. On guest1, ping guest2:
# ping -c 2 -s 8000 -M do 192.168.122.38
PING 192.168.122.38 (192.168.122.38) 8000(8028) bytes of data.

--- 192.168.122.38 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

# ping -c 2 -s 6972 -M do 192.168.122.38
PING 192.168.122.38 (192.168.122.38) 6972(7000) bytes of data.
6980 bytes from 192.168.122.38: icmp_seq=1 ttl=64 time=0.326 ms
6980 bytes from 192.168.122.38: icmp_seq=2 ttl=64 time=0.292 ms

--- 192.168.122.38 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.292/0.309/0.326/0.017 ms
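Step 3 is consistent with the Linux bridge following the smallest MTU among its attached ports, so two tap devices at 7000 pull virbr0 down from the network's configured 9000. A minimal Python model of that rule; `bridge_mtu` is a hypothetical name, and keeping the configured value when no ports are attached is a simplifying assumption.

```python
from typing import Sequence


def bridge_mtu(port_mtus: Sequence[int], configured_mtu: int) -> int:
    """Effective bridge MTU: the minimum over attached ports.

    With no ports attached, assume the configured value is kept.
    """
    return min(port_mtus) if port_mtus else configured_mtu


# Network created with mtu=9000, then vnet0 and vnet1 attached at 7000.
print(bridge_mtu([7000, 7000], configured_mtu=9000))
```

This is why raising the guests to 9000 in step 4 does not raise the usable path MTU: the bridge in between stays at 7000.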


For rtl8139, I tried the same steps but set the MTU to its 4096 limit in step 4, and it works well:

# ping -c 2 -s 4068 -M do 192.168.122.38
PING 192.168.122.38 (192.168.122.38) 4068(4096) bytes of data.
4076 bytes from 192.168.122.38: icmp_seq=1 ttl=64 time=0.614 ms
4076 bytes from 192.168.122.38: icmp_seq=2 ttl=64 time=0.476 ms

--- 192.168.122.38 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.476/0.545/0.614/0.069 ms
 
# ping -c 2 -s 4500 -M do 192.168.122.38
PING 192.168.122.38 (192.168.122.38) 4500(4528) bytes of data.
ping: local error: Message too long, mtu=4096
ping: local error: Message too long, mtu=4096

--- 192.168.122.38 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1000ms
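The sizes in these ping transcripts follow from the IPv4 and ICMP header lengths: `-s N` sets the ICMP payload, ping shows N + 28 as the packet size (20-byte IPv4 header without options plus 8-byte ICMP header), and each reply line counts N + 8 bytes. With `-M do` (don't fragment), the largest payload that fits a given MTU is therefore MTU - 28. A small Python sketch, assuming IPv4 without options, that reproduces the numbers above:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # echo request/reply header


def wire_size(payload: int) -> int:
    """Size ping prints in parentheses, e.g. 6972(7000)."""
    return payload + IPV4_HEADER + ICMP_HEADER


def reply_bytes(payload: int) -> int:
    """Byte count on each reply line: ICMP header plus payload."""
    return payload + ICMP_HEADER


def max_df_payload(mtu: int) -> int:
    """Largest -s value that fits the MTU with fragmentation disabled."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (7000, 4096):
    size = max_df_payload(mtu)
    print(f"mtu {mtu}: -s {size} -> {wire_size(size)} on the wire, "
          f"{reply_bytes(size)} bytes per reply")
```

By the same arithmetic, `-s 8000` needs 8028 bytes, which exceeds the 7000-byte path through the bridge, and `-s 4500` needs 4528 bytes, which exceeds the 4096-byte rtl8139 MTU.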

Comment 17 xiywang 2017-05-08 02:59:05 UTC
1. boot a guest with host_mtu=65520
/usr/libexec/qemu-kvm -name rhel7.4 -cpu IvyBridge -m 4096 -realtime mlock=off -smp 4 \
-drive file=/home/rhel7.4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,snapshot=off -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=2 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f,vectors=6,mq=on,host_mtu=65520 \
-monitor stdio -device qxl-vga,id=video0 -serial unix:/tmp/console,server,nowait -vnc :1 -spice port=5900,disable-ticketing

2. set mtu in host
# ifconfig tap0 mtu 65520
tap0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
        inet6 fe80::2c30:72ff:fe71:3fb0  prefixlen 64  scopeid 0x20<link>
        ether 2e:30:72:71:3f:b0  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91  bytes 7160 (6.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

3. set mtu in guest
# ifconfig eth0 mtu 65535
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65535
        inet 10.73.75.107  netmask 255.255.252.0  broadcast 10.73.75.255
        inet6 fe80::5054:ff:fea1:d05f  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:4948:5054:ff:fea1:d05f  prefixlen 64  scopeid 0x0<global>
        ether 52:54:00:a1:d0:5f  txqueuelen 1000  (Ethernet)
        RX packets 2734  bytes 430119 (420.0 KiB)
        RX errors 0  dropped 11  overruns 0  frame 0
        TX packets 278  bytes 228110 (222.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

4. ping external host
# ping -c 3 -s 65535 -M do 10.73.72.146
Error: packet size 65535 is too large. Maximum is 65507
# ping -c 3 -s 65520 -M do 10.73.72.146
Error: packet size 65520 is too large. Maximum is 65507
# ping -c 3 -s 65492 -M do 10.73.72.146
PING 10.73.72.146 (10.73.72.146) 65492(65520) bytes of data.
65500 bytes from 10.73.72.146: icmp_seq=1 ttl=64 time=0.253 ms
65500 bytes from 10.73.72.146: icmp_seq=2 ttl=64 time=0.140 ms
65500 bytes from 10.73.72.146: icmp_seq=3 ttl=64 time=0.227 ms
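Two separate caps appear in this transcript. ping rejects any `-s` above 65507 before sending, because the IPv4 total-length field is 16 bits (65535 - 20 - 8 = 65507). With `-M do`, the MTU adds a second cap of MTU - 28. A minimal Python sketch, assuming IPv4 without options and an effective MTU of 65520 somewhere along the path (which element enforced that value is not established in this report):

```python
IPV4_MAX_TOTAL_LENGTH = 0xFFFF  # 16-bit total-length field
HEADERS = 20 + 8                # IPv4 header (no options) + ICMP header


def ping_payload_limits(mtu: int) -> tuple:
    """Return (ping's own size ceiling, largest DF payload for this MTU)."""
    ceiling = IPV4_MAX_TOTAL_LENGTH - HEADERS
    return ceiling, min(ceiling, mtu - HEADERS)


ceiling, usable = ping_payload_limits(65520)
print(f"ping accepts -s up to {ceiling}; with MTU 65520 only {usable} fits")
```

This is why `-s 65520` is refused locally with "Maximum is 65507" even though 65492 is the largest value that actually got replies.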

Comment 18 xiywang 2017-05-08 03:03:23 UTC
scenario 2

1. boot a guest with host_mtu=9000
...
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=2 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f,vectors=6,mq=on,host_mtu=9000
...

2. set mtu in host to 65520
# ifconfig tap0 mtu 65520
tap0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
        inet6 fe80::2c30:72ff:fe71:3fb0  prefixlen 64  scopeid 0x20<link>
        ether 2e:30:72:71:3f:b0  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91  bytes 7160 (6.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

3. set mtu in guest
# ifconfig eth0 mtu 65535
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65535
        inet 10.73.75.107  netmask 255.255.252.0  broadcast 10.73.75.255
        inet6 fe80::5054:ff:fea1:d05f  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:4948:5054:ff:fea1:d05f  prefixlen 64  scopeid 0x0<global>
        ether 52:54:00:a1:d0:5f  txqueuelen 1000  (Ethernet)
        RX packets 2734  bytes 430119 (420.0 KiB)
        RX errors 0  dropped 11  overruns 0  frame 0
        TX packets 278  bytes 228110 (222.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

4. ping external host
# ping -c 3 -s 65535 -M do 10.73.72.146
Error: packet size 65535 is too large. Maximum is 65507
# ping -c 3 -s 65492 -M do 10.73.72.146
PING 10.73.72.146 (10.73.72.146) 65492(65520) bytes of data.
65500 bytes from 10.73.72.146: icmp_seq=1 ttl=64 time=0.253 ms
65500 bytes from 10.73.72.146: icmp_seq=2 ttl=64 time=0.140 ms
65500 bytes from 10.73.72.146: icmp_seq=3 ttl=64 time=0.227 ms

Comment 19 xiywang 2017-05-08 03:06:12 UTC
scenario 3

1. boot a guest with host_mtu=9000

2. check mtu in host
tap0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

3. set mtu in guest to 65535
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65535

4. ping external host
ping -c 3 -s 65492 -M do 10.73.72.146
PING 10.73.72.146 (10.73.72.146) 65492(65520) bytes of data.
65500 bytes from 10.73.72.146: icmp_seq=1 ttl=64 time=0.410 ms
65500 bytes from 10.73.72.146: icmp_seq=2 ttl=64 time=0.271 ms
65500 bytes from 10.73.72.146: icmp_seq=3 ttl=64 time=0.223 ms

--- 10.73.72.146 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.223/0.301/0.410/0.080 ms

Comment 20 xiywang 2017-05-08 03:18:43 UTC
Hi Aaron,

I have some questions on this bz.

1. In scenario 1 (boot the guest with host_mtu=65520, set the tap0 MTU on the host to 65520, set eth0 in the guest to 65535), when I ping an external host with packet size 65520, ping prints 'Error: packet size 65520 is too large. Maximum is 65507'. But in practice, the largest packet size I can use to ping out from the guest is 65492.
So will this 'Maximum is 65507' be misleading for users?

2. In scenario 2 (boot the guest with host_mtu=9000, set the tap0 MTU on the host to 65520, set eth0 in the guest to 65535), I can still ping out from the guest successfully with packet size 65492.
Then how does 'host_mtu=9000' work? Does host_mtu decide the packet size in the guest, or does the tap0 MTU on the host actually decide it?

3. In scenario 3 (boot the guest with host_mtu=9000, leave the tap0 MTU on the host at the default of 1500, set eth0 in the guest to 65535), I can STILL ping out from the guest with packet size 65492, which seems weird to me. None of host_mtu, the tap0 MTU, or the guest eth0 MTU is set to 65520. Why is the maximum packet size 65520, and not 65535, 1500, or even 9000?

Thanks,
Xiyue

Comment 21 Aaron Conole 2017-05-09 12:35:59 UTC
(In reply to xiywang from comment #20)
> Hi Aaron,
> 
> I have some questions on this bz.
> 
> 1. in scenario 1 (boot guest with host_mtu=65520, set tap0 mtu on host to
> 65520, set eth0 in guest to 65535), when I ping external host with packets
> size 65520, it prompt 'Error: packet size 65520 is too large. Maximum is
> 65507'. But actually I can only use pkt size 65492 maximum to ping out from
> the guest.
> So this 'maximum is 65507' will be a misleading for user?

If you are manually adjusting the host MTUs after boot, then I am not sure what will happen (since the feature is an initialization-time setting). Did I read your test steps correctly? I don't think you should have to change the tap0 MTU, but maybe I'm missing something.

> 2. in scenario 2 (boot guest with host_mtu=9000, set tap0 mtu on host to
> 65520, set eth0 in guest to 65535). I can still ping out from guest
> successfully with pkt size 65492.
> Then how does 'host_mtu=9000' works? Is host_mtu decides the pkt size in
> guest, or tap0 mtu on host decides the pkt size actually?
> 
> 3. in scenario 3 (boot guest with host_mtu=9000, leave tap0 mtu on host to
> default as 1500, set eth0 in guest to 65535), I can STILL ping out from
> guest with pkt size 65492. Which seems weird to me. None of the host_mtu or
> tap0 mtu or guest eth0 mtu set to 65520. Why the maxium pkt size could only
> be 65520? Not 65535, not 1500, or even not 9000?

That sounds like you're not using a virtio net device in the guest.  Can you describe how you're booting your guest (which kernel it is running and what qemu options you are passing)?
 
> Thanks,
> Xiyue

Comment 22 xiywang 2017-05-10 07:05:21 UTC
Hi Aaron,

I listed the QEMU command line in comment 17; I'll paste it here.
1. boot a guest with host_mtu=65520
...
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=2 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f,vectors=6,mq=on,host_mtu=65520 \
...

After tap0 is created automatically, its MTU is not 65520 as I set, but the default 1500.
tap0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::cc8d:77ff:feb8:d808  prefixlen 64  scopeid 0x20<link>
        ether ce:8d:77:b8:d8:08  txqueuelen 1000  (Ethernet)
        RX packets 74  bytes 7392 (7.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2087  bytes 163520 (159.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Meanwhile, the eth0 MTU in the guest is not 65520 either; it is 1500.
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.75.107  netmask 255.255.252.0  broadcast 10.73.75.255
        inet6 fe80::5054:ff:fea1:d05f  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:4948:5054:ff:fea1:d05f  prefixlen 64  scopeid 0x0<global>

Comment 23 xiywang 2017-05-10 07:09:16 UTC
I tested the case without mq again (since I'm not sure whether it affects the MTU), and the result is the same as above.
tap0 on the host and eth0 in the guest are both initialized with MTU 1500.

Is this the result you're expecting?
Does this mean I have to manually set both the host tap0 MTU AND the guest eth0 MTU?
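Checks like the ones in comments 22 and 23 can be scripted by reading the interface MTU from sysfs (`/sys/class/net/<ifname>/mtu`). A minimal Python sketch; `read_mtu` and `check_mtu` are hypothetical names, and treating host_mtu as the value eth0 should come up with reflects the expectation stated in comment 22, not a documented guarantee.

```python
from pathlib import Path


def read_mtu(ifname: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the current MTU of an interface from sysfs."""
    return int((Path(sysfs_root) / ifname / "mtu").read_text().strip())


def check_mtu(ifname: str, expected: int,
              sysfs_root: str = "/sys/class/net") -> bool:
    """Compare an interface's MTU with the host_mtu given to QEMU."""
    actual = read_mtu(ifname, sysfs_root)
    if actual != expected:
        print(f"{ifname}: mtu {actual}, expected {expected}")
        return False
    return True
```

For example, `check_mtu("eth0", 65520)` run in the guest from comment 22 would print `eth0: mtu 1500, expected 65520` and return False. The same function can be pointed at `tap0` on the host.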

Comment 24 Aaron Conole 2017-05-18 14:54:23 UTC
The results you are seeing aren't expected.  I've followed up with the qemu team.

Comment 25 Aaron Conole 2017-06-06 15:24:34 UTC
I think since 1452756 has shown success, there shouldn't be any more blockers to testing / verifying this, correct?

Comment 26 xiywang 2017-06-07 01:11:27 UTC
(In reply to Aaron Conole from comment #25)
> I think since 1452756 has shown success, there shouldn't be any more
> blockers to testing / verifying this, correct?

Yes. I'll also set this bug to Verified.
The test results can be found in:
https://bugzilla.redhat.com/show_bug.cgi?id=1452756#c15

Thank you, Aaron and Maxime.

Comment 28 errata-xmlrpc 2017-08-02 05:03:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842

