Bug 1766414 - [downstream] [UI] hint after updating mtu on networks connected to running VMs
Summary: [downstream] [UI] hint after updating mtu on networks connected to running VMs
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.6
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Target Milestone: ovirt-4.5.0
Assignee: Nobody
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1676708
Blocks: 1113630 1848986
 
Reported: 2019-10-29 01:22 UTC by Germano Veit Michel
Modified: 2020-06-19 12:39 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1676708
Environment:
Last Closed:
oVirt Team: Network
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4540631 None None None 2019-10-30 03:23:06 UTC

Description Germano Veit Michel 2019-10-29 01:22:17 UTC
+++ This bug was initially created as a clone of Bug #1676708 +++

Description of problem:
After updating a network's MTU from 1500 to 9000 while a VM is connected to this network, all MTUs (for the bridge and for the VM's vnet devices) are updated to the correct values. However, after migrating the VM to another host, the MTU is 1500 again. Restarting the VM fixes this, and migration then sets the 9000 MTU.
This can lead to serious connection problems between VMs. For example: the MTU is changed and everything works, but some time later you decide to upgrade the hosts and put them in maintenance, the VMs migrate to other hosts, and network connectivity randomly fails.

before migration:
vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master test-net-mtu state UNKNOWN mode DEFAULT group default qlen 1000
after migration:
vnet12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test-net-mtu state UNKNOWN mode DEFAULT group default qlen 1000

Version-Release number of selected component (if applicable):
ovirt-engine 4.3.0 (also 4.2.8)

How reproducible:
100%

Steps to Reproduce:
1. Create a network with the default 1500 MTU.
2. Start a VM with a NIC attached to this network.
3. Change the network's MTU to 9000.
4. Check that the MTU also changed for the VM's vNIC:
 virsh -r domiflist test-vm
 ip link show vnet_from_first_command
 The MTU must be 9000.
5. Migrate the VM to another host.
6. Check the vnet MTU with the same commands as in step 4; now it is 1500 again.
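The MTU check in steps 4 and 6 can be scripted. Below is a minimal sketch of the parsing step, using the example `ip link` output quoted in this report; on a live host you would instead pipe in `ip link show "$VNET"`, with `$VNET` taken from the first column of `virsh -r domiflist test-vm`:

```shell
#!/bin/sh
# Sketch: extract the MTU value from a line of "ip link show" output.
mtu_of() {
  # Reads an "ip link" line on stdin and prints the value that follows "mtu".
  awk '{for (i = 1; i < NF; i++) if ($i == "mtu") print $(i + 1)}'
}

# Sample line taken from this report (before migration):
LINE='vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master test-net-mtu state UNKNOWN mode DEFAULT group default qlen 1000'
echo "$LINE" | mtu_of
```

Running the same extraction on the "after migration" line from the report prints 1500, which is the mismatch this bug describes.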

Actual results:
MTU for vnet is 1500 after migration

Expected results:
MTU for vnet is 9000 after migration
OR alternatively:
migration must be prohibited until VM restarted
OR
engine should mark VM as having next run config

Additional info:

--- Additional comment from Dominik Holler on 2019-02-13 21:53:46 UTC ---

Sergey, would you please share the vdsm.log of the source and destination host, and most important, the engine.log containing the migration?

--- Additional comment from Michael Burman on 2019-02-14 10:17:29 UTC ---

QE can't reproduce on 4.3.0.4-0.1.el7

Please note that it is not supported to update a network's MTU while it is used by a VM; the change will fail on the vdsm side:
"VDSM host_mixed_3 command HostSetupNetworksVDS failed: Bridge mtu has interfaces set([u'vnet0']) connected"

You first need to unplug the vNIC from the VM, update the network's MTU, wait until the change is applied successfully on the host (UI notification), then plug the vNIC back in. Now the MTU is updated successfully and preserved after migration.
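The unplug/plug workaround can also be driven through the oVirt REST API instead of the UI. The sketch below only builds the endpoint URLs and shows the calls as comments; the engine address, credentials, and IDs are hypothetical placeholders, and it assumes the vNIC `deactivate`/`activate` actions, which correspond to hot-unplug and hot-plug:

```shell
#!/bin/sh
# Sketch of the unplug / update-MTU / replug flow via the oVirt REST API.
# ENGINE, VM_ID, and NIC_ID are hypothetical placeholders for illustration.
ENGINE="https://engine.example.com/ovirt-engine/api"
VM_ID="123"
NIC_ID="456"

nic_action_url() {
  # Builds the REST endpoint for a vNIC action ("deactivate" or "activate").
  echo "$ENGINE/vms/$VM_ID/nics/$NIC_ID/$1"
}

# 1. Hot-unplug the vNIC:
#    curl -k -u admin@internal:PASSWORD -X POST \
#         -H 'Content-Type: application/xml' -d '<action/>' \
#         "$(nic_action_url deactivate)"
# 2. Update the network's MTU (Administration Portal or API) and wait for
#    the host to report that the change was applied successfully.
# 3. Re-plug the vNIC:
#    curl -k -u admin@internal:PASSWORD -X POST \
#         -H 'Content-Type: application/xml' -d '<action/>' \
#         "$(nic_action_url activate)"

nic_action_url deactivate
```

After step 3 the vNIC picks up the new MTU, and per the comment above it is then preserved across migration.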

--- Additional comment from Sergey on 2019-02-14 10:34:14 UTC ---

Attached engine and vdsm logs from the source and destination hosts. Don't pay attention to the errors about failed network creation; I created it on the wrong interface in our test env.
Migrating VM name: empty-no-os
Net name: test-vlan-noconn
Net VDSM Name: on68b632b6f2134

Before migration:
34: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master on68b632b6f2134 state UNKNOWN group default qlen 1000

After:
36: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master on68b632b6f2134 state UNKNOWN group default qlen 1000

--- Additional comment from Sergey on 2019-02-14 12:05:28 UTC ---

(In reply to Michael Burman from comment #2)
> QE can't reproduce on 4.3.0.4-0.1.el7
> 
> Please note that it is not supported to update network's MTU while it used
> by VM, the change will fail on vdsm side:
> "VDSM host_mixed_3 command HostSetupNetworksVDS failed: Bridge mtu has
> interfaces set([u'vnet0']) connected"
> 
> You need first unplug the vNIC from the VM, update the network's MTU, wait
> the change applied successfully on the host(UI notification), plug the vNIC
> back. Now the MTU updated successfully and preserved after migration.

But in fact I've tested on 2 installations and both gave no errors while updating the MTU, and actually changed the MTU on the host side. On both I've used the "Linux Bridge" switch type and a VLAN network; maybe the network type (connected or VLAN) is critical here.

--- Additional comment from Dominik Holler on 2019-02-14 12:12:26 UTC ---

(In reply to Sergey from comment #4)
> (In reply to Michael Burman from comment #2)
> > QE can't reproduce on 4.3.0.4-0.1.el7
> > 
> > Please note that it is not supported to update network's MTU while it used
> > by VM, the change will fail on vdsm side:
> > "VDSM host_mixed_3 command HostSetupNetworksVDS failed: Bridge mtu has
> > interfaces set([u'vnet0']) connected"
> > 
> > You need first unplug the vNIC from the VM, update the network's MTU, wait
> > the change applied successfully on the host(UI notification), plug the vNIC
> > back. Now the MTU updated successfully and preserved after migration.
> 
> But in fact I've tested on 2 installations and both gave no errors while
> updating MTU, and actually changed MTU on host side, on both I've used
> "Linux Bridge" switch type and VLAN network, maybe network type(connected or
> vlan) is critical here.


The behavior of the host should not depend on the network type.
Looks like the updated MTU was never propagated to libvirt and the guest OS.
The expected behavior is documented in
https://ovirt.org/develop/release-management/features/network/managed_mtu_for_vm_networks.html#update-mtu-flow

Do you have a suggestion about what would help you know that the unplug/plug step is required?

--- Additional comment from Sergey on 2019-02-14 13:22:45 UTC ---

(In reply to Dominik Holler from comment #5)
> The behavior of the host should not depend on the network type.
> Looks like the updated MTU was never propagated to libvirt and the guest OS.
> The expected behavior is documented in
> https://ovirt.org/develop/release-management/features/network/
> managed_mtu_for_vm_networks.html#update-mtu-flow
> 
> Do you have a suggestion what would help you to know that the unplug/plug
> step is required?

Thanks for the link, now I can see that it should not work.
When the MTU on the VM device changed to 9000 without any action on the VM, and pings with large packets started to flow (after also changing the MTU inside the guest), it made me believe that migration should also work without problems; it was the only thing missing to make the MTU update fully functional from my point of view :)

Maybe a warning message when saving a network with a changed MTU, stating that a NIC unplug/plug or a VM shutdown/power-on is required to apply the new MTU; it could also include a list of affected VMs.
Or a "next run config", but next run has a drawback: it won't clear after unplugging/plugging the NIC.

Comment 8 Rolfe Dlugy-Hegwer 2020-06-04 17:08:21 UTC
The updated documentation contains notes stating:

IMPORTANT
If you change the network’s MTU settings, you must propagate this change to the running virtual machines on the network: Hot unplug and replug every virtual machine’s vNIC that should apply the MTU setting, or restart the virtual machines. Otherwise, these interfaces fail when the virtual machine migrates to another host. For more information, see BZ#1766414.

See:
- https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4-beta/html-single/administration_guide/index
- https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4-beta/html-single/planning_and_prerequisites_guide/index

