Bug 1943613

Summary: Report a better error message to users if they attempt to live migrate a vm after a neutron network mtu change without first hard rebooting it.
Product: Red Hat OpenStack
Reporter: Jakub Libosvar <jlibosva>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED MIGRATED
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: alifshit, dasmith, eglynn, giridhar.ramaraju, jhakimra, jmelvin, jparker, kchamart, oblaut, ralonsoh, rhayakaw, rsafrono, sbauza, sgordon, smooney, vromanso
Target Milestone: ---
Keywords: TestCannotAutomate, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2025-01-18 02:56:30 UTC
Type: Bug
Attachments:
Description                           Flags
nova compute logs from target node    none
nova compute logs from source node    none

Description Jakub Libosvar 2021-03-26 15:36:45 UTC
Description of problem:
A VM cannot be live migrated after the MTU of the network its port is attached to has been changed.

Version-Release number of selected component (if applicable):
openstack-nova-compute-20.4.1-1.20200917173450.el8ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a VM on network A
2. openstack network set --mtu 1442 a
3. Live migrate the VM

Actual results:
Live migration fails because the interface MTU differs between the source and target nodes.

Expected results:
Live migration succeeds and the VM uses the new MTU.

Additional info:
2021-03-26 15:35:13.615 7 ERROR nova.virt.libvirt.driver [-] [instance: a79d5ec4-d768-4dd0-8c08-bc6f4b07e321] Live Migration failure: unsupported configuration: Target network card MTU 1442 does not match source 1450: libvirt.libvirtError: unsupported configuration: Target network card MTU 1442 does not match source 1450
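
The libvirt message above carries both MTU values, so a clearer operator-facing error could be derived from it. The following is only a minimal sketch, not nova's actual code: the function name `explain_mtu_mismatch` and the wording of the hint are assumptions made for illustration; only the libvirt message format is taken from the log line above.

```python
import re

# Pattern taken from the libvirt error text in the log line above.
_MTU_MISMATCH_RE = re.compile(
    r"Target network card MTU (?P<target>\d+) "
    r"does not match source (?P<source>\d+)"
)


def explain_mtu_mismatch(libvirt_message):
    """Return a friendlier hint for an MTU mismatch error, or None.

    Hypothetical helper: nova does not necessarily expose a function
    with this name or this wording.
    """
    match = _MTU_MISMATCH_RE.search(libvirt_message)
    if match is None:
        return None
    source = int(match.group("source"))
    target = int(match.group("target"))
    return (
        "Live migration refused: the running guest was started with "
        f"MTU {source}, but the network now has MTU {target}. "
        "Hard reboot the instance to pick up the new MTU, then retry."
    )
```

Any message that does not match the pattern returns `None`, so a caller could fall back to the original libvirt text unchanged.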

Comment 1 Jakub Libosvar 2021-03-26 15:39:33 UTC
Created attachment 1766669
nova compute logs from target node

Comment 2 Jakub Libosvar 2021-03-26 15:40:56 UTC
Created attachment 1766671
nova compute logs from source node

Comment 6 smooney 2021-04-07 19:43:12 UTC
libvirt does not allow the MTU to be modified on a running VM, so nova cannot update the MTU when it is updated in neutron and cannot update it during a live migration.
As such, the current procedure is expected to fail. We have determined that the current behaviour is correct and the initial bug report was invalid.
This BZ has been kept open to track improving the error message and possibly enhancing the documentation related to live migrations and MTU changes.
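
One way to see whether an instance is affected is to compare the MTU recorded in the running guest's libvirt domain XML with the current network MTU. The sketch below assumes the interface definition carries an `<mtu size='...'/>` element; the function names and the sample XML are illustrative only and are not part of nova.

```python
import xml.etree.ElementTree as ET


def interface_mtus(domain_xml):
    """Map each interface's MAC address to the MTU set in the domain XML."""
    mtus = {}
    root = ET.fromstring(domain_xml)
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        mtu = iface.find("mtu")
        if mac is None or mtu is None:
            continue  # no explicit MTU recorded for this interface
        mtus[mac.get("address")] = int(mtu.get("size"))
    return mtus


def needs_hard_reboot(domain_xml, network_mtu):
    """True if any interface still carries an MTU other than the network's."""
    return any(m != network_mtu for m in interface_mtus(domain_xml).values())


# Made-up sample for illustration; real domain XML has many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='fa:16:3e:00:00:01'/>
      <mtu size='1450'/>
    </interface>
  </devices>
</domain>
"""
```

With the sample above, `needs_hard_reboot(SAMPLE_XML, 1442)` is true, which corresponds to the situation in this report: the guest keeps MTU 1450 until it is hard rebooted.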

Comment 7 smooney 2021-04-26 12:39:34 UTC
by the way for reference we added the MTU to the xml in 
https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877

to resolve https://bugs.launchpad.net/nova/+bug/1747496

so we cannot stop setting it in the xml or we would break jumbo frames.

if we were to stop setting it, then we would need to enhance ovs, ovn or the neutron l2 agent to manage the mtu.

we could do this by defining a new Dynamic MTU extension in neutron that is only reported when it is enabled.
the extension would work as follows: backends that support it commit to taking over management of the
interface mtu, including updating it if the network mtu changes. if nova sees the extension reported, it will
not generate the mtu elements and will delegate the management to neutron.
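
The proposed handshake could look roughly like the sketch below. Everything in it is hypothetical: the extension is only a proposal in this comment, so the alias `dynamic-mtu` and the functions `should_set_mtu_in_xml` and `build_interface_xml` are invented for illustration and are not nova or neutron APIs.

```python
# Hypothetical alias for the proposed extension; not a real neutron alias.
DYNAMIC_MTU_ALIAS = "dynamic-mtu"


def should_set_mtu_in_xml(enabled_extensions):
    """Nova keeps writing <mtu> unless neutron advertises the extension."""
    return DYNAMIC_MTU_ALIAS not in enabled_extensions


def build_interface_xml(mac, mtu, enabled_extensions):
    """Render a simplified interface element (illustrative only)."""
    lines = ["<interface type='bridge'>", f"  <mac address='{mac}'/>"]
    if should_set_mtu_in_xml(enabled_extensions):
        # Current behaviour: the MTU is fixed in the guest definition.
        lines.append(f"  <mtu size='{mtu}'/>")
    # Otherwise the backend (ovs, ovn or the l2 agent) would own the MTU.
    lines.append("</interface>")
    return "\n".join(lines)
```

Keeping the decision in one small predicate means the jumbo-frame behaviour stays unchanged on deployments where the extension is not reported.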


this would not resolve the issue for osp 16 or 17 but it would resolve the issue in osp 18 or later.

for now we can keep this bug for the better error reporting but i think this would be a viable path forward in the long term.

Comment 8 Jakub Libosvar 2022-01-26 22:22:55 UTC
(In reply to smooney from comment #7)
Wouldn't it be better to file an RFE against libvirt so that the MTU can be changed during live migration? With that, we could detect that the MTU is no longer valid and request the new MTU on the target node. If this can't be done, it doesn't make much sense to have an option to change the MTU in the Neutron API, as it breaks other features.

Comment 9 Artom Lifshitz 2022-02-01 14:57:30 UTC
(In reply to Jakub Libosvar from comment #8)
We could, but Nova doesn't want to support that. Supporting changing the MTU for a running instance is a much larger problem (detecting network-vif-changed events, somehow handling changing the XML of a running instance by either unplugging/replugging or rebooting or something else, etc.), and we'd rather explicitly refuse it than implement just this tiny subset dealing with live migration. So in this case I think Neutron should make the MTU field read-only to avoid getting into this mess altogether.