Bug 1943613
Summary: | Report a better error message to users if they attempt to live migrate a vm after a neutron network mtu change without first hard rebooting it. |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Jakub Libosvar <jlibosva> |
Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
Status: | CLOSED MIGRATED | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | high | Priority: | medium |
Version: | 16.1 (Train) | Keywords: | TestCannotAutomate, Triaged |
Hardware: | Unspecified | OS: | Unspecified |
Last Closed: | 2025-01-18 02:56:30 UTC | Type: | Bug |
CC: | alifshit, dasmith, eglynn, giridhar.ramaraju, jhakimra, jmelvin, jparker, kchamart, oblaut, ralonsoh, rhayakaw, rsafrono, sbauza, sgordon, smooney, vromanso |
Description
Jakub Libosvar
2021-03-26 15:36:45 UTC
Created attachment 1766669: nova compute logs from target node
Created attachment 1766671: nova compute logs from source node
libvirt does not allow the MTU to be modified on a running VM, so nova cannot update the MTU when it is updated in neutron, and cannot update it during a live migration. As such, the current procedure is expected to fail.

We have determined that the current behaviour is correct and that the initial bug report was invalid. This BZ has been kept open to track improving the error message and possibly enhancing the documentation related to live migrations and MTU changes.

By the way, for reference, we added the MTU to the XML in https://github.com/openstack/nova/commit/f02b3800051234ecc14f3117d5987b1a8ef75877 to resolve https://bugs.launchpad.net/nova/+bug/1747496, so we cannot remove setting it in the XML or we would break jumbo frames.

If we were to remove setting it, we would need to enhance OVS, OVN or the neutron L2 agent to manage the MTU. We could do this by defining a new Dynamic MTU extension in neutron that is only reported when it is enabled. The extension would work as follows: backends that support it commit to taking over management of the interface MTU, including updating it if the network MTU changes. If nova sees the extension reported, it does not generate the MTU elements and delegates the management to neutron.

This would not resolve the issue for OSP 16 or 17, but it would resolve it in OSP 18 or later. For now we can keep this bug for the better error reporting, but I think this would be a viable path forward in the long term.

(In reply to smooney from comment #7)
Wouldn't it be better to request an RFE to libvirt to be able to change the MTU during live migration? With that, we can calculate that the MTU is no longer valid and request a new MTU on the target node. If this can't be done, it doesn't make much sense to have an option to change the MTU in the Neutron API, as it breaks other features.

(In reply to Jakub Libosvar from comment #8)
We could, but Nova doesn't want to support that. Supporting changing the MTU for a running instance is a much larger problem (detecting network-vif-changed events, somehow handling changing the XML of a running instance by unplugging/replugging, rebooting or something else, etc.), and we'd rather explicitly refuse it than implement just this tiny subset dealing with live migration. So in this case I think Neutron should make the MTU field read-only to avoid getting into this mess altogether.
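
For illustration, the clearer error message this bug tracks could come from a check that runs before the live migration starts, and the Dynamic MTU idea from comment #7 reduces to a single decision on the nova side. The following is a minimal sketch in Python, assuming the caller already has the source domain XML and the current neutron MTU for each interface. `MtuMismatchError`, `interface_mtus`, `check_mtu_before_migration`, `should_write_mtu_element` and the `dynamic-mtu` alias are hypothetical names, not existing nova or neutron code. Only the `<mtu size='...'/>` element inside `<interface>` is taken from libvirt's domain XML format.

```python
import xml.etree.ElementTree as ET


class MtuMismatchError(Exception):
    """Hypothetical error type; not an existing nova exception."""


def interface_mtus(domain_xml):
    """Map each interface MAC address to the MTU recorded in the domain XML."""
    mtus = {}
    root = ET.fromstring(domain_xml)
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        mtu = iface.find("mtu")
        # Interfaces without an <mtu> element are skipped: nothing to compare.
        if mac is not None and mtu is not None:
            mtus[mac.get("address")] = int(mtu.get("size"))
    return mtus


def check_mtu_before_migration(domain_xml, network_mtus):
    """Raise a readable error if a guest interface MTU is stale.

    network_mtus maps MAC address -> current neutron network MTU. How the
    caller obtains it is outside this sketch (assumption).
    """
    for mac, guest_mtu in interface_mtus(domain_xml).items():
        wanted = network_mtus.get(mac)
        if wanted is not None and wanted != guest_mtu:
            raise MtuMismatchError(
                f"Interface {mac} was started with MTU {guest_mtu} but its "
                f"network now has MTU {wanted}. Hard reboot the instance "
                f"before live migrating it."
            )


def should_write_mtu_element(extension_aliases, alias="dynamic-mtu"):
    """Return False when the (hypothetical) Dynamic MTU extension is reported,
    meaning the network backend would manage the interface MTU itself."""
    return alias not in set(extension_aliases)
```

With a domain whose interface was started at MTU 1500, calling `check_mtu_before_migration(xml, {mac: 9000})` raises the hypothetical error with the hard reboot hint, while a matching MTU passes silently.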