Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2208260

Summary: Some VMs can't live migrate after OVN migration
Product: Red Hat OpenStack
Reporter: Roman Safronov <rsafrono>
Component: openstack-neutron
Assignee: Arnau Verdaguer <averdagu>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: high
Priority: high
Version: 17.1 (Wallaby)
CC: averdagu, bcafarel, chrisw, ekuris, jamsmith, jlibosva, mlavalle, pgrist, scohen, ykarel
Target Milestone: ga
Target Release: ---
Keywords: Reopened, TestBlocker, Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Type: Bug
Last Closed: 2023-06-19 13:35:20 UTC
Bug Depends On: 2211036

Description Roman Safronov 2023-05-18 11:52:32 UTC
Description of problem:
On an environment that had been migrated from OVS to OVN, an attempt to live migrate one of the VMs created before the migration failed. The openstack server migrate command completes without errors, but the VM remains on the same node.
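
For reference, a minimal way to trigger and verify the migration (<server> is a placeholder; exact flag availability depends on the python-openstackclient version):

(overcloud) [stack@undercloud-0 ~]$ openstack server migrate --live-migration <server>
(overcloud) [stack@undercloud-0 ~]$ openstack server show <server> -c status -c OS-EXT-SRV-ATTR:host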

I found the following in the nova log on that node:
Live Migration failure: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device

At the same time, the interface is present:
[root@compute-0 nova]# ip a | grep qbr2d71c5c2-c9
33: qbr2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
35: qvb2d71c5c2-c9@qvo2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbr2d71c5c2-c9 state UP group default qlen 1000
36: tap2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbr2d71c5c2-c9 state UNKNOWN group default qlen 1000
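
Since the bridge clearly exists on the source, the failing device lookup may be happening against the destination side of the migration, so checking whether the bridge exists on the target compute could help isolate the problem (compute-1 below is an assumption; the actual target may differ):

[root@compute-1 ~]# ip -d link show qbr2d71c5c2-c9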

I tried to migrate other VMs, and they migrated successfully.

I noticed that the neutron port of this specific VM is DOWN, while the others were ACTIVE:
(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep ovn-migration-port-norma
| 2d71c5c2-c942-457b-8947-6a6df7bb9020 | ovn-migration-port-normal-ext-pinger-1 | fa:16:3e:ee:ed:72 | ip_address='10.0.0.186', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90'                          | DOWN   |
| 342be005-77b4-48c5-8c5d-78fee9123c5c | ovn-migration-port-normal-int-pinger-2 | fa:16:3e:98:c0:7c | ip_address='192.168.168.129', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950'                     | ACTIVE |
| 5db7b849-8aae-43bd-825f-561c1f9ebffc | ovn-migration-port-normal-ext-pinger-2 | fa:16:3e:2a:dd:c6 | ip_address='10.0.0.242', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90'                          | ACTIVE |
| 78d95720-667c-4b71-adb7-abb4ddfcf2fd | ovn-migration-port-normal-int-pinger-4 | fa:16:3e:d2:7e:ef | ip_address='192.168.168.125', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950'                     | ACTIVE |
| 877bc727-a0f7-40f0-a963-e3d7dc615936 | ovn-migration-port-normal-int-pinger-3 | fa:16:3e:2a:4d:40 | ip_address='192.168.168.206', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950'                     | ACTIVE |
| be3582eb-a2e8-4ffa-84a2-fe5481342043 | ovn-migration-port-normal-ext-pinger-3 | fa:16:3e:bd:7a:28 | ip_address='10.0.0.250', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90'                          | ACTIVE |
| d6700a72-d076-45e9-8a93-68bc49703d9e | ovn-migration-port-normal-ext-pinger-4 | fa:16:3e:4f:77:72 | ip_address='10.0.0.230', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90'                          | ACTIVE |
| da18bfb4-dbb3-4d13-bb46-a703afd44819 | ovn-migration-port-normal-ext-pinger-5 | fa:16:3e:55:f5:f5 | ip_address='10.0.0.211', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90'                          | ACTIVE |
| e70315b7-ce96-4cec-afbd-16761d131b44 | ovn-migration-port-normal-int-pinger-1 | fa:16:3e:2a:91:f9 | ip_address='192.168.168.152', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950'                     | ACTIVE |
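
To see how this port's binding differs from the others, its binding details can be inspected; a port still wired through a qbr linux bridge may show ovs_hybrid_plug=True in binding_vif_details (field names as printed by recent clients; output omitted):

(overcloud) [stack@undercloud-0 ~]$ openstack port show 2d71c5c2-c942-457b-8947-6a6df7bb9020 -c status -c binding_vif_type -c binding_vif_details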



Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230511.n.1

How reproducible:
Happens occasionally. Also seen in other CI jobs, where live_migration_validation fails with a complaint that the server was not migrated.

Steps to Reproduce:
1. Deploy an OVS environment.
2. Create networks, a router, security groups, and several VMs (I had 8: 2 VMs on each compute, one connected to an external network and one to an internal network).
3. Migrate the environment to OVN according to the official procedure: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/testing_migration_of_the_networking_service_to_the_ml2ovn_mechanism_driver/migrating-ovs-to-ovn
4. Try to live migrate all VMs (a command sketch follows below).
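
A minimal sketch of step 4, assuming all VMs belong to the current project and the scheduler picks the target hosts (flag availability varies across client releases):

(overcloud) [stack@undercloud-0 ~]$ for vm in $(openstack server list -f value -c ID); do openstack server migrate --live-migration "$vm"; done
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long -c ID -c Status -c Host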

Actual results:
One of the VMs failed to migrate; its port is DOWN.

Expected results:
All VMs are able to migrate, and no VMs have ports in the DOWN state.

Additional info:

A more detailed snippet from the nova log:

2023-05-17 19:46:03.583 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] About to invoke the migrate API _live_migration_operation /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9848
2023-05-17 19:46:03.726 2 ERROR nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Live Migration failure: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device: libvirt.libvirtError: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device
2023-05-17 19:46:03.727 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Migration operation thread notification thread_finished /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:10206
2023-05-17 19:46:03.748 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:03.750 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:03.885 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:04.062 2 DEBUG nova.virt.libvirt.migration [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] VM running on src, migration failed _log /usr/lib/python3.9/site-packages/nova/virt/libvirt/migration.py:432
2023-05-17 19:46:04.062 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:10020
2023-05-17 19:46:04.063 2 ERROR nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Migration operation has aborted
2023-05-17 19:46:04.073 2 DEBUG nova.compute.manager [req-89db5ab4-d0f7-4d19-bc45-0426866fab09 8196688ca00b4c26ba0d591f03fcd95d 91a6848448f4411085efcf11e54edfb1 - default default] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Received event network-vif-plugged-2d71c5c2-c942-457b-8947-6a6df7bb9020 external_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:10516

Comment 1 Ihar Hrachyshka 2023-05-22 15:04:12 UTC
Here's where nova modifies the XML: https://github.com/openstack/nova/blob/2dde4538bcbe3400c39a3a5934fe6f48f02a3c52/nova/virt/libvirt/migration.py#L347

It should probably check whether the source binding uses hybrid plug while the destination binding does not, and in that case remove all qbr interfaces from the destination XML.
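
A minimal, hypothetical sketch of what that check could look like (this is not nova's actual code; the function and parameter names adapt_dest_xml, src_hybrid_plug, and dest_hybrid_plug are made up, the hybrid-plug flags are assumed to come from the two ports' binding details, and instead of dropping the interface outright it is re-pointed at the OVS integration bridge, which is what a non-hybrid OVS binding uses):

import xml.etree.ElementTree as ET

def adapt_dest_xml(dest_xml, src_hybrid_plug, dest_hybrid_plug):
    """Rewrite interfaces that target a qbrXXX linux bridge when the
    source binding used hybrid plug but the destination one does not."""
    if not (src_hybrid_plug and not dest_hybrid_plug):
        return dest_xml  # nothing to adjust
    root = ET.fromstring(dest_xml)
    for iface in root.findall('./devices/interface'):
        source = iface.find('source')
        if source is None or not source.get('bridge', '').startswith('qbr'):
            continue
        # The per-port linux bridge will not exist on the destination,
        # so point the interface at the OVS integration bridge instead.
        source.set('bridge', 'br-int')
        if iface.find('virtualport') is None:
            ET.SubElement(iface, 'virtualport', {'type': 'openvswitch'})
    return ET.tostring(root, encoding='unicode')

In nova, such a rewrite would presumably live in the destination-XML update path linked above, where both the source and destination VIF bindings are available.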

Comment 17 Lukas Svaty 2023-06-16 08:58:50 UTC
These bugs have blocker flags set but no release flags (rhos-17.1+). Please make sure the issue is ACKed for the release before it is accepted as a release blocker.