Description of problem:
On an environment that had been migrated from OVS to OVN, an attempt to live migrate one of the VMs created before the migration failed. The openstack server migrate command completes without errors, but the VM remains on the same node.
I found the following in the nova log on that node:
Live Migration failure: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device
At the same time, the interface is present on the node:
[root@compute-0 nova]# ip a | grep qbr2d71c5c2-c9
33: qbr2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
35: qvb2d71c5c2-c9@qvo2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbr2d71c5c2-c9 state UP group default qlen 1000
36: tap2d71c5c2-c9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbr2d71c5c2-c9 state UNKNOWN group default qlen 1000
I tried to migrate other VMs and they migrated successfully.
I noticed that the neutron port of this specific VM is DOWN, while the others were ACTIVE:
(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep ovn-migration-port-norma
| 2d71c5c2-c942-457b-8947-6a6df7bb9020 | ovn-migration-port-normal-ext-pinger-1 | fa:16:3e:ee:ed:72 | ip_address='10.0.0.186', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90' | DOWN |
| 342be005-77b4-48c5-8c5d-78fee9123c5c | ovn-migration-port-normal-int-pinger-2 | fa:16:3e:98:c0:7c | ip_address='192.168.168.129', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950' | ACTIVE |
| 5db7b849-8aae-43bd-825f-561c1f9ebffc | ovn-migration-port-normal-ext-pinger-2 | fa:16:3e:2a:dd:c6 | ip_address='10.0.0.242', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90' | ACTIVE |
| 78d95720-667c-4b71-adb7-abb4ddfcf2fd | ovn-migration-port-normal-int-pinger-4 | fa:16:3e:d2:7e:ef | ip_address='192.168.168.125', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950' | ACTIVE |
| 877bc727-a0f7-40f0-a963-e3d7dc615936 | ovn-migration-port-normal-int-pinger-3 | fa:16:3e:2a:4d:40 | ip_address='192.168.168.206', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950' | ACTIVE |
| be3582eb-a2e8-4ffa-84a2-fe5481342043 | ovn-migration-port-normal-ext-pinger-3 | fa:16:3e:bd:7a:28 | ip_address='10.0.0.250', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90' | ACTIVE |
| d6700a72-d076-45e9-8a93-68bc49703d9e | ovn-migration-port-normal-ext-pinger-4 | fa:16:3e:4f:77:72 | ip_address='10.0.0.230', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90' | ACTIVE |
| da18bfb4-dbb3-4d13-bb46-a703afd44819 | ovn-migration-port-normal-ext-pinger-5 | fa:16:3e:55:f5:f5 | ip_address='10.0.0.211', subnet_id='89225144-8874-4e06-960b-e92d99d4cd90' | ACTIVE |
| e70315b7-ce96-4cec-afbd-16761d131b44 | ovn-migration-port-normal-int-pinger-1 | fa:16:3e:2a:91:f9 | ip_address='192.168.168.152', subnet_id='e0af1eea-2468-486b-b246-6344a57ee950' | ACTIVE |
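The affected ports can also be isolated without grepping names. A hypothetical one-liner (the --status filter is an assumption about the installed openstack client version; the awk variant only assumes machine-readable output):

```shell
# Server-side filter, where the client supports it:
openstack port list --status DOWN

# Client-side fallback: print IDs of all DOWN ports
openstack port list -f value -c ID -c Status | awk '$2 == "DOWN" {print $1}'
```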
Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230511.n.1
How reproducible:
Happens occasionally. Seen in other CI jobs where live_migration_validation fails, complaining that the server was not migrated.
Steps to Reproduce:
1. Deploy an OVS environment.
2. Create networks, a router, security groups, and several VMs (I had 8: 2 VMs on each compute node, one connected to the external network and one to the internal network)
3. Migrate the environment to OVN according to the official procedure: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/testing_migration_of_the_networking_service_to_the_ml2ovn_mechanism_driver/migrating-ovs-to-ovn
4. Try to live migrate all VMs
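Step 4 can be scripted. A minimal sketch (assumptions: admin credentials are sourced, and the client supports --live-migration and --wait; flag names vary between releases). Because the migrate command returns success even when the VM does not move, the script verifies the host actually changed:

```shell
for vm in $(openstack server list -f value -c ID); do
    src=$(openstack server show "$vm" -f value -c 'OS-EXT-SRV-ATTR:host')
    openstack server migrate --live-migration --wait "$vm"
    dst=$(openstack server show "$vm" -f value -c 'OS-EXT-SRV-ATTR:host')
    # The CLI can exit 0 while the migration aborted, so compare hosts:
    if [ "$src" = "$dst" ]; then
        echo "WARNING: $vm is still on $src"
    fi
done
```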
Actual results:
One of the VMs failed to migrate; its neutron port is DOWN.
Expected results:
All VMs are able to migrate, and no VMs have ports in the DOWN state.
Additional info:
A more detailed snippet from the nova log:
2023-05-17 19:46:03.583 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] About to invoke the migrate API _live_migration_operation /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9848
2023-05-17 19:46:03.726 2 ERROR nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Live Migration failure: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device: libvirt.libvirtError: Cannot get interface MTU on 'qbr2d71c5c2-c9': No such device
2023-05-17 19:46:03.727 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Migration operation thread notification thread_finished /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:10206
2023-05-17 19:46:03.748 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:03.750 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:03.885 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 27 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263
2023-05-17 19:46:04.062 2 DEBUG nova.virt.libvirt.migration [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] VM running on src, migration failed _log /usr/lib/python3.9/site-packages/nova/virt/libvirt/migration.py:432
2023-05-17 19:46:04.062 2 DEBUG nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:10020
2023-05-17 19:46:04.063 2 ERROR nova.virt.libvirt.driver [-] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Migration operation has aborted
2023-05-17 19:46:04.073 2 DEBUG nova.compute.manager [req-89db5ab4-d0f7-4d19-bc45-0426866fab09 8196688ca00b4c26ba0d591f03fcd95d 91a6848448f4411085efcf11e54edfb1 - default default] [instance: 8c64aa35-debc-4ea5-a5c8-ed8145757cb0] Received event network-vif-plugged-2d71c5c2-c942-457b-8947-6a6df7bb9020 external_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:10516
This bug has a blocker flag set but no release flag (rhos-17.1+). Please make sure the issue is ACKed for release before it is accepted as a release blocker.