Description of problem:
Live migration times out waiting for the network-vif-plugged event and the VM is not migrated to the new host.

Version-Release number of selected component (if applicable):
OSP puddle: 16.1_20200930.1
OVN rpm used: ovn2.13-20.06.2-11.el8fdp.x86_64

How reproducible:

Steps to Reproduce:
1. Create a VM with a DPDK port (geneve) with a floating IP mapped to it.
2. Migrate the VM:
   openstack server migrate trex --live-migration --host overcloud-computeovsdpdksriov-1.localdomain --block-migration --wait
3.

Actual results:
The VM fails to migrate because of the timeout below.

/var/log/containers/nova/nova-compute.log:2020-10-08 02:28:25.360 7 WARNING nova.compute.manager [-] [instance: 3bd8d309-396e-4350-9e09-088c4095b46c] Timed out waiting for events: [('network-vif-plugged', 'f54135cc-24e6-490a-9948-3f4cf1ffa553'), ('network-vif-plugged', 'e48eabde-b7f7-4642-98ce-be6bea31bb59')]. If these timeouts are a persistent issue it could mean the networking backend on host overcloud-computeovsdpdksriov-1.localdomain does not support sending these events unless there are port binding host changes which does not happen at this point in the live migration process. You may need to disable the live_migration_wait_for_vif_plug option on host overcloud-computeovsdpdksriov-1.localdomain.: eventlet.timeout.Timeout: 300 seconds

Expected results:
The VM migrates to the new host.

Additional info:
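For context, the warning above comes from nova-compute blocking on "network-vif-plugged" notifications during live migration, guarded by an eventlet timeout (vif_plugging_timeout, 300 seconds by default; controlled by live_migration_wait_for_vif_plug). The sketch below is a hypothetical, standard-library-only illustration of that wait-with-timeout pattern, not nova's actual event machinery; the class and method names are invented:

```python
import threading
import time

class VifEventWaiter:
    """Illustrative model of nova's wait for network-vif-plugged events."""

    def __init__(self, expected_ports):
        # One event per port for which we expect a 'network-vif-plugged'.
        self._events = {port: threading.Event() for port in expected_ports}

    def notify_vif_plugged(self, port_id):
        # Invoked when the networking backend reports the plug for port_id.
        if port_id in self._events:
            self._events[port_id].set()

    def wait_all(self, timeout):
        # Block until every expected event arrives or the deadline passes.
        # Returns the ports whose events never arrived (empty on success).
        deadline = time.monotonic() + timeout
        missing = []
        for port_id, event in self._events.items():
            remaining = deadline - time.monotonic()
            if remaining <= 0 or not event.wait(remaining):
                missing.append(port_id)
        return missing
```

In this bug, the destination never sends the events (see the OVN "Not claiming lport" analysis later in the thread), so the equivalent of wait_all() returns non-empty and nova rolls the migration back.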
Hey all,

Sorry for the long delay, I've had an issue with the setup. A small summary of what I observed regarding this issue:

-------------------------------------------------
We spawn the instance; the instance is in 'ACTIVE' state:
-------------------------------------------------
(Log from node: computeovndpdksriov-0)
nova-compute.log:2021-01-03 01:43:58.281 7 INFO nova.virt.libvirt.driver [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Instance spawned successfully.

-----------------------------------------
The OVN-DPDK interface is bound successfully:
-----------------------------------------
(Log from node: computeovndpdksriov-0)
ovn-controller.log:2021-01-03T01:43:53.558Z|93760|binding|INFO|Claiming lport 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d for this chassis.
ovn-controller.log:2021-01-03T01:43:53.558Z|93761|binding|INFO|4a6df7af-4347-4798-bb2d-ccbfd8c3da2d: Claiming fa:16:3e:71:36:80 20.10.114.156
ovn-metadata-agent.log:2021-01-03 01:43:53.565 26082 INFO networking_ovn.agent.metadata.agent [-] Port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d in datapath 64dd5eac-b61d-4306-8119-2f1c0e67b770 bound to our chassis
server.log:2021-01-03 01:43:51.348 28 DEBUG neutron.plugins.ml2.managers [req-9bed8ac9-ae8d-4d25-aaa8-f829050bc032 8df1067897e84cadaa5e7dcbb1988050 63f328e271e14521ad9d9721c8b3ef02 - default default] Bound port: 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d, host: computeovndpdksriov-0.localdomain, vif_type: vhostuser, vif_details: {"port_filter": false, "vhostuser_mode": "server", "vhostuser_ovs_plug": true, "vhostuser_socket": "/var/lib/vhost_sockets/vhu4a6df7af-43"}, binding_levels: [{'bound_driver': 'ovn', 'bound_segment': {'id': 'a06df25b-8716-43dc-b0ce-b80940cf03fa', 'network_type': 'geneve', 'physical_network': None, 'segmentation_id': 1, 'network_id': '974bea1a-cc43-453f-998a-33117725bba2'}}] _bind_port_level /usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py:937
------------------------------------------------------------------------------------------------------------
We initiate migration of instance '498642df-7ccf-41d2-a06d-e5ac76e74e16'; the instance enters 'MIGRATING' state:
------------------------------------------------------------------------------------------------------------
(Log from node: controller-0)
nova-api.log:2021-01-03 01:44:50.795 23 DEBUG nova.compute.api [req-397689be-aecf-4a35-8983-03af602a1cc0 f9528afcd9dd43afb17b0c648675a7ab 63f328e271e14521ad9d9721c8b3ef02 - default default] Instance 498642df-7ccf-41d2-a06d-e5ac76e74e16 is migrating, copying events to all relevant hosts: {'computeovndpdksriov-0.localdomain', 'computeovndpdksriov-1.localdomain'} _get_relevant_hosts /usr/lib/python3.6/site-packages/nova/compute/api.py:5013
server.log:2021-01-03 01:44:40.533 29 DEBUG neutron.plugins.ml2.managers [req-a59caa6d-c265-4b7c-b937-c44eb850a9fc 8df1067897e84cadaa5e7dcbb1988050 63f328e271e14521ad9d9721c8b3ef02 - default default] Attempting to bind port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d on host computeovndpdksriov-1.localdomain for vnic_type normal with profile None bind_port /usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py:795

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
On the origin compute (computeovndpdksriov-0) we can see the following log indicating the destination compute (computeovndpdksriov-1) was unable to attach the network interface:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(Log from node: computeovndpdksriov-0)
nova-compute.log:2021-01-03 01:49:50.408 7 WARNING nova.compute.manager [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Timed out waiting for events: [('network-vif-plugged', '4a6df7af-4347-4798-bb2d-ccbfd8c3da2d')].
If these timeouts are a persistent issue it could mean the networking backend on host computeovndpdksriov-1.localdomain does not support sending these events unless there are port binding host changes which does not happen at this point in the live migration process. You may need to disable the live_migration_wait_for_vif_plug option on host computeovndpdksriov-1.localdomain.: eventlet.timeout.Timeout: 300 seconds

------------------------------------------------------------------------------------------------------
The origin compute notifies the destination compute to drop the migration; the instance returns to 'ACTIVE' state:
------------------------------------------------------------------------------------------------------
nova-compute.log:2021-01-03 01:49:50.657 7 DEBUG nova.compute.manager [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Calling destination to drop move claim. _rollback_live_migration /usr/lib/python3.6/site-packages/nova/compute/manager.py:7851

----------------------------------------------
The destination compute fails to bind the port:
----------------------------------------------
(Log from node: computeovndpdksriov-1)
ovn-controller.log:2021-01-03T01:44:45.693Z|89307|binding|INFO|Not claiming lport 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d, chassis db1d0617-3361-4493-a99f-c270fa34be64 requested-chassis computeovndpdksriov-0.localdomain
nova-compute.log:2021-01-03 01:49:55.153 8 DEBUG nova.network.neutronv2.api [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Removing port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d migration profile _clear_migration_port_profile /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:274
nova-compute.log:2021-01-03 01:49:58.066 8 DEBUG nova.network.neutronv2.api [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Deleted binding for port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d and host computeovndpdksriov-1.localdomain. delete_port_binding /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:1341
nova-compute.log:2021-01-03 01:49:58.104 8 DEBUG nova.network.os_vif_util [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Converted object VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43') nova_to_osvif_vif /usr/lib/python3.6/site-packages/nova/network/os_vif_util.py:553
nova-compute.log:2021-01-03 01:49:58.104 8 DEBUG os_vif [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Unplugging vif VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43') unplug /usr/lib/python3.6/site-packages/os_vif/__init__.py:109
nova-compute.log:2021-01-03 01:49:58.143 8 INFO os_vif [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Successfully unplugged vif VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43')

Will be glad to share credentials for the deployment via mail.
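The "Not claiming lport ... requested-chassis computeovndpdksriov-0.localdomain" line is the key failure: ovn-controller on the destination refuses to claim the logical port because the port's requested-chassis option still names the source host, so the destination never plugs the VIF and never triggers the network-vif-plugged event nova is waiting for. A minimal, hypothetical model of that claim decision (the real logic lives in ovn-controller; the function name here is invented for illustration):

```python
def should_claim_lport(local_chassis, requested_chassis):
    """Illustrative model of ovn-controller's claim check: a chassis
    claims a logical port only when the port carries no requested-chassis
    option, or when that option names this chassis."""
    return requested_chassis is None or requested_chassis == local_chassis

# Mirrors the log above: computeovndpdksriov-1 declines the claim
# because requested-chassis still points at computeovndpdksriov-0.
```

Under this model, no chassis other than the one named in requested-chassis will ever emit the port-up notification, which matches the 300-second eventlet timeout seen on the origin compute.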
Fixed in RHOS-16.1-RHEL-8-20210129.n. All our NFV test cases are passing; I also verified it manually:

(overcloud) [stack@undercloud-0 ~]$ openstack server list --a
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+
| ID                                   | Name                                       | Status | Networks                              | Image                                 | Flavor |
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+
| abe98744-d409-4179-a9c6-736da1b42bba | tempest-TestDpdkScenarios-server-112007100 | ACTIVE | dpdk-mgmt=10.10.10.119, 10.35.141.169 | rhel-guest-image-7-6-210-x86-64-qcow2 |        |
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show abe98744-d409-4179-a9c6-736da1b42bba | grep compute | sed 's/ *//g'
|OS-EXT-SRV-ATTR:host|computeovndpdksriov-1.localdomain|
|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-1.localdomain|

(overcloud) [stack@undercloud-0 ~]$ openstack server migrate abe98744-d409-4179-a9c6-736da1b42bba --live-migration --host computeovndpdksriov-0.localdomain --block-migration --wait
Progress: 94
Complete

(overcloud) [stack@undercloud-0 ~]$ openstack server show abe98744-d409-4179-a9c6-736da1b42bba | grep compute | sed 's/ *//g'
|OS-EXT-SRV-ATTR:host|computeovndpdksriov-0.localdomain|
|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-0.localdomain|
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.