Bug 1886244 - Migration timeouts waiting for event related to network-vif-plugged
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: ffernand
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks: 1934357
Reported: 2020-10-08 02:44 UTC by yogananth subramanian
Modified: 2023-09-15 00:49 UTC (History)

Fixed In Version: python-networking-ovn-7.3.1-1.20201114024052.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1934357
Environment:
Last Closed: 2021-03-17 15:32:20 UTC
Target Upstream Version:
Embargoed:
ffernand: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1910213 0 None None None 2021-01-05 10:16:37 UTC
OpenStack gerrit 769306 0 None MERGED ovn: Support live migration to DPDK nodes 2021-02-19 11:54:42 UTC
OpenStack gerrit 771187 0 None MERGED ovn: Support live migration to DPDK nodes 2021-02-19 11:54:41 UTC
Red Hat Product Errata RHBA-2021:0817 0 None None None 2021-03-17 15:34:06 UTC

Description yogananth subramanian 2020-10-08 02:44:48 UTC
Description of problem:
Live migration times out waiting for the network-vif-plugged event, and the VM is not migrated to the new host.


Version-Release number of selected component (if applicable):
OSP puddle is:
tag: 16.1_20200930.1
OVN rpm used is:
 ovn2.13-20.06.2-11.el8fdp.x86_64

How reproducible:


Steps to Reproduce:
1. Create a VM with a DPDK port and a Geneve port with a floating IP mapped to it.

2. Live-migrate the VM:
openstack server migrate trex --live-migration --host overcloud-computeovsdpdksriov-1.localdomain --block-migration --wait

Actual results:
The VM fails to migrate because of the timeout below.

/var/log/containers/nova/nova-compute.log:2020-10-08 02:28:25.360 7 WARNING nova.compute.manager [-] [instance: 3bd8d309-396e-4350-9e09-088c4095b46c] Timed out waiting for events: [('network-vif-plugged', 'f54135cc-24e6-490a-9948-3f4cf1ffa553'), ('network-vif-plugged', 'e48eabde-b7f7-4642-98ce-be6bea31bb59')]. If these timeouts are a persistent issue it could mean the networking backend on host overcloud-computeovsdpdksriov-1.localdomain does not support sending these events unless there are port binding host changes which does not happen at this point in the live migration process. You may need to disable the live_migration_wait_for_vif_plug option on host overcloud-computeovsdpdksriov-1.localdomain.: eventlet.timeout.Timeout: 300 seconds
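The warning above lists every (event, port ID) pair Nova was still waiting for when the timeout fired. The following is a minimal Python sketch, assuming the log line keeps the shape shown above, that extracts those pairs from a nova-compute.log line. The function name parse_timed_out_events is hypothetical and is not part of Nova.

```python
import re

# Matches each ('event-name', 'port-uuid') tuple inside the
# "Timed out waiting for events: [...]" warning.
_EVENT_RE = re.compile(r"\('([\w-]+)', '([0-9a-f-]{36})'\)")


def parse_timed_out_events(line):
    """Return a list of (event, port_id) pairs, or [] if the line is not the warning."""
    marker = "Timed out waiting for events:"
    if marker not in line:
        return []
    # Only inspect the bracketed list that directly follows the marker.
    tail = line.split(marker, 1)[1]
    listing = tail.split("]", 1)[0]
    return _EVENT_RE.findall(listing)


# Sample shortened from the log line in this report.
sample = (
    "WARNING nova.compute.manager [-] "
    "[instance: 3bd8d309-396e-4350-9e09-088c4095b46c] "
    "Timed out waiting for events: "
    "[('network-vif-plugged', 'f54135cc-24e6-490a-9948-3f4cf1ffa553'), "
    "('network-vif-plugged', 'e48eabde-b7f7-4642-98ce-be6bea31bb59')]."
)

for event, port_id in parse_timed_out_events(sample):
    print(event, port_id)
```

The port IDs returned this way can then be searched for in the Neutron server and ovn-controller logs on both hosts.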


Expected results:

VM migrates to new host.
Additional info:

Comment 9 Vadim Khitrin 2021-01-04 09:29:36 UTC
Hey all,

Sorry for the long delay, I've had an issue with the setup.
A small summary of what I observed regarding this issue:
-------------------------------------------------
We spawn instance, instance is in 'ACTIVE' state:
-------------------------------------------------
(Log from node: computeovndpdksriov-0)
nova-compute.log:2021-01-03 01:43:58.281 7 INFO nova.virt.libvirt.driver [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Instance spawned successfully.
-----------------------------------------
OVN-DPDK interface is bound successfully:
-----------------------------------------
(Log from node: computeovndpdksriov-0)
ovn-controller.log:2021-01-03T01:43:53.558Z|93760|binding|INFO|Claiming lport 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d for this chassis.
ovn-controller.log:2021-01-03T01:43:53.558Z|93761|binding|INFO|4a6df7af-4347-4798-bb2d-ccbfd8c3da2d: Claiming fa:16:3e:71:36:80 20.10.114.156
ovn-metadata-agent.log:2021-01-03 01:43:53.565 26082 INFO networking_ovn.agent.metadata.agent [-] Port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d in datapath 64dd5eac-b61d-4306-8119-2f1c0e67b770 bound to our chassis
server.log:2021-01-03 01:43:51.348 28 DEBUG neutron.plugins.ml2.managers [req-9bed8ac9-ae8d-4d25-aaa8-f829050bc032 8df1067897e84cadaa5e7dcbb1988050 63f328e271e14521ad9d9721c8b3ef02 - default default] Bound port: 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d, host: computeovndpdksriov-0.localdomain, vif_type: vhostuser, vif_details: {"port_filter": false, "vhostuser_mode": "server", "vhostuser_ovs_plug": true, "vhostuser_socket": "/var/lib/vhost_sockets/vhu4a6df7af-43"}, binding_levels: [{'bound_driver': 'ovn', 'bound_segment': {'id':'a06df25b-8716-43dc-b0ce-b80940cf03fa', 'network_type': 'geneve', 'physical_network': None, 'segmentation_id': 1, 'network_id': '974bea1a-cc43-453f-998a-33117725bba2'}}] _bind_port_level /usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py:937
------------------------------------------------------------------------------------------------------------
We initiate migration of instance '498642df-7ccf-41d2-a06d-e5ac76e74e16', instance enters 'MIGRATING' state:
------------------------------------------------------------------------------------------------------------
(Log from node: controller-0)
nova-api.log:2021-01-03 01:44:50.795 23 DEBUG nova.compute.api [req-397689be-aecf-4a35-8983-03af602a1cc0 f9528afcd9dd43afb17b0c648675a7ab 63f328e271e14521ad9d9721c8b3ef02 - default default] Instance 498642df-7ccf-41d2-a06d-e5ac76e74e16 is migrating, copying events to all relevant hosts: {'computeovndpdksriov-0.localdomain', 'computeovndpdksriov-1.localdomain'} _get_relevant_hosts /usr/lib/python3.6/site-packages/nova/compute/api.py:5013
server.log:2021-01-03 01:44:40.533 29 DEBUG neutron.plugins.ml2.managers [req-a59caa6d-c265-4b7c-b937-c44eb850a9fc 8df1067897e84cadaa5e7dcbb1988050 63f328e271e14521ad9d9721c8b3ef02 - default default] Attempting to bind port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d on host computeovndpdksriov-1.localdomain for vnic_type normal with profile None bind_port /usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py:795
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
From origin compute (computeovndpdksriov-0) we can see the following log indicating destination compute (computeovndpdksriov-1) was unable to attach network interface:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(Log from node: computeovndpdksriov-0)
nova-compute.log:2021-01-03 01:49:50.408 7 WARNING nova.compute.manager [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Timed out waiting for events: [('network-vif-plugged', '4a6df7af-4347-4798-bb2d-ccbfd8c3da2d')]. If these timeouts are a persistent issue it could mean the networking backend on host computeovndpdksriov-1.localdomain does not support sending these events unless there are port binding host changes which does not happen at this point in the live migration process. You may need to disable the live_migration_wait_for_vif_plug option on host computeovndpdksriov-1.localdomain.: eventlet.timeout.Timeout: 300 seconds
------------------------------------------------------------------------------------------------------
Origin compute notifies destination compute to drop the migration, instance returns to 'ACTIVE' state:
------------------------------------------------------------------------------------------------------
nova-compute.log:2021-01-03 01:49:50.657 7 DEBUG nova.compute.manager [-] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Calling destination to drop move claim. _rollback_live_migration /usr/lib/python3.6/site-packages/nova/compute/manager.py:7851
--------------------------------------
Destination compute fails to bind port:
--------------------------------------
(Log from node: computeovndpdksriov-1)
ovn-controller.log:2021-01-03T01:44:45.693Z|89307|binding|INFO|Not claiming lport 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d, chassis db1d0617-3361-4493-a99f-c270fa34be64 requested-chassis computeovndpdksriov-0.localdomain
nova-compute.log:2021-01-03 01:49:55.153 8 DEBUG nova.network.neutronv2.api [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] [instance: 498642df-7ccf-41d2-a06d-e5ac76e74e16] Removing port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d migration profile _clear_migration_port_profile /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:274
nova-compute.log:2021-01-03 01:49:58.066 8 DEBUG nova.network.neutronv2.api [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Deleted binding for port 4a6df7af-4347-4798-bb2d-ccbfd8c3da2d and host computeovndpdksriov-1.localdomain. delete_port_binding /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:1341
nova-compute.log:2021-01-03 01:49:58.104 8 DEBUG nova.network.os_vif_util [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Converted object VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43') nova_to_osvif_vif /usr/lib/python3.6/site-packages/nova/network/os_vif_util.py:553
nova-compute.log:2021-01-03 01:49:58.104 8 DEBUG os_vif [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Unplugging vif VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43') unplug /usr/lib/python3.6/site-packages/os_vif/__init__.py:109
nova-compute.log:2021-01-03 01:49:58.143 8 INFO os_vif [req-eecda5fd-ffa5-4616-95a3-40a8f3f7672d e3442d6e406a4595a319dfa4e9c000ea 453a762cc58c4bf386dbe1428631f122 - default default] Successfully unplugged vif VIFVHostUser(active=False,address=fa:16:3e:71:36:80,has_traffic_filtering=False,id=4a6df7af-4347-4798-bb2d-ccbfd8c3da2d,mode='server',network=Network(974bea1a-cc43-453f-998a-33117725bba2),path='/var/lib/vhost_sockets/vhu4a6df7af-43',plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='vhu4a6df7af-43')
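In the ovn-controller.log excerpt from the destination node, the "Not claiming lport" line shows requested-chassis still naming the source host, which appears consistent with the network-vif-plugged timeout seen on the origin compute. The following is a rough Python sketch, assuming the log line shape shown above, that flags such ports. The function name find_unclaimed_ports is hypothetical and is not part of OVN or networking-ovn.

```python
import re

# Assumed line shape, taken from the ovn-controller.log excerpt in this comment:
# <timestamp>|<seq>|binding|INFO|Not claiming lport <port>, chassis <uuid> requested-chassis <host>
_NOT_CLAIMING_RE = re.compile(
    r"Not claiming lport (?P<lport>\S+), "
    r"chassis (?P<chassis>\S+) "
    r"requested-chassis (?P<requested>\S+)"
)


def find_unclaimed_ports(lines, local_host):
    """Yield (lport, requested_chassis) for ports that were not claimed
    while requested-chassis names a host other than local_host."""
    for line in lines:
        match = _NOT_CLAIMING_RE.search(line)
        if match and match.group("requested") != local_host:
            yield match.group("lport"), match.group("requested")


log_lines = [
    "2021-01-03T01:44:45.693Z|89307|binding|INFO|Not claiming lport "
    "4a6df7af-4347-4798-bb2d-ccbfd8c3da2d, chassis "
    "db1d0617-3361-4493-a99f-c270fa34be64 requested-chassis "
    "computeovndpdksriov-0.localdomain",
]

for lport, requested in find_unclaimed_ports(
    log_lines, "computeovndpdksriov-1.localdomain"
):
    print(f"{lport} is still requested on {requested}")
```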

Will be glad to share credentials for the deployment via mail.

Comment 33 Miguel Angel Nieto 2021-02-02 11:03:48 UTC
Fixed in RHOS-16.1-RHEL-8-20210129.n.
All our NFV test cases are passing. I also verified it manually:

(overcloud) [stack@undercloud-0 ~]$ openstack server list --a
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+
| ID                                   | Name                                       | Status | Networks                              | Image                                 | Flavor |
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+
| abe98744-d409-4179-a9c6-736da1b42bba | tempest-TestDpdkScenarios-server-112007100 | ACTIVE | dpdk-mgmt=10.10.10.119, 10.35.141.169 | rhel-guest-image-7-6-210-x86-64-qcow2 |        |
+--------------------------------------+--------------------------------------------+--------+---------------------------------------+---------------------------------------+--------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show abe98744-d409-4179-a9c6-736da1b42bba | grep compute | sed 's/ *//g'
|OS-EXT-SRV-ATTR:host|computeovndpdksriov-1.localdomain|
|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-1.localdomain|


(overcloud) [stack@undercloud-0 ~]$ openstack server migrate abe98744-d409-4179-a9c6-736da1b42bba --live-migration --host computeovndpdksriov-0.localdomain --block-migration --wait
Progress: 94Complete

(overcloud) [stack@undercloud-0 ~]$ openstack server show abe98744-d409-4179-a9c6-736da1b42bba | grep compute | sed 's/ *//g'
|OS-EXT-SRV-ATTR:host|computeovndpdksriov-0.localdomain|
|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-0.localdomain|
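The manual check above compares the OS-EXT-SRV-ATTR:host field before and after the migration. A small Python sketch of the same comparison follows, assuming the "|field|value|" row shape produced by the grep and sed pipeline above. The helper names parse_host_fields and migrated are hypothetical.

```python
def parse_host_fields(output):
    """Map field name -> value from rows shaped like '|field|value|'."""
    fields = {}
    for row in output.splitlines():
        parts = row.split("|")
        # A data row splits into: text before the first '|' (ignored),
        # the field name, the value, and the tail after the last '|'.
        if len(parts) >= 4 and parts[1]:
            fields[parts[1]] = parts[2]
    return fields


def migrated(before, after, target_host):
    """True if the instance left its original host and now runs on target_host."""
    key = "OS-EXT-SRV-ATTR:host"
    old = parse_host_fields(before).get(key)
    new = parse_host_fields(after).get(key)
    return old is not None and new == target_host and old != new


# Output captured before and after the migration in this comment.
before = (
    "|OS-EXT-SRV-ATTR:host|computeovndpdksriov-1.localdomain|\n"
    "|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-1.localdomain|\n"
)
after = (
    "|OS-EXT-SRV-ATTR:host|computeovndpdksriov-0.localdomain|\n"
    "|OS-EXT-SRV-ATTR:hypervisor_hostname|computeovndpdksriov-0.localdomain|\n"
)

print(migrated(before, after, "computeovndpdksriov-0.localdomain"))
```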

Comment 42 errata-xmlrpc 2021-03-17 15:32:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

Comment 43 Red Hat Bugzilla 2023-09-15 00:49:21 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.

