This bug was initially created as a copy of Bug #1903653.

I am copying this bug because: the root cause is that OVN does not support live migration without packet loss. OVN is incapable of modeling that a port can exist on two chassis at once, or that a port is live-migrating from one host to another, so this cannot be fixed by a change to Neutron or Nova. This bug tracks a workaround that re-announces the VM on the destination after the initial packets are lost.

Description of problem:
The customer is live-migrating an instance with "openstack server migrate --live-migration --host" and observes a burst of packet loss (measured with ping) during the migration process. The migration itself ends successfully.

Version-Release number of selected component (if applicable):
Distro:
[redhat-release] Red Hat Enterprise Linux release 8.2 (Ootpa)
[rhosp-release] Red Hat OpenStack Platform release 16.1.0 GA (Train)
[os-release] Red Hat Enterprise Linux 8.2 (Ootpa)

network-scripts-openvswitch2.13-2.13.0-39.el8fdp.x86_64  Wed Jul 22 11:38:29 2020
openvswitch2.13-2.13.0-39.el8fdp.x86_64  Wed Jul 22 11:39:05 2020
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch  Wed Jul 22 11:38:33 2020
rhosp-openvswitch-2.13-8.el8ost.noarch  Wed Jul 22 11:39:46 2020
puppet-nova-15.5.1-0.20200608173427.c15e37c.el8ost.noarch  Wed Jul 22 11:39:28 2020
python3-novaclient-15.1.0-0.20200310162625.cd396b8.el8ost.noarch  Wed Jul 22 11:34:50 2020
puppet-neutron-15.5.1-0.20200514103419.0a45ec7.el8ost.noarch  Wed Jul 22 11:39:29 2020
python3-neutronclient-6.14.0-0.20200310192910.115f60f.el8ost.noarch  Wed Jul 22 11:35:09 2020

How reproducible:
Always; create a new instance and move it from one compute node to another.

Steps to Reproduce:
1. Create an instance.
2. Live-migrate the instance.
3. Packets are lost during the migration.
4. The instance finishes migrating.

Actual results:
Instance migrated with packet loss.

Expected results:
Instance migrated without packet loss.

Additional info:
instance ID: cf6dc934-4cd6-4195-8a81-54201bb2c4b7
request: req-f5910bd3-2063-464b-8c1d-c00f8d4e423c

/ovn_controller.log
2020-12-02T10:47:55.031474209+01:00 stderr F 2020-12-02T09:47:55Z|00221|binding|INFO|Dropped 22 log messages in last 18 seconds (most recently, 18 seconds ago) due to excessive rate
2020-12-02T10:47:55.031554442+01:00 stderr F 2020-12-02T09:47:55Z|00222|binding|INFO|Not claiming lport 428caa60-f0cc-4b3e-878f-f048b28173ba, chassis xxxx requested-chassis <hostname>.<domain>

ovs-vswitchd.log
2020-12-02T02:22:01.692Z|00212|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-12-02T09:45:20.809Z|00213|bridge|INFO|bridge br-int: added interface tap428caa60-f0 on port 37
2020-12-02T09:45:20.818Z|00214|bridge|INFO|bridge br-int: added interface patch-br-int-to-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88 on port 38
2020-12-02T09:45:20.818Z|00215|bridge|INFO|bridge br-ex: added interface patch-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88-to-br-int on port 12
2020-12-02T09:45:30.821Z|00216|connmgr|INFO|br-int<->unix#0: 365 flow_mods 10 s ago (365 adds)
2020-12-02T09:46:30.821Z|00217|connmgr|INFO|br-int<->unix#0: 6 flow_mods 59 s ago (6 adds)
2020-12-02T09:47:58.869Z|00218|bridge|INFO|bridge br-int: added interface tap28039bf1-30 on port 39
2020-12-02T09:48:05.040Z|00219|connmgr|INFO|br-int<->unix#0: 23 flow_mods in the 4 s starting 10 s ago (20 adds, 2 deletes, 1 modifications)
2020-12-02T09:53:55.995Z|00220|timeval|WARN|Unreasonably long 1485ms poll interval (1ms user, 195ms system)
2020-12-02T09:53:55.995Z|00221|timeval|WARN|context switches: 1 voluntary, 0 involuntary

ovs-vswitchd.log
2020-12-02T02:43:01.816Z|00200|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-12-02T09:25:54.293Z|00201|bridge|INFO|bridge br-int: added interface tap428caa60-f0 on port 34
2020-12-02T09:25:54.305Z|00202|bridge|INFO|bridge br-int: added interface patch-br-int-to-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88 on port 35
2020-12-02T09:25:54.305Z|00203|bridge|INFO|bridge br-ex: added interface patch-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88-to-br-int on port 11
2020-12-02T09:25:55.100Z|00204|bridge|INFO|bridge br-int: added interface tap28039bf1-30 on port 36
2020-12-02T09:26:04.300Z|00205|connmgr|INFO|br-int<->unix#0: 387 flow_mods in the 1 s starting 10 s ago (386 adds, 1 modifications)
2020-12-02T09:31:29.732Z|00206|timeval|WARN|Unreasonably long 3136ms poll interval (0ms user, 124ms system)
2020-12-02T09:31:29.732Z|00207|timeval|WARN|context switches: 2 voluntary, 0 involuntary

nova-compute.log
2020-12-02 10:47:59.619 8 DEBUG nova.network.neutronv2.api [req-f5910bd3-2063-464b-8c1d-c00f8d4e423c 967706eb9b764d738d231e6c018c6fc8 f131d5d7dd2b4bb78179fef91daf95f8 - default default] Deleted binding for port 428caa60-f0cc-4b3e-878f-f048b28173ba and host <hostname>.<domain>.
delete_port_binding /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:1341
2020-12-02 10:48:00.543 8 DEBUG nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba external_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:9272
2020-12-02 10:48:00.543 8 DEBUG oslo_concurrency.lockutils [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" acquired by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2020-12-02 10:48:00.543 8 DEBUG oslo_concurrency.lockutils [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" released by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2020-12-02 10:48:00.544 8 DEBUG nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] No waiting events found dispatching network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:361
2020-12-02 10:48:00.544 8 WARNING nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received unexpected event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba for instance with vm_state active and task_state None.
2020-12-02 10:48:02.601 8 DEBUG nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba external_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:9272
2020-12-02 10:48:02.602 8 DEBUG oslo_concurrency.lockutils [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" acquired by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2020-12-02 10:48:02.602 8 DEBUG oslo_concurrency.lockutils [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" released by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2020-12-02 10:48:02.602 8 DEBUG nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] No waiting events found dispatching network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:361
2020-12-02 10:48:02.603 8 WARNING nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received unexpected event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba for instance with vm_state active and task_state None.
2020-12-02 10:48:02.871 8 DEBUG oslo_service.periodic_task [req-af3534f6-3452-4cdf-91c4-4356e6c7b13c - - - - -] Running periodic task ComputeManager._poll_rebooting_instances run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217

/var/log/messages
Dec 2 10:45:20 <hostname> ovs-vsctl[62116]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap428caa60-f0 -- add-port br-int tap428caa60-f0 -- set Interface tap428caa60-f0 "external-ids:attached-mac=\"fa:16:3e:b8:6a:9f\"" -- set Interface tap428caa60-f0 "external-ids:iface-id=\"428caa60-f0cc-4b3e-878f-f048b28173ba\"" -- set Interface tap428caa60-f0 "external-ids:vm-id=\"cf6dc934-4cd6-4195-8a81-54201bb2c4b7\"" -- set Interface tap428caa60-f0 external-ids:iface-status=active
Dec 2 10:45:20 <hostname> NetworkManager[8361]: <info> [1606902320.8099] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec 2 10:45:20 <hostname> kernel: device tap428caa60-f0 entered promiscuous mode
Dec 2 10:45:20 <hostname> NetworkManager[8361]: <info> [1606902320.8125] device (tap428caa60-f0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec 2 10:45:20 <hostname> NetworkManager[8361]: <info> [1606902320.8135] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec 2 10:45:20 <hostname> NetworkManager[8361]: <info> [1606902320.8137] device (tap428caa60-f0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

/var/log/messages
Dec 2 10:25:54 <hostname> systemd[1]: Starting nova_libvirt healthcheck...
Dec 2 10:25:54 <hostname> systemd[1]: Starting ovn_metadata_agent healthcheck...
Dec 2 10:25:54 <hostname> NetworkManager[6738]: <info> [1606901154.2830] manager: (tap428caa60-f0): new Tun device (/org/freedesktop/NetworkManager/Devices/43)
Dec 2 10:25:54 <hostname> systemd-udevd[43585]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 2 10:25:54 <hostname> ovs-vsctl[43582]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap428caa60-f0 -- add-port br-int tap428caa60-f0 -- set Interface tap428caa60-f0 "external-ids:attached-mac=\"fa:16:3e:b8:6a:9f\"" -- set Interface tap428caa60-f0 "external-ids:iface-id=\"428caa60-f0cc-4b3e-878f-f048b28173ba\"" -- set Interface tap428caa60-f0 "external-ids:vm-id=\"cf6dc934-4cd6-4195-8a81-54201bb2c4b7\"" -- set Interface tap428caa60-f0 external-ids:iface-status=active
Dec 2 10:25:54 <hostname> kernel: device tap428caa60-f0 entered promiscuous mode
Dec 2 10:25:54 <hostname> NetworkManager[6738]: <info> [1606901154.2937] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec 2 10:25:54 <hostname> NetworkManager[6738]: <info> [1606901154.2981] device (tap428caa60-f0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec 2 10:25:54 <hostname> NetworkManager[6738]: <info> [1606901154.2990] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec 2 10:25:54 <hostname> NetworkManager[6738]: <info> [1606901154.2991] device (tap428caa60-f0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')
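Note on the workaround: re-announcing the VM on the destination means emitting an announcement frame for the port's MAC so that upstream switches and peers relearn its new location (QEMU and OVN use a RARP-style announcement for this on migration). As a minimal sketch of what such an announcement carries, the snippet below builds a gratuitous ARP frame for the MAC seen in the logs (fa:16:3e:b8:6a:9f); the IP address is a made-up placeholder, and actually sending the frame would need a raw socket on the destination chassis, which is not shown here.

```python
import struct

def build_garp(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP request announcing mac/ip.

    Broadcast Ethernet header followed by an ARP payload in which the
    sender and target protocol addresses are both the VM's IP, so every
    listener refreshes its ARP cache and switches relearn the MAC's port.
    """
    eth = b"\xff" * 6 + mac + b"\x08\x06"            # dst=broadcast, src=mac, EtherType=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # htype=Ethernet, ptype=IPv4, op=request
    arp += mac + ip                                  # sender hardware / protocol address
    arp += b"\x00" * 6 + ip                          # target hw unknown, target IP = own IP
    return eth + arp

# MAC taken from the ovs-vsctl log line; the IP is purely illustrative.
frame = build_garp(bytes.fromhex("fa163eb86a9f"), bytes([192, 0, 2, 10]))
```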
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:4793
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 365 days.