Bug 2042163 - Instance live-migration observes ping loss with OVN
Summary: Instance live-migration observes ping loss with OVN
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: All
OS: Linux
Severity: urgent
Priority: high
Target Milestone: z3
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: smooney
QA Contact: James Parker
URL:
Whiteboard:
Depends On: 2042165
Blocks: 2042162 2089408
 
Reported: 2022-01-18 21:53 UTC by smooney
Modified: 2023-09-15 01:51 UTC (History)
14 users

Fixed In Version: openstack-nova-20.6.2-2.20220307124916.a5da31e.el8osttrunk
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2089408
Environment:
Last Closed: 2022-06-22 16:03:23 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 825177 0 None NEW libvirt: Add announce-self post live-migration workaround 2023-02-27 20:50:49 UTC
Red Hat Issue Tracker OSP-12198 0 None None None 2022-01-18 21:57:46 UTC
Red Hat Product Errata RHBA-2022:4793 0 None None None 2022-06-22 16:03:57 UTC

Description smooney 2022-01-18 21:53:33 UTC
This bug was initially created as a copy of Bug #1903653

I am copying this bug because:

The root cause of the bug is that OVN does not support live migration
without packet loss. OVN is incapable of modeling that a port can exist
on two chassis at once, or that it is live-migrating from one host to another.
This cannot be fixed by a change to neutron or nova.
This bug tracks a workaround that re-announces the VM on the destination after
the initial packets are lost.
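
The workaround tracked here (gerrit change 825177) has nova send the QEMU monitor `announce-self` command on the destination host once the live migration completes, prompting the guest NICs to re-emit RARP announcements so OVN and the physical fabric learn the port's new location. A minimal nova.conf fragment to enable it is sketched below; it assumes a build containing the fix (e.g. the openstack-nova version listed in "Fixed In Version") is installed on the compute nodes:

```ini
# /etc/nova/nova.conf on each compute node (or the equivalent
# TripleO nova configuration override).
[workarounds]
# After a successful live migration, send the QEMU monitor
# "announce-self" command so the guest re-announces its MAC
# addresses (RARP) on the destination chassis.
enable_qemu_monitor_announce_self = True
```

The equivalent manual action on an affected guest is `virsh qemu-monitor-command --hmp <domain> announce_self` on the destination hypervisor; the option above automates that step after every live migration.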


Description of problem:

The customer is live-migrating an instance with `openstack server migrate --live-migration --host <host>` and observing a burst of packet loss (measured with ping) during the migration process.
The migration itself completes successfully.

Version-Release number of selected component (if applicable):


  Distro:   [redhat-release] Red Hat Enterprise Linux release 8.2 (Ootpa)
            [rhosp-release] Red Hat OpenStack Platform release 16.1.0 GA (Train)
            [os-release] Red Hat Enterprise Linux 8.2 (Ootpa) 8.2 (Ootpa)


network-scripts-openvswitch2.13-2.13.0-39.el8fdp.x86_64     Wed Jul 22 11:38:29 2020
openvswitch2.13-2.13.0-39.el8fdp.x86_64                     Wed Jul 22 11:39:05 2020
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch       Wed Jul 22 11:38:33 2020
rhosp-openvswitch-2.13-8.el8ost.noarch                      Wed Jul 22 11:39:46 2020
puppet-nova-15.5.1-0.20200608173427.c15e37c.el8ost.noarch   Wed Jul 22 11:39:28 2020
python3-novaclient-15.1.0-0.20200310162625.cd396b8.el8ost.noarch Wed Jul 22 11:34:50 2020
puppet-neutron-15.5.1-0.20200514103419.0a45ec7.el8ost.noarch Wed Jul 22 11:39:29 2020
python3-neutronclient-6.14.0-0.20200310192910.115f60f.el8ost.noarch Wed Jul 22 11:35:09 2020



How reproducible:

Always: create a new instance and live-migrate it from one compute node to another.


Steps to Reproduce:
1. Create an instance
2. Live-migrate the instance
3. Observe packet loss during the migration
4. Instance is migrated

Actual results:

Instance migrated with packet loss

Expected results:

Instance migrated without packet loss

Additional info:

instance ID : cf6dc934-4cd6-4195-8a81-54201bb2c4b7
request : req-f5910bd3-2063-464b-8c1d-c00f8d4e423c 

/ovn_controller.log

2020-12-02T10:47:55.031474209+01:00 stderr F 2020-12-02T09:47:55Z|00221|binding|INFO|Dropped 22 log messages in last 18 seconds (most recently, 18 seconds ago) due to excessive rate
2020-12-02T10:47:55.031554442+01:00 stderr F 2020-12-02T09:47:55Z|00222|binding|INFO|Not claiming lport 428caa60-f0cc-4b3e-878f-f048b28173ba, chassis xxxx requested-chassis <hostname>.<domain>

ovs-vswitchd.log

2020-12-02T02:22:01.692Z|00212|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-12-02T09:45:20.809Z|00213|bridge|INFO|bridge br-int: added interface tap428caa60-f0 on port 37
2020-12-02T09:45:20.818Z|00214|bridge|INFO|bridge br-int: added interface patch-br-int-to-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88 on port 38
2020-12-02T09:45:20.818Z|00215|bridge|INFO|bridge br-ex: added interface patch-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88-to-br-int on port 12
2020-12-02T09:45:30.821Z|00216|connmgr|INFO|br-int<->unix#0: 365 flow_mods 10 s ago (365 adds)
2020-12-02T09:46:30.821Z|00217|connmgr|INFO|br-int<->unix#0: 6 flow_mods 59 s ago (6 adds)
2020-12-02T09:47:58.869Z|00218|bridge|INFO|bridge br-int: added interface tap28039bf1-30 on port 39
2020-12-02T09:48:05.040Z|00219|connmgr|INFO|br-int<->unix#0: 23 flow_mods in the 4 s starting 10 s ago (20 adds, 2 deletes, 1 modifications)
2020-12-02T09:53:55.995Z|00220|timeval|WARN|Unreasonably long 1485ms poll interval (1ms user, 195ms system)
2020-12-02T09:53:55.995Z|00221|timeval|WARN|context switches: 1 voluntary, 0 involuntary

ovs-vswitchd.log

2020-12-02T02:43:01.816Z|00200|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-12-02T09:25:54.293Z|00201|bridge|INFO|bridge br-int: added interface tap428caa60-f0 on port 34
2020-12-02T09:25:54.305Z|00202|bridge|INFO|bridge br-int: added interface patch-br-int-to-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88 on port 35
2020-12-02T09:25:54.305Z|00203|bridge|INFO|bridge br-ex: added interface patch-provnet-ad580e33-e2a2-4f33-85f3-3b699b158e88-to-br-int on port 11
2020-12-02T09:25:55.100Z|00204|bridge|INFO|bridge br-int: added interface tap28039bf1-30 on port 36
2020-12-02T09:26:04.300Z|00205|connmgr|INFO|br-int<->unix#0: 387 flow_mods in the 1 s starting 10 s ago (386 adds, 1 modifications)
2020-12-02T09:31:29.732Z|00206|timeval|WARN|Unreasonably long 3136ms poll interval (0ms user, 124ms system)
2020-12-02T09:31:29.732Z|00207|timeval|WARN|context switches: 2 voluntary, 0 involuntary

nova-compute.log

2020-12-02 10:47:59.619 8 DEBUG nova.network.neutronv2.api [req-f5910bd3-2063-464b-8c1d-c00f8d4e423c 967706eb9b764d738d231e6c018c6fc8 f131d5d7dd2b4bb78179fef91daf95f8 - default default] Deleted binding for port 428caa60-f0cc-4b3e-878f-f048b28173ba and host <hostname>.<domain>. delete_port_binding /usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py:1341
2020-12-02 10:48:00.543 8 DEBUG nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba external_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:9272
2020-12-02 10:48:00.543 8 DEBUG oslo_concurrency.lockutils [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" acquired by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2020-12-02 10:48:00.543 8 DEBUG oslo_concurrency.lockutils [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" released by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2020-12-02 10:48:00.544 8 DEBUG nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] No waiting events found dispatching network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:361
2020-12-02 10:48:00.544 8 WARNING nova.compute.manager [req-53dc026d-dcaf-47ad-b967-e0169c83a340 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received unexpected event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba for instance with vm_state active and task_state None.
2020-12-02 10:48:02.601 8 DEBUG nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba external_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:9272
2020-12-02 10:48:02.602 8 DEBUG oslo_concurrency.lockutils [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" acquired by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:327
2020-12-02 10:48:02.602 8 DEBUG oslo_concurrency.lockutils [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] Lock "cf6dc934-4cd6-4195-8a81-54201bb2c4b7-events" released by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: held 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:339
2020-12-02 10:48:02.602 8 DEBUG nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] No waiting events found dispatching network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:361
2020-12-02 10:48:02.603 8 WARNING nova.compute.manager [req-ce71e2ec-4625-4c7a-be27-5849e945b292 3d8b5ccbccc148a6a5ea6ccdd3b3f778 9d8322178b4d4d0093297a01615857a5 - default default] [instance: cf6dc934-4cd6-4195-8a81-54201bb2c4b7] Received unexpected event network-vif-plugged-428caa60-f0cc-4b3e-878f-f048b28173ba for instance with vm_state active and task_state None.
2020-12-02 10:48:02.871 8 DEBUG oslo_service.periodic_task [req-af3534f6-3452-4cdf-91c4-4356e6c7b13c - - - - -] Running periodic task ComputeManager._poll_rebooting_instances run_periodic_tasks /usr/lib/python3.6/site-packages/oslo_service/periodic_task.py:217

/var/log/messages

Dec  2 10:45:20 <hostname> ovs-vsctl[62116]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap428caa60-f0 -- add-port br-int tap428caa60-f0 -- set Interface tap428caa60-f0 "external-ids:attached-mac=\"fa:16:3e:b8:6a:9f\"" -- set Interface tap428caa60-f0 "external-ids:iface-id=\"428caa60-f0cc-4b3e-878f-f048b28173ba\"" -- set Interface tap428caa60-f0 "external-ids:vm-id=\"cf6dc934-4cd6-4195-8a81-54201bb2c4b7\"" -- set Interface tap428caa60-f0 external-ids:iface-status=active
Dec  2 10:45:20 <hostname> NetworkManager[8361]: <info>  [1606902320.8099] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec  2 10:45:20 <hostname> kernel: device tap428caa60-f0 entered promiscuous mode
Dec  2 10:45:20 <hostname> NetworkManager[8361]: <info>  [1606902320.8125] device (tap428caa60-f0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec  2 10:45:20 <hostname> NetworkManager[8361]: <info>  [1606902320.8135] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec  2 10:45:20 <hostname> NetworkManager[8361]: <info>  [1606902320.8137] device (tap428caa60-f0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')


/var/log/messages


Dec  2 10:25:54 <hostname> systemd[1]: Starting nova_libvirt healthcheck...
Dec  2 10:25:54 <hostname> systemd[1]: Starting ovn_metadata_agent healthcheck...
Dec  2 10:25:54 <hostname> NetworkManager[6738]: <info>  [1606901154.2830] manager: (tap428caa60-f0): new Tun device (/org/freedesktop/NetworkManager/Devices/43)
Dec  2 10:25:54 <hostname> systemd-udevd[43585]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec  2 10:25:54 <hostname> ovs-vsctl[43582]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port tap428caa60-f0 -- add-port br-int tap428caa60-f0 -- set Interface tap428caa60-f0 "external-ids:attached-mac=\"fa:16:3e:b8:6a:9f\"" -- set Interface tap428caa60-f0 "external-ids:iface-id=\"428caa60-f0cc-4b3e-878f-f048b28173ba\"" -- set Interface tap428caa60-f0 "external-ids:vm-id=\"cf6dc934-4cd6-4195-8a81-54201bb2c4b7\"" -- set Interface tap428caa60-f0 external-ids:iface-status=active
Dec  2 10:25:54 <hostname> kernel: device tap428caa60-f0 entered promiscuous mode
Dec  2 10:25:54 <hostname> NetworkManager[6738]: <info>  [1606901154.2937] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec  2 10:25:54 <hostname> NetworkManager[6738]: <info>  [1606901154.2981] device (tap428caa60-f0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec  2 10:25:54 <hostname> NetworkManager[6738]: <info>  [1606901154.2990] device (tap428caa60-f0): enslaved to non-master-type device ovs-system; ignoring
Dec  2 10:25:54 <hostname> NetworkManager[6738]: <info>  [1606901154.2991] device (tap428caa60-f0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

Comment 34 errata-xmlrpc 2022-06-22 16:03:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4793

Comment 35 Red Hat Bugzilla 2023-09-15 01:51:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

