Bug 2185897 - live-migrated instance in MIGRATING status for a long time
Summary: live-migrated instance in MIGRATING status for a long time
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: z3
: 17.1
Assignee: Rodolfo Alonso
QA Contact: Eduardo Olivares
URL:
Whiteboard:
Duplicates: 2223997
Depends On:
Blocks:
 
Reported: 2023-04-11 14:28 UTC by Eduardo Olivares
Modified: 2024-01-18 07:16 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
In ML2/OVN deployments, do not use live migration on instances that use trunk ports. On instances that use trunk ports, live migration can fail due to the flapping of the instance's subport between the Compute nodes. For instances that have trunk ports, use cold migration instead.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1986003 0 None None None 2023-04-17 15:17:59 UTC
Launchpad 2018289 0 None None None 2023-05-18 07:35:11 UTC
OpenStack gerrit 879331 0 None MERGED Fix concurrent port binding activate 2023-08-22 10:01:01 UTC
OpenStack gerrit 887022 0 None MERGED [OVN][Trunk] Add port binding info on subport when parent is bound 2023-07-11 11:31:20 UTC
OpenStack gerrit 892889 0 None MERGED [OVN] Skip the port status UP update during a live migration 2023-09-01 13:10:19 UTC
OpenStack gerrit 892890 0 None MERGED [OVN][Trunk] Set the subports correct host during live migration 2023-09-01 13:10:19 UTC
Red Hat Bugzilla 2172572 0 high CLOSED [RHOS-17.1] Parent port intermittently remains DOWN after live migration with trunk port and OVN 2024-03-22 14:35:05 UTC
Red Hat Issue Tracker OSP-24092 0 None None None 2023-04-11 15:23:49 UTC

Description Eduardo Olivares 2023-04-11 14:28:33 UTC
Description of problem:
This bug is similar to BZ2172873. That one was closed because the tobiko migration tests were not correct. The tests have been fixed, but test_7_live_migrate_server_with_host still fails sometimes.


I have reproduced the issue manually, so I'll focus on the manual reproducer, which is simple.
Apparently, the operation only fails when the live-migrated VM has a trunked port.

A VM is created with a trunk port using the following commands:
$ openstack port create --network heat_tempestconf_network parent-trunk-port
$ openstack network trunk create --parent-port parent-trunk-port parent-trunk
$ openstack network create heat_tempestconf_network-trunk
$ openstack subnet create --network heat_tempestconf_network-trunk --subnet-range 10.111.222.0/24 heat_tempestconf_network-trunk-subnet
$ openstack port create --network heat_tempestconf_network-trunk --mac-address fa:16:3e:5c:51:b8 subport-trunk-port  # the mac corresponds with the parent-trunk-port mac
$ openstack network trunk set --subport port=subport-trunk-port,segmentation-type=vlan,segmentation-id=55 parent-trunk
$ openstack server create --flavor ubuntu --image tobiko.openstack.stacks._ubuntu.UbuntuImageFixture --port parent-trunk-port ubuntu0

The VM is successfully created and its status is ACTIVE.

Then, the following command is run to perform the live migration:
$ openstack server migrate --live-migration ubuntu0

The live migration often fails on the second attempt and sometimes on the third. I don't know whether it is a coincidence, but the first live migration never failed during my tests.
When it fails, the VM remains with status=MIGRATING and task_state=migrating for hours. Then, apparently, nova cancels the migration and the VM status changes back to ACTIVE, but the migration has not really occurred (the VM's hypervisor is still the one it was on before the migration).
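
For anyone who wants to automate the check described above, here is a minimal sketch of the decision logic, not a tested reproducer. It assumes the caller supplies two hypothetical callables: start_migration (for example a wrapper around the `openstack server migrate --live-migration` command shown earlier) and get_state, which returns the server's current (status, hypervisor host) pair. The outcome names "migrated", "rolled_back" and "timeout" are made up for this example.

```python
import time
from typing import Callable, Tuple


def live_migrate_and_wait(
    start_migration: Callable[[], None],
    get_state: Callable[[], Tuple[str, str]],
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Start a live migration and classify how it ended.

    Returns "migrated" if the server is ACTIVE on a different host,
    "rolled_back" if it was seen MIGRATING and then went back to ACTIVE
    on the original host, and "timeout" if neither happened in time
    (the symptom described in this report).
    """
    _, source_host = get_state()
    start_migration()
    seen_migrating = False
    deadline = clock() + timeout
    while clock() < deadline:
        status, host = get_state()
        if status == "MIGRATING":
            seen_migrating = True
        elif status == "ACTIVE":
            if host != source_host:
                return "migrated"
            if seen_migrating:
                return "rolled_back"
        sleep(poll_interval)
    return "timeout"
```

Calling this two or three times in a row against the same VM would mirror the manual steps; a "timeout" or "rolled_back" result corresponds to the failure described here.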


I'm assigning this bug to component neutron because it can't be reproduced when there is no trunk port, so I assume there may be something wrong in neutron, but it could be a nova issue too.

I will provide logs in a later comment.


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230404.n.1

How reproducible:
100% (the migration command has to be executed two or three times on the VM)

Comment 1 Eduardo Olivares 2023-04-11 15:39:24 UTC
Created attachment 1956886 [details]
neutron server logs

parent port is c0c59b6c-30ff-4f86-ac87-bfad39f8b37b
subport is 2dbc7155-9cf9-415a-a761-66eaeda76cd1
ubuntu VM is c21fb3a9-cafc-4cc0-9635-a83cbb2cfbfc

First live-migration (successful) happens around 14:03.
Second live-migration (failed) is run around 14:06.


At ctrl-2-0, logs show that migrating_to cmp-3-1 is going to happen:
2023-04-11 14:06:27.799 16 DEBUG ovsdbapp.backend.ovs_idl.transaction [req-e1b5c517-788f-4350-b362-4133656d2a48 - - - - -] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=c0c59b6c-30ff-4f86-ac87-bfad39f8b37b, resource={'id': 'c0c59b6c-30ff-4f86-ac87-bfad39f8b37b', 'name': 'parent-trunk-port', 'network_id': '2204a95f-cd88-4c4f-a929-a5bf809fb512', 'tenant_id': '727c122779644204997e0800699adb87', 'mac_address': 'fa:16:3e:5c:51:b8', 'admin_state_up': True, 'status': 'DOWN', 'device_id': '', 'device_owner': '', 'standard_attr_id': 13373, 'fixed_ips': [{'subnet_id': 'eecf0091-89c1-4a87-911d-cf79a5909567', 'ip_address': '192.168.199.123'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': ['c5144630-8157-46e1-99b7-0605f2f5d90f'], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {'migrating_to': 'cmp-3-1.redhat.local'}, 'binding:host_id': '', 'binding:vif_type': 'unbound', 'binding:vif_details': {}, 'qos_policy_id': None, 'qos_network_policy_id': None, 'port_security_enabled': True, 'dns_name': '', 'dns_assignment': [{'ip_address': '192.168.199.123', 'hostname': 'host-192-168-199-123', 'fqdn': 'host-192-168-199-123.redhat.local.'}], 'dns_domain': '', 'resource_request': None, 'trunk_details': {'trunk_id': 'a35c27b3-30a2-4d9f-af5a-12672add3de1', 'sub_ports': [{'segmentation_id': 55, 'segmentation_type': 'vlan', 'port_id': '2dbc7155-9cf9-415a-a761-66eaeda76cd1', 'mac_address': 'fa:16:3e:5c:51:b8'}]}, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2023-04-11T13:54:43Z', 'updated_at': '2023-04-11T14:06:27Z', 'revision_number': 18, 'project_id': '727c122779644204997e0800699adb87', 'network': {'id': '2204a95f-cd88-4c4f-a929-a5bf809fb512', 'name': 'heat_tempestconf_network', 'tenant_id': '501346e22ecc4224a3061b12271da48a', 'admin_state_up': True, 'mtu': 1442, 'status': 'ACTIVE', 'subnets': ['eecf0091-89c1-4a87-911d-cf79a5909567'], 'standard_attr_id': 380, 'shared': False, 'availability_zone_hints': [], 
'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': False, 'description': '', 'qos_policy_id': None, 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2023-04-10T17:06:57Z', 'updated_at': '2023-04-10T17:06:58Z', 'revision_number': 2, 'project_id': '501346e22ecc4224a3061b12271da48a', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 60602}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89


At ctrl-3-0, the following warning is shown:
2023-04-11 14:06:28.360 17 WARNING neutron.plugins.ml2.plugin [req-c4df7b3d-4505-46ca-8775-7be2ec4370a6 f7dbde981ce3481fbbf732006018bb6a 08a53aca4e0342ceb10d2a11c5407a07 - default default] Concurrent port binding operations failed on port c0c59b6c-30ff-4f86-ac87-bfad39f8b37b




Also at ctrl-3-0, we can see that the subport starts flapping between two chassis around 14:03:37 (when the first migration completed) and never stops:
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep c0c59b6c-30ff-4f86-ac87-bfad39f8b37b | head -4                                                                              
2023-04-11 14:01:36.342 16 DEBUG ovsdbapp.backend.ovs_idl.event [req-eb509e4e-6066-41fd-9b2a-ed2d4ea34ba2 - - - - -] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:5c:51:b8 10.111.222.4'], port_security=['fa:16:3e:5c:51:b8 10.111.222.4'], type=, nat_addresses=[], virtual_parent=[], up=[False], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, parent_port=['c0c59b6c-30ff-4f86-ac87-bfad39f8b37b'], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'name': 'subport-trunk-port', 'neutron:cidrs': '10.111.222.4/24', 'neutron:device_id': '', 'neutron:device_owner': 'trunk:subport', 'neutron:network_name': 'neutron-3ab0b482-e689-44ed-a163-9aee3f0d6104', 'neutron:port_capabilities': '', 'neutron:port_name': 'subport-trunk-port', 'neutron:project_id': '727c122779644204997e0800699adb87', 'neutron:revision_number': '2', 'neutron:security_group_ids': 'c5144630-8157-46e1-99b7-0605f2f5d90f', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[55], additional_encap=[], encap=[], mirror_rules=[], datapath=3eccfbf0-493f-4680-bf46-074b91046462, chassis=[<ovs.db.idl.Row object at 0x7fef78678fd0>], tunnel_key=2, gateway_chassis=[], requested_chassis=[], logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1) old=Port_Binding(chassis=[]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43                                                                                                   
2023-04-11 14:01:36.385 16 DEBUG ovsdbapp.backend.ovs_idl.event [req-eb509e4e-6066-41fd-9b2a-ed2d4ea34ba2 - - - - -] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:5c:51:b8 10.111.222.4'], port_security=['fa:16:3e:5c:51:b8 10.111.222.4'], type=, nat_addresses=[], virtual_parent=[], up=[True], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, parent_port=['c0c59b6c-30ff-4f86-ac87-bfad39f8b37b'], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'name': 'subport-trunk-port', 'neutron:cidrs': '10.111.222.4/24', 'neutron:device_id': '', 'neutron:device_owner': 'trunk:subport', 'neutron:network_name': 'neutron-3ab0b482-e689-44ed-a163-9aee3f0d6104', 'neutron:port_capabilities': '', 'neutron:port_name': 'subport-trunk-port', 'neutron:project_id': '727c122779644204997e0800699adb87', 'neutron:revision_number': '2', 'neutron:security_group_ids': 'c5144630-8157-46e1-99b7-0605f2f5d90f', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[55], additional_encap=[], encap=[], mirror_rules=[], datapath=3eccfbf0-493f-4680-bf46-074b91046462, chassis=[<ovs.db.idl.Row object at 0x7fef78678fd0>], tunnel_key=2, gateway_chassis=[], requested_chassis=[], logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1) old=Port_Binding(up=[False]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43                                                                                                    
2023-04-11 14:03:37.484 16 DEBUG ovsdbapp.backend.ovs_idl.event [req-eb509e4e-6066-41fd-9b2a-ed2d4ea34ba2 - - - - -] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:5c:51:b8 10.111.222.4'], port_security=['fa:16:3e:5c:51:b8 10.111.222.4'], type=, nat_addresses=[], virtual_parent=[], up=[True], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, parent_port=['c0c59b6c-30ff-4f86-ac87-bfad39f8b37b'], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'name': 'subport-trunk-port', 'neutron:cidrs': '10.111.222.4/24', 'neutron:device_id': '', 'neutron:device_owner': 'trunk:subport', 'neutron:network_name': 'neutron-3ab0b482-e689-44ed-a163-9aee3f0d6104', 'neutron:port_capabilities': '', 'neutron:port_name': 'subport-trunk-port', 'neutron:project_id': '727c122779644204997e0800699adb87', 'neutron:revision_number': '3', 'neutron:security_group_ids': 'c5144630-8157-46e1-99b7-0605f2f5d90f', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[55], additional_encap=[], encap=[], mirror_rules=[], datapath=3eccfbf0-493f-4680-bf46-074b91046462, chassis=[<ovs.db.idl.Row object at 0x7fef786807f0>], tunnel_key=2, gateway_chassis=[], requested_chassis=[], logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1) old=Port_Binding(chassis=[<ovs.db.idl.Row object at 0x7fef78678fd0>]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43                                                           
2023-04-11 14:03:37.503 16 DEBUG ovsdbapp.backend.ovs_idl.event [req-eb509e4e-6066-41fd-9b2a-ed2d4ea34ba2 - - - - -] Matched UPDATE: PortBindingChassisEvent(events=('update',), table='Port_Binding', conditions=(('type', '=', 'chassisredirect'),), old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:5c:51:b8 10.111.222.4'], port_security=['fa:16:3e:5c:51:b8 10.111.222.4'], type=, nat_addresses=[], virtual_parent=[], up=[True], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, parent_port=['c0c59b6c-30ff-4f86-ac87-bfad39f8b37b'], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'name': 'subport-trunk-port', 'neutron:cidrs': '10.111.222.4/24', 'neutron:device_id': '', 'neutron:device_owner': 'trunk:subport', 'neutron:network_name': 'neutron-3ab0b482-e689-44ed-a163-9aee3f0d6104', 'neutron:port_capabilities': '', 'neutron:port_name': 'subport-trunk-port', 'neutron:project_id': '727c122779644204997e0800699adb87', 'neutron:revision_number': '3', 'neutron:security_group_ids': 'c5144630-8157-46e1-99b7-0605f2f5d90f', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[55], additional_encap=[], encap=[], mirror_rules=[], datapath=3eccfbf0-493f-4680-bf46-074b91046462, chassis=[<ovs.db.idl.Row object at 0x7fef78678fd0>], tunnel_key=2, gateway_chassis=[], requested_chassis=[], logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1) old=Port_Binding(chassis=[<ovs.db.idl.Row object at 0x7fef786807f0>]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43                                                           
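
The flapping is easier to see when only the chassis changes are extracted from these lines. The following is a rough sketch, assuming the log format shown above. It treats the `Row object at 0x...` address as a stand-in for the chassis identity, which only holds within a single neutron-server worker's log. The function names chassis_transitions and count_flaps are hypothetical helpers, not part of neutron or ovsdbapp.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches "chassis=[...]" but not "additional_chassis=[...]" and similar.
_CHASSIS = re.compile(
    r"(?<!\w)chassis=\[(?:<ovs\.db\.idl\.Row object at (0x[0-9a-f]+)>)?\]"
)
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

Transition = Tuple[str, Optional[str], Optional[str]]


def chassis_transitions(lines: Iterable[str], logical_port: str) -> List[Transition]:
    """Return (timestamp, old_chassis, new_chassis) for chassis updates."""
    result = []
    for line in lines:
        if "PortBindingChassisEvent" not in line:
            continue
        if f"logical_port={logical_port}" not in line:
            continue
        new_part, sep, old_part = line.rpartition(" old=")
        if not sep:
            continue
        old_m = _CHASSIS.search(old_part)  # absent when only "up" changed
        new_m = _CHASSIS.search(new_part)
        ts_m = _TIMESTAMP.match(line)
        if old_m and new_m and ts_m:
            result.append((ts_m.group(1), old_m.group(1), new_m.group(1)))
    return result


def count_flaps(transitions: List[Transition]) -> int:
    """Count updates that move the port back to the chassis it just left."""
    flaps = 0
    for prev, cur in zip(transitions, transitions[1:]):
        if prev[1] is not None and cur[1] == prev[2] and cur[2] == prev[1]:
            flaps += 1
    return flaps
```

Applied to the four lines above, the update at 14:01:36.385 is skipped because only `up` changed, and the last two lines form one back-and-forth move between the two Row objects.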



$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:01"                                             
2
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:02"                                             
0
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:03"                                             
11
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:04"                                             
186
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:05"                                             
238
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:07"                                             
11
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:08"                                             
220
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:09"                                             
240
$ grep "Matched UPDATE: PortBindingChassisEvent" ctrl-3-0/server.log | grep logical_port=2dbc7155-9cf9-415a-a761-66eaeda76cd1 | grep -c "^2023-04-11 14:10"                                             
240
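
The per-minute counts above can also be produced in a single pass over the log instead of one grep per minute. This is a small sketch under the assumption that every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpts in this comment. The function name events_per_minute is made up for this example.

```python
import re
from collections import Counter
from typing import Dict, Iterable

_MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")


def events_per_minute(lines: Iterable[str], logical_port: str) -> Dict[str, int]:
    """Count PortBindingChassisEvent matches for one port, keyed by minute."""
    counts: Counter = Counter()
    for line in lines:
        if "Matched UPDATE: PortBindingChassisEvent" not in line:
            continue
        if f"logical_port={logical_port}" not in line:
            continue
        m = _MINUTE.match(line)
        if m:
            counts[m.group(1)] += 1
    # Sort by minute so the output reads chronologically.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Path and port ID taken from the grep commands in this comment.
    with open("ctrl-3-0/server.log", errors="replace") as log:
        port = "2dbc7155-9cf9-415a-a761-66eaeda76cd1"
        for minute, count in events_per_minute(log, port).items():
            print(minute, count)
```

Minutes with no matching events are simply absent from the result, so a quiet minute such as 14:02 would not be listed.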

Comment 41 Rodolfo Alonso 2023-10-10 09:10:58 UTC
*** Bug 2223997 has been marked as a duplicate of this bug. ***

