Bug 2068065
Summary: | tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration failing | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> | |
Component: | python-networking-ovn | Assignee: | Slawek Kaplonski <skaplons> | |
Status: | CLOSED ERRATA | QA Contact: | Bharath M V <bmv> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 16.1 (Train) | CC: | apevec, dceara, egarciar, i.maximets, lhh, majopela, mtomaska, scohen | |
Target Milestone: | z9 | Keywords: | AutomationBlocker, Triaged | |
Target Release: | 16.1 (Train on RHEL 8.2) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | python-networking-ovn-7.3.1-1.20220502142031.4e24f4c.el8ost | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2078544 (view as bug list) | Environment: | ||
Last Closed: | 2022-12-07 20:26:20 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2078544 |
Description
Eran Kuris
2022-03-24 11:38:38 UTC
I took a look at this issue today with Rodolfo and we think that the issue may be in the core OVN. Here's what we found: - Issue happened with instance c7d85029-d347-4661-8dc5-ca2cb451490a which has port "0dea4060-6ef4-454c-ba1f-7e72caf4ff50" (neutron port uuid). - instance was first started on compute-1 and that was done without problems. Here's log from the compute-1: ovs-vswitchd.log 2022-03-16T17:24:04.864Z|00145|bridge|INFO|bridge br-int: added interface tap0dea4060-6e on port 41 ovn-controller.log 2022-03-16T17:24:04.867Z|00101|binding|INFO|Claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 for this chassis. 2022-03-16T17:24:04.867Z|00102|binding|INFO|0dea4060-6ef4-454c-ba1f-7e72caf4ff50: Claiming fa:16:3e:79:44:48 10.100.0.3 - next there was live migration of that vm to the compute-0 Port was added by nova to the br-int: 2022-03-16T17:25:28.021Z|00173|bridge|INFO|bridge br-int: added interface tap0dea4060-6e on port 48 but it was not claimed by that chassis: 2022-03-16T17:25:28.024Z|00264|binding|INFO|Not claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50, chassis 536843b0-df62-441d-9ab4-e11743969074 requested-chassis compute-1.redhat.local This seems to be fine because it wasn't yet released by the compute-1 chassis but that happened 2 seconds later: 2022-03-16T17:25:30.666Z|00252|binding|INFO|Releasing lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 from this chassis. And even then, ovn-controller on compute-0 couldn't claim lport: 2022-03-16T17:26:23.068Z|00270|binding|INFO|Not claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50, chassis 536843b0-df62-441d-9ab4-e11743969074 requested-chassis compute-1.redhat.local That's all regarding that port what we found in the ovn-controllers on both nodes. In Neutron we can see LogicalSwitchPortDown event after port was deleted from compute-1: 2022-03-16 17:25:30.725 28 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortUpdateDownEvent(events=('update',), table='Logical_Switch_Port', conditions=(('up', '=', False),), old_conditions=(('up', '=', True),)) to row=Logical_Switch_Port(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, up=[False], external_ids={'neutron:cidrs': '10.100.0.3/26', 'neutron:device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'neutron:port_name': '', 'neutron:project_id': '428d480e704b4412aa47a08b0fc86265', 'neutron:revision_number': '8', 'neutron:security_group_ids': '4669e182-ec03-43ba-8eae-f61c47b1d788', 'neutron:port_fip': '10.0.0.245'}, ha_chassis_group=[], options={'mcast_flood_reports': 'true', 'requested-chassis': 'compute-1.redhat.local'}, parent_name=[], enabled=[True], type=, dhcpv6_options=[], addresses=['fa:16:3e:79:44:48 10.100.0.3'], port_security=['fa:16:3e:79:44:48 10.100.0.3'], tag_request=[], tag=[], dhcpv4_options=[<ovs.db.idl.Row object at 0x7f627c7925c0>], dynamic_addresses=[]) old=Logical_Switch_Port(up=[True]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44 After Dumitru's advice I took a look once again in the neutron-server logs and here's what I found: When port was first created, neutron server set LSwitchPort like below: 2022-03-16 17:25:27.362 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, resource={'id': '0dea4060-6ef4-454c-ba1f-7e72caf4ff50', 'name': '', 'network_id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'mac_address': 'fa:16:3e:79:44:48', 'admin_state_up': True, 'status': 'ACTIVE' , 'device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'device_owner': 'compute:nova', 'fixed_ips': [{'subnet_id': '6d9c998d-7f88-4d9b-bad5-91d1fca80aaf', 'ip_address': '10.100.0.3'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': ['4669e182-ec03-43ba-8eae-f61c47b1d788'], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {'migrating_to': 'compute-0.redhat.local'}, 'binding:host_ id': 'compute-1.redhat.local', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True}, 'qos_policy_id': None, 'port_security_enabled': True, 'dns_name': 'tempest-testnetworkadvancedserverops-server-309627669', 'dns_assignment': [{'ip_address': '10.100.0.3', 'hostname': 'tempest-testnetworkadvancedserverops-server-309627669', 'fqdn': 'tempest-testnetworkadvancedserverops-server-309627669.openstackgate.loca l.'}], 'resource_request': None, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2022-03-16T17:24:00Z', 'updated_at': '2022-03-16T17:25:27Z', 'revision_number': 8, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'network': {'id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'name': 'tempest-network-smoke--387686606', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'admin_state_up': True, 'mtu': 1442, 'status': 'ACT IVE', 'subnets': ['6d9c998d-7f88-4d9b-bad5-91d1fca80aaf'], 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': False, 'description': '', 'qos_policy_id': None, 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2022-03-16T17:23:44Z', 'updated_at': '2022- 03-16T17:23:50Z', 'revision_number': 2, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 3}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84 2022-03-16 17:25:27.363 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): SetLSwitchPortCommand(lport=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'neutron:project_id': '428d480e704b4412aa47a08b0fc86265', 'neutron:cidrs': '10.100.0.3/26', 'neutron:device_owner': 'compute:nova', 'neutron: network_name': 'neutron-df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'neutron:security_group_ids': '4669e182-ec03-43ba-8eae-f61c47b1d788', 'neutron:revision_number': '8', 'neutron:port_fip': '10.0.0.245'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'compute-1.redhat.local', 'mcast_flood_reports': 'true'}, 'enabled': True, 'port_security': ['fa:16:3e:79:44:48 10.100.0.3'], 'dhcpv4_options': [UUID('9e6b3339-17 e8-441b-b472-114da2dc9765')], 'dhcpv6_options': [], 'type': '', 'addresses': ['fa:16:3e:79:44:48 10.100.0.3'], 'ha_chassis_group': []}, if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84 2022-03-16 17:25:27.365 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=2): PgAddPortCommand(port_group=neutron_pg_drop, lsp=['0dea4060-6ef4-454c-ba1f-7e72caf4ff50'], if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84 But later, after migration was done it was: 2022-03-16 17:25:33.281 28 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, resource={'id': '0dea4060-6ef4-454c-ba1f-7e72caf4ff50', 'name': '', 'network_id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'mac_address': 'fa:16:3e:79:44:48', 'admin_state_up': True, 'status': 'DOWN', 'device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'device_owner': 'compute:nova', 'fixed_ips': [{'subnet_id': '6d9c998d-7f88-4d9b-bad5-91d1fca80aaf', 'ip_address': '10.100.0.3'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': ['4669e182-ec03-43ba-8eae-f61c47b1d788'], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'compute-0.redhat.local', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True}, 'qos_policy_id': None, 'port_security_enabled': True, 'dns_name': 'tempest-testnetworkadvancedserverops-server-309627669', 'dns_assignment': [{'ip_address': '10.100.0.3', 'hostname': 'tempest-testnetworkadvancedserverops-server-309627669', 'fqdn': 'tempest-testnetworkadvancedserverops-server-309627669.openstackgate.local.'}], 'resource_request': None, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2022-03-16T17:24:00Z', 'updated_at': '2022-03-16T17:25:32Z', 'revision_number': 10, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'network': {'id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'name': 'tempest-network-smoke--387686606', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'admin_state_up': True, 'mtu': 1442, 'status': 'ACTIVE', 'subnets': ['6d9c998d-7f88-4d9b-bad5-91d1fca80aaf'], 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': False, 'description': '', 'qos_policy_id': None, 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2022-03-16T17:23:44Z', 'updated_at': '2022-03-16T17:23:50Z', 'revision_number': 2, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 3}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84 2022-03-16 17:25:33.282 28 ERROR ovsdbapp.backend.ovs_idl.transaction [req-bad09089-e54d-4b5d-9da8-6e34f036b72e 81b2cd2b1d4a4b6f96bb9a6c26f0d3f1 2a80b1acbc7440da80533d75337a7d69 - default default] Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run txn.results.put(txn.do_commit()) File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit command.run_idl(txn) File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/commands.py", line 953, in run_idl resource_id=self.name, resource_type=self.resource_type) networking_ovn.common.exceptions.RevisionConflict: OVN revision number for 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 (type: ports) is equal or higher than the given resource. Skipping update 2022-03-16 17:25:33.283 28 INFO networking_ovn.ovsdb.impl_idl_ovn [req-bad09089-e54d-4b5d-9da8-6e34f036b72e 81b2cd2b1d4a4b6f96bb9a6c26f0d3f1 2a80b1acbc7440da80533d75337a7d69 - default default] Transaction aborted. Reason: OVN revision number for 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 (type: ports) is equal or higher than the given resource. Skipping update So I guess that because that revision number mismatch, requested-chassis for that LSwitchPort wasn't updated thus port wasn't claimed on the compute-0. Now we need to find out why the revision number didn't match. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8795 |