Bug 2068065 - tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_live_migration failing
Summary: tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOp...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: z9
: 16.1 (Train on RHEL 8.2)
Assignee: Slawek Kaplonski
QA Contact: Bharath M V
URL:
Whiteboard:
Depends On:
Blocks: 2078544
TreeView+ depends on / blocked
 
Reported: 2022-03-24 11:38 UTC by Eran Kuris
Modified: 2022-12-07 20:26 UTC (History)
8 users (show)

Fixed In Version: python-networking-ovn-7.3.1-1.20220502142031.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2078544 (view as bug list)
Environment:
Last Closed: 2022-12-07 20:26:20 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1967144 0 None None None 2022-03-30 14:32:29 UTC
OpenStack gerrit 836618 0 None NEW Retry port_update in the OVN if revision mismatch during live-migration 2022-04-05 12:01:47 UTC
Red Hat Issue Tracker OSP-14220 0 None None None 2022-03-24 11:41:37 UTC
Red Hat Product Errata RHBA-2022:8795 0 None None None 2022-12-07 20:26:38 UTC

Description Eran Kuris 2022-03-24 11:38:38 UTC
Description of problem:
After live migration, the VM is not reachable via external connectivity.



Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 263, in test_server_connectivity_live_migration
    server, keypair, floating_ip)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 103, in _wait_server_status_and_check_network_connectivity
    self._check_network_connectivity(server, keypair, floating_ip)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 96, in _check_network_connectivity
    server)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 723, in check_vm_connectivity
    msg=msg)
  File "/usr/lib/python3.6/site-packages/unittest2/case.py", line 705, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Public network connectivity check failed
Timed out waiting for 10.0.0.245 to become reachable

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20220315.n.1

How reproducible:
15% 
Steps to Reproduce:
1. run ci job:https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-dvr-rally/20/testReport/tempest.scenario.test_network_advanced_server_ops/TestNetworkAdvancedServerOps/Finally_Steps___test_server_connectivity_live_migration_compute_id_03fd1562_faad_11e7_9ea0_fa163e65f5ce_network_slow_/
2.
3.

Actual results:
VM migration is failing 

Expected results:
pass 

Additional info:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-dvr-rally/20/compute-1/var/log/containers/nova/nova-compute.log.gz

Comment 1 Slawek Kaplonski 2022-03-29 10:08:55 UTC
I took a look at this issue today with Rodolfo and we think that the issue may be in the core OVN. Here's what we found:

- Issue happened with instance c7d85029-d347-4661-8dc5-ca2cb451490a which has port "0dea4060-6ef4-454c-ba1f-7e72caf4ff50" (neutron port uuid).
- instance was first started on compute-1 and that was done without problems. Here's log from the compute-1:

ovs-vswitchd.log
2022-03-16T17:24:04.864Z|00145|bridge|INFO|bridge br-int: added interface tap0dea4060-6e on port 41

ovn-controller.log
2022-03-16T17:24:04.867Z|00101|binding|INFO|Claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 for this chassis.
2022-03-16T17:24:04.867Z|00102|binding|INFO|0dea4060-6ef4-454c-ba1f-7e72caf4ff50: Claiming fa:16:3e:79:44:48 10.100.0.3

- next there was live migration of that vm to the compute-0
Port was added by nova to the br-int:
2022-03-16T17:25:28.021Z|00173|bridge|INFO|bridge br-int: added interface tap0dea4060-6e on port 48

but it was not claimed by that chassis:
2022-03-16T17:25:28.024Z|00264|binding|INFO|Not claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50, chassis 536843b0-df62-441d-9ab4-e11743969074 requested-chassis compute-1.redhat.local

This seems to be fine because it wasn't yet released by the compute-1 chassis but that happened 2 seconds later:
2022-03-16T17:25:30.666Z|00252|binding|INFO|Releasing lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 from this chassis.

And even then, ovn-controller on compute-0 couldn't claim lport:
2022-03-16T17:26:23.068Z|00270|binding|INFO|Not claiming lport 0dea4060-6ef4-454c-ba1f-7e72caf4ff50, chassis 536843b0-df62-441d-9ab4-e11743969074 requested-chassis compute-1.redhat.local

That's all regarding that port what we found in the ovn-controllers on both nodes.
In Neutron we can see LogicalSwitchPortDown event after port was deleted from compute-1:

2022-03-16 17:25:30.725 28 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: LogicalSwitchPortUpdateDownEvent(events=('update',), table='Logical_Switch_Port', conditions=(('up', '=', False),), old_conditions=(('up', '=', True),)) to row=Logical_Switch_Port(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, up=[False], external_ids={'neutron:cidrs': '10.100.0.3/26', 'neutron:device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'neutron:port_name': '', 'neutron:project_id': '428d480e704b4412aa47a08b0fc86265', 'neutron:revision_number': '8', 'neutron:security_group_ids': '4669e182-ec03-43ba-8eae-f61c47b1d788', 'neutron:port_fip': '10.0.0.245'}, ha_chassis_group=[], options={'mcast_flood_reports': 'true', 'requested-chassis': 'compute-1.redhat.local'}, parent_name=[], enabled=[True], type=, dhcpv6_options=[], addresses=['fa:16:3e:79:44:48 10.100.0.3'], port_security=['fa:16:3e:79:44:48 10.100.0.3'], tag_request=[], tag=[], dhcpv4_options=[<ovs.db.idl.Row object at 0x7f627c7925c0>], dynamic_addresses=[]) old=Logical_Switch_Port(up=[True]) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44

Comment 4 Slawek Kaplonski 2022-03-29 12:24:18 UTC
After Dumitru's advice I took a look once again in the neutron-server logs and here's what I found:

When port was first created, neutron server set LSwitchPort like below:

2022-03-16 17:25:27.362 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, resource={'id': '0dea4060-6ef4-454c-ba1f-7e72caf4ff50', 'name': '', 'network_id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'mac_address': 'fa:16:3e:79:44:48', 'admin_state_up': True, 'status': 'ACTIVE'
, 'device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'device_owner': 'compute:nova', 'fixed_ips': [{'subnet_id': '6d9c998d-7f88-4d9b-bad5-91d1fca80aaf', 'ip_address': '10.100.0.3'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': ['4669e182-ec03-43ba-8eae-f61c47b1d788'], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {'migrating_to': 'compute-0.redhat.local'}, 'binding:host_
id': 'compute-1.redhat.local', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True}, 'qos_policy_id': None, 'port_security_enabled': True, 'dns_name': 'tempest-testnetworkadvancedserverops-server-309627669', 'dns_assignment': [{'ip_address': '10.100.0.3', 'hostname': 'tempest-testnetworkadvancedserverops-server-309627669', 'fqdn': 'tempest-testnetworkadvancedserverops-server-309627669.openstackgate.loca
l.'}], 'resource_request': None, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2022-03-16T17:24:00Z', 'updated_at': '2022-03-16T17:25:27Z', 'revision_number': 8, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'network': {'id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'name': 'tempest-network-smoke--387686606', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'admin_state_up': True, 'mtu': 1442, 'status': 'ACT
IVE', 'subnets': ['6d9c998d-7f88-4d9b-bad5-91d1fca80aaf'], 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': False, 'description': '', 'qos_policy_id': None, 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2022-03-16T17:23:44Z', 'updated_at': '2022-
03-16T17:23:50Z', 'revision_number': 2, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 3}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-03-16 17:25:27.363 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): SetLSwitchPortCommand(lport=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'neutron:project_id': '428d480e704b4412aa47a08b0fc86265', 'neutron:cidrs': '10.100.0.3/26', 'neutron:device_owner': 'compute:nova', 'neutron:
network_name': 'neutron-df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'neutron:security_group_ids': '4669e182-ec03-43ba-8eae-f61c47b1d788', 'neutron:revision_number': '8', 'neutron:port_fip': '10.0.0.245'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'compute-1.redhat.local', 'mcast_flood_reports': 'true'}, 'enabled': True, 'port_security': ['fa:16:3e:79:44:48 10.100.0.3'], 'dhcpv4_options': [UUID('9e6b3339-17
e8-441b-b472-114da2dc9765')], 'dhcpv6_options': [], 'type': '', 'addresses': ['fa:16:3e:79:44:48 10.100.0.3'], 'ha_chassis_group': []}, if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-03-16 17:25:27.365 26 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=2): PgAddPortCommand(port_group=neutron_pg_drop, lsp=['0dea4060-6ef4-454c-ba1f-7e72caf4ff50'], if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84



But later, after migration was done it was:
2022-03-16 17:25:33.281 28 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=0dea4060-6ef4-454c-ba1f-7e72caf4ff50, resource={'id': '0dea4060-6ef4-454c-ba1f-7e72caf4ff50', 'name': '', 'network_id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'mac_address': 'fa:16:3e:79:44:48', 'admin_state_up': True, 'status': 'DOWN', 'device_id': 'c7d85029-d347-4661-8dc5-ca2cb451490a', 'device_owner': 'compute:nova', 'fixed_ips': [{'subnet_id': '6d9c998d-7f88-4d9b-bad5-91d1fca80aaf', 'ip_address': '10.100.0.3'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': ['4669e182-ec03-43ba-8eae-f61c47b1d788'], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'compute-0.redhat.local', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True}, 'qos_policy_id': None, 'port_security_enabled': True, 'dns_name': 'tempest-testnetworkadvancedserverops-server-309627669', 'dns_assignment': [{'ip_address': '10.100.0.3', 'hostname': 'tempest-testnetworkadvancedserverops-server-309627669', 'fqdn': 'tempest-testnetworkadvancedserverops-server-309627669.openstackgate.local.'}], 'resource_request': None, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2022-03-16T17:24:00Z', 'updated_at': '2022-03-16T17:25:32Z', 'revision_number': 10, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'network': {'id': 'df2a78e9-94cb-4d56-8874-1a997a2ab3d5', 'name': 'tempest-network-smoke--387686606', 'tenant_id': '428d480e704b4412aa47a08b0fc86265', 'admin_state_up': True, 'mtu': 1442, 'status': 'ACTIVE', 'subnets': ['6d9c998d-7f88-4d9b-bad5-91d1fca80aaf'], 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': False, 'description': '', 'qos_policy_id': None, 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2022-03-16T17:23:44Z', 'updated_at': '2022-03-16T17:23:50Z', 'revision_number': 2, 'project_id': '428d480e704b4412aa47a08b0fc86265', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 3}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-03-16 17:25:33.282 28 ERROR ovsdbapp.backend.ovs_idl.transaction [req-bad09089-e54d-4b5d-9da8-6e34f036b72e 81b2cd2b1d4a4b6f96bb9a6c26f0d3f1 2a80b1acbc7440da80533d75337a7d69 - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/commands.py", line 953, in run_idl
    resource_id=self.name, resource_type=self.resource_type)
networking_ovn.common.exceptions.RevisionConflict: OVN revision number for 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 (type: ports) is equal or higher than the given resource. Skipping update

2022-03-16 17:25:33.283 28 INFO networking_ovn.ovsdb.impl_idl_ovn [req-bad09089-e54d-4b5d-9da8-6e34f036b72e 81b2cd2b1d4a4b6f96bb9a6c26f0d3f1 2a80b1acbc7440da80533d75337a7d69 - default default] Transaction aborted. Reason: OVN revision number for 0dea4060-6ef4-454c-ba1f-7e72caf4ff50 (type: ports) is equal or higher than the given resource. Skipping update

So I guess that because that revision number mismatch, requested-chassis for that LSwitchPort wasn't updated thus port wasn't claimed on the compute-0. Now we need to find out why the revision number didn't match.

Comment 15 errata-xmlrpc 2022-12-07 20:26:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795


Note You need to log in before you can comment on or make changes to this bug.