Bug 2166012

Summary: [Neutron][OVN] - VLAN-aware instances - sub-ports are taking 15-30 min to be available
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: python-networking-ovn
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED EOL
QA Contact: Eran Kuris <ekuris>
Severity: high
Docs Contact:
Priority: high
Version: 16.1 (Train)
CC: apevec, bcafarel, chrisw, dhill, dhruv, fesilva, froyo, gthiemon, lhh, majopela, mburns, nalmond, ralonsoh, rpawlik, scohen, skaplons
Target Milestone: async
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-networking-ovn-7.3.1-1.20230331143541.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2169673
Environment:
Last Closed: 2023-07-24 09:59:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2169673, 2169676

Description Matt Flusche 2023-01-31 17:09:04 UTC
Description of problem:

This issue only occurs in larger environments. We have not been able to reproduce it in a lab or in smaller deployments.

Standard OVN ports seem to work fine and are available immediately.

VLAN sub-ports take up to 30 minutes to become available.

This setup is deployed via Heat. I'll provide more details, debug logs, and an example Heat template in private comments.

I don't see an obvious error in the neutron server debug logs; however, the neutron maintenance task fixes the sub-port, and that fix correlates with the port starting to work (the ping test succeeds).
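For context, a simplified sketch of the idea behind the maintenance task's inconsistency check, assuming it flags a resource when the revision networking-ovn last synced to OVN is missing or lags behind the resource's current Neutron revision (the same number networking-ovn writes into OVN `external_ids` as `neutron:revision_number`, as in the `SetLSwitchPortCommand` below). `ResourceState` and `find_inconsistent` are hypothetical names for illustration only; the real task (`check_for_inconsistencies` in `networking_ovn/common/maintenance.py`, per the log below) queries the Neutron database and also handles deleted resources.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceState:
    """Hypothetical snapshot of one Neutron resource and its last OVN sync."""
    resource_id: str
    neutron_revision: int           # current revision_number in the Neutron DB
    synced_revision: Optional[int]  # revision last written to OVN; None if never synced


def find_inconsistent(states: List[ResourceState]) -> List[str]:
    """Return IDs whose OVN copy is missing or older than the Neutron DB."""
    return [
        s.resource_id
        for s in states
        if s.synced_revision is None or s.synced_revision < s.neutron_revision
    ]


if __name__ == "__main__":
    states = [
        ResourceState("port-in-sync", 2, 2),
        ResourceState("port-never-synced", 2, None),
        ResourceState("port-stale-in-ovn", 3, 2),
    ]
    print(find_inconsistent(states))  # ['port-never-synced', 'port-stale-in-ovn']
```

Under this model, a sub-port whose OVN row was never written (or written with an old revision) stays broken until the next periodic run picks it up, which would match the delayed fix seen below.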

Here is one occurrence:

Main port: da4b8088-0f94-48d2-ad7b-ee86a524eeb9

sub-ports: 339c394d-f133-4d68-988b-83ec39cc165d (vlan 10)
           c5d7e07f-fc6f-4881-bbcd-4ccfc2590183 (vlan 20)

Trunk port: dc394e91-41ca-4d27-9c0d-44952973569a

From the Heat stack, note the timestamps for the trunk and ports; all four resources reached CREATE_COMPLETE within one second of each other:

| servera-1_trunk        | dc394e91-41ca-4d27-9c0d-44952973569a | OS::Neutron::Trunk    | CREATE_COMPLETE | 2023-01-30T23:08:17Z | novello-admins-0001 |
| servera-1_vlan20_port  | c5d7e07f-fc6f-4881-bbcd-4ccfc2590183 | OS::Neutron::Port     | CREATE_COMPLETE | 2023-01-30T23:08:17Z | novello-admins-0001 |
| servera-1_vlan10_port  | 339c394d-f133-4d68-988b-83ec39cc165d | OS::Neutron::Port     | CREATE_COMPLETE | 2023-01-30T23:08:17Z | novello-admins-0001 |
| servera-1_port         | da4b8088-0f94-48d2-ad7b-ee86a524eeb9 | OS::Neutron::Port     | CREATE_COMPLETE | 2023-01-30T23:08:18Z | novello-admins-0001 |

The neutron debug log shows the parent port as ACTIVE, with both VLAN sub-ports listed in its trunk_details:

2023-01-30 23:09:30.669 32 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=da4b8088-0f94-48d2-ad7b-ee86a524eeb9, resource={'id': 'da4b8088-0f94-48d2-ad7b-ee86a524eeb9', 'name': 'novello-admins-0001-servera-1_port-qkmeent3pb5c', 'network_id': '688bc77f-5a59-4288-9583-65a9cee9bdd8', 'tenant_id': '5419d14db9aa45599c0458b71b237046', 'mac_address': '52:54:00:00:FA:FA', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '1b93b197-a200-47d5-b864-f7f7d4fccc5b', 'device_owner': 'compute:FAIL', 'fixed_ips': [{'subnet_id': '800b1dd1-8efd-4c2a-8db0-7ddb7bf7de07', 'ip_address': '172.25.252.115'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'REMOVED', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True}, 'port_security_enabled': False, 'dns_name': 'servera', 'dns_assignment': [{'ip_address': '172.25.252.115', 'hostname': 'servera', 'fqdn': 'servera.example.com.'}], 'trunk_details': {'trunk_id': 'dc394e91-41ca-4d27-9c0d-44952973569a', 'sub_ports': [{'segmentation_id': 10, 'segmentation_type': 'vlan', 'port_id': '339c394d-f133-4d68-988b-83ec39cc165d', 'mac_address': '52:54:00:00:FA:FA'}, {'segmentation_id': 20, 'segmentation_type': 'vlan', 'port_id': 'c5d7e07f-fc6f-4881-bbcd-4ccfc2590183', 'mac_address': '52:54:00:00:FA:FA'}]}, 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2023-01-30T23:08:23Z', 'updated_at': '2023-01-30T23:09:30Z', 'revision_number': 4, 'project_id': '5419d14db9aa45599c0458b71b237046', 'network': {'id': '688bc77f-5a59-4288-9583-65a9cee9bdd8', 'name': 'classroom_network', 'tenant_id': '5419d14db9aa45599c0458b71b237046', 'admin_state_up': True, 'mtu': 8942, 'status': 'ACTIVE', 'subnets': ['800b1dd1-8efd-4c2a-8db0-7ddb7bf7de07'], 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': False, 'vlan_transparent': None, 'description': '', 'port_security_enabled': True, 'dns_domain': '', 'l2_adjacency': True, 'tags': [], 'created_at': '2023-01-30T23:08:21Z', 'updated_at': '2023-01-30T23:08:22Z', 'revision_number': 2, 'project_id': '5419d14db9aa45599c0458b71b237046', 'provider:network_type': 'geneve', 'provider:physical_network': None, 'provider:segmentation_id': 171}}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
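Since the parent port's `trunk_details` already lists both sub-ports, each sub-port's OVN logical switch port is expected to reference the parent through `parent_name` and carry the VLAN ID as `tag`; this matches what the later `SetLSwitchPortCommand` for 339c394d writes (`parent_name: da4b8088-...`, `tag: 10`). A minimal sketch that derives those expectations from the port dict, assuming the `trunk_details` layout in the log line above; `expected_subport_bindings` is a hypothetical helper. Comparing the result with what OVN NB actually holds (for example with `ovn-nbctl lsp-get-parent` and `lsp-get-tag`) is one way to tell whether a sub-port is still waiting for the maintenance task; that comparison is not shown.

```python
from typing import Dict, Tuple


def expected_subport_bindings(parent_port: dict) -> Dict[str, Tuple[str, int]]:
    """Map each VLAN sub-port ID to the (parent_name, tag) its OVN LSP should carry.

    Assumes the parent port dict uses the 'trunk_details' layout from the log above.
    """
    trunk = parent_port.get("trunk_details") or {}
    expected = {}
    for sub in trunk.get("sub_ports", []):
        # This sketch only handles VLAN sub-ports, the only type in this report.
        if sub.get("segmentation_type") != "vlan":
            continue
        expected[sub["port_id"]] = (parent_port["id"], sub["segmentation_id"])
    return expected


if __name__ == "__main__":
    # Trimmed copy of the parent port from the CheckRevisionNumberCommand log line.
    parent = {
        "id": "da4b8088-0f94-48d2-ad7b-ee86a524eeb9",
        "trunk_details": {
            "trunk_id": "dc394e91-41ca-4d27-9c0d-44952973569a",
            "sub_ports": [
                {"segmentation_id": 10, "segmentation_type": "vlan",
                 "port_id": "339c394d-f133-4d68-988b-83ec39cc165d"},
                {"segmentation_id": 20, "segmentation_type": "vlan",
                 "port_id": "c5d7e07f-fc6f-4881-bbcd-4ccfc2590183"},
            ],
        },
    }
    for port_id, (parent_name, tag) in expected_subport_bindings(parent).items():
        print(port_id, parent_name, tag)
```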

However, the sub-ports don't work until the following maintenance task runs, about 15 minutes later:

2023-01-30 23:24:29.476 39 DEBUG networking_ovn.common.maintenance [req-fe4556bb-84bf-41ba-afbd-434888fe8730 - - - - -] Maintenance task: Fixing resource 339c394d-f133-4d68-988b-83ec39cc165d (type: ports) at create/update check_for_inconsistencies /usr/lib/python3.6/site-packages/networking_ovn/common/maintenance.py:353
2023-01-30 23:24:29.558 31 DEBUG neutron.wsgi [-] (31) accepted ('10.212.200.5', 57878) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
2023-01-30 23:24:29.602 39 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=339c394d-f133-4d68-988b-83ec39cc165d, resource={'id': '339c394d-f133-4d68-988b-83ec39cc165d', 'name': 'REMOVED-admins-0001-servera-1_vlan10_port-aeisn2s5skp6', 'network_id': '389d732f-a6b9-49eb-a51f-d450f9a7b37e', 'tenant_id': '5419d14db9aa45599c0458b71b237046', 'mac_address': '52:54:00:00:FA:FA', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '', 'device_owner': 'trunk:subport', 'fixed_ips': [], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {'parent_name': 'da4b8088-0f94-48d2-ad7b-ee86a524eeb9', 'tag': 10}, 'binding:host_id': '', 'binding:vif_type': 'ovs', 'binding:vif_details': {}, 'port_security_enabled': False, 'dns_name': '', 'dns_assignment': [], 'ip_allocation': 'immediate', 'tags': [], 'created_at': '2023-01-30T23:08:27Z', 'updated_at': '2023-01-30T23:08:33Z', 'revision_number': 2, 'project_id': '5419d14db9aa45599c0458b71b237046'}, resource_type=ports, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2023-01-30 23:24:29.602 39 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): SetLSwitchPortCommand(lport=339c394d-f133-4d68-988b-83ec39cc165d, columns={'external_ids': {'neutron:port_name': 'novello-admins-0001-servera-1_vlan10_port-aeisn2s5skp6', 'neutron:device_id': '', 'neutron:project_id': '5419d14db9aa45599c0458b71b237046', 'neutron:cidrs': '', 'neutron:device_owner': 'trunk:subport', 'neutron:network_name': 'neutron-389d732f-a6b9-49eb-a51f-d450f9a7b37e', 'neutron:security_group_ids': '', 'neutron:revision_number': '2'}, 'parent_name': 'da4b8088-0f94-48d2-ad7b-ee86a524eeb9', 'tag': 10, 'options': {'requested-chassis': '', 'mcast_flood_reports': 'true'}, 'enabled': True, 'port_security': [], 'dhcpv4_options': [], 'dhcpv6_options': [], 'type': '', 'addresses': ['52:54:00:00:FA:FA', 'unknown'], 'ha_chassis_group': []}, if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
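Measured from the two log lines, the gap between the parent port going ACTIVE (23:09:30.669) and the maintenance fix of sub-port 339c394d (23:24:29.476) is just under 15 minutes. A small helper for computing the same gap for other occurrences, assuming the `YYYY-MM-DD HH:MM:SS.mmm` timestamp prefix used by the neutron log lines in this report (`seconds_between` is a hypothetical name):

```python
from datetime import datetime

# Timestamp layout used by the neutron server log lines quoted in this report.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Return the seconds elapsed between two neutron log timestamps."""
    delta = datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)
    return delta.total_seconds()


if __name__ == "__main__":
    parent_active = "2023-01-30 23:09:30.669"  # parent port da4b8088 reported ACTIVE
    subport_fixed = "2023-01-30 23:24:29.476"  # maintenance task fixes sub-port 339c394d
    delay = seconds_between(parent_active, subport_fixed)
    print(f"{delay:.3f} s (~{delay / 60:.1f} min)")  # 898.807 s (~15.0 min)
```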

I can't pinpoint an obvious issue specific to the sub-ports; however, there are many failing maintenance tasks in the logs, for example:

ERROR networking_ovn.common.maintenance [req-UUID - - - - -] Maintenance task: Failed to fix deleted resource UUID (type: subnets): KeyError: 'uuid'
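To gauge how many maintenance fixes fail, and for which resource types, the error lines can be tallied. A sketch, assuming failures follow the wording of the example above (`Failed to fix deleted resource <id> (type: <type>)`); treating the create/update variant as `Failed to fix resource ...` is an assumption, and `count_failed_fixes` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable

# Matches the failure message quoted above. The optional "deleted " keeps the
# pattern usable if create/update failures use the same wording (an assumption).
FAILED_FIX = re.compile(
    r"Maintenance task: Failed to fix (?:deleted )?resource \S+ \(type: (?P<type>\w+)\)"
)


def count_failed_fixes(lines: Iterable[str]) -> Counter:
    """Count failed maintenance-task fixes per resource type."""
    counts = Counter()
    for line in lines:
        match = FAILED_FIX.search(line)
        if match:
            counts[match.group("type")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ERROR networking_ovn.common.maintenance [req-UUID - - - - -] Maintenance task: "
        "Failed to fix deleted resource UUID (type: subnets): KeyError: 'uuid'",
        "DEBUG networking_ovn.common.maintenance [req-UUID - - - - -] Maintenance task: "
        "Fixing resource 339c394d-f133-4d68-988b-83ec39cc165d (type: ports) at create/update",
    ]
    print(count_failed_fixes(sample))  # Counter({'subnets': 1})
```

Passing an open neutron server log file object instead of `sample` tallies the real failures, which shows whether they pile up on one resource type.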


Version-Release number of selected component (if applicable):
16.1.8

How reproducible:
Only in these environments.

Steps to Reproduce:
1. Deploy the attached Heat template in these specific environments.

Additional info:
Provided in additional comments