Bug 2037332

Summary: [OSP16.2] Slow creation of instances (on OVN) versus other environments with OVS
Product: Red Hat OpenStack
Reporter: ggrimaux
Component: python-networking-ovn
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED ERRATA
QA Contact: Eduardo Olivares <eolivare>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: apevec, ccamposr, cfields, chrisw, dhill, jlibosva, lhh, majopela, ralonsoh, scohen, shtiwari
Target Milestone: z2
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-networking-ovn-7.4.2-2.20211223144852.8107638.el8ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-23 22:12:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description ggrimaux 2022-01-05 13:08:07 UTC
Description of problem:
The client has processes that create many instances at the same time.
On the previous setup with OVS this worked fine; now, with OVN, instances can take up to 600 seconds to become available.

When max_concurrent_builds is set to 2 or higher, many errors like the following occur:
openstack.exceptions.HttpException: HttpException: 503: Server Error for url: https://$URL:13774/v2.1/servers/$uuid, problems. Please try again later.: 503 Service Unavailable: The server is temporarily unable to service your: Service Unavailable: request due to maintenance downtime or capacity

In other environments with OVS, max_concurrent_builds is set to 3 and it works just fine.

Also, a LOT of errors like the following appear in the log /var/lib/containers/neutron/server.log:
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance [req-c1605008-f3b7-4014-9665-7ea4455a457d - - - - -] Maintenance task: Failed to fix deleted resource $uuid (type: subnets): KeyError: 'uuid'
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance Traceback (most recent call last):
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance   File "/usr/lib/python3.6/site-packages/networking_ovn/common/maintenance.py", line 379, in check_for_inconsistencies
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance     self._ovn_client.delete_subnet(row.resource_uuid)
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 2044, in delete_subnet
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance     self._remove_subnet_dhcp_options(subnet_id, txn)
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 1910, in _remove_subnet_dhcp_options
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance     dhcp_options['subnet']['uuid']))
2021-12-23 16:26:46.481 48 ERROR networking_ovn.common.maintenance KeyError: 'uuid'
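
To make the failure mode in the traceback concrete, here is a minimal sketch using plain dictionaries. The data shape is only inferred from the expression `dhcp_options['subnet']['uuid']` in the traceback: a `subnet` entry exists but has no `uuid` key. The sample values and the helper name `subnet_dhcp_uuid` are hypothetical; this is not the networking-ovn code and not necessarily how the erratum fixed it.

```python
# Hypothetical data: the shape is inferred from the traceback, not taken
# from the networking-ovn source. The 'subnet' entry has no 'uuid' key.
dhcp_options = {"subnet": {"cidr": "192.0.2.0/24"}}


def subnet_dhcp_uuid(options):
    """Return the subnet entry's 'uuid', or None when it is missing."""
    subnet_entry = options.get("subnet") or {}
    return subnet_entry.get("uuid")


# Direct indexing reproduces the error reported in the log.
try:
    dhcp_options["subnet"]["uuid"]
except KeyError as exc:
    failed_key = exc.args[0]

print(failed_key)                      # the missing key name
print(subnet_dhcp_uuid(dhcp_options))  # tolerant lookup returns None
```

A tolerant lookup like this only avoids the exception; whether skipping the deletion is the right behaviour for the maintenance task is a separate question.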

We need your help identifying the bottleneck here.

If you need anything else, please let me know.

Thanks!

Version-Release number of selected component (if applicable):
OSP16.2.0

How reproducible:
100%

Steps to Reproduce:
1. Try to create a lot of instances at the same time

Actual results:
Slow instance creation.
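
The reproduction step above could be quantified with a small timing harness along these lines. It is a sketch under stated assumptions: `time_concurrent_builds` and `fake_build` are hypothetical names, and `fake_build` only sleeps so that the example runs without a cloud. Against a real environment, the callable would presumably create one server and wait for it to become ACTIVE (for example with openstacksdk's `conn.compute.create_server` and `conn.compute.wait_for_server`).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_concurrent_builds(build_one, count, workers):
    """Run build_one(index) `count` times on `workers` threads.

    Returns a list of (index, seconds, error) tuples, one per build,
    where error is None on success.
    """
    def timed(index):
        start = time.monotonic()
        try:
            build_one(index)
            error = None
        except Exception as exc:  # record failures (e.g. HTTP 503) instead of aborting
            error = exc
        return index, time.monotonic() - start, error

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed, range(count)))


def fake_build(index):
    # Hypothetical stand-in: sleeps briefly instead of booting an instance,
    # and fails once to imitate a 503 response.
    time.sleep(0.01)
    if index == 3:
        raise RuntimeError("503 Service Unavailable")


results = time_concurrent_builds(fake_build, count=8, workers=4)
slowest = max(seconds for _, seconds, _ in results)
failed = [index for index, _, error in results if error is not None]
print(f"slowest build: {slowest:.2f}s, failed builds: {failed}")
```

Running the same harness against the OVS and OVN environments with identical `count` and `workers` values would give directly comparable numbers.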

Expected results:
Same speed as when using OVS.

Additional info:
We have sosreport with logs.

Comment 17 errata-xmlrpc 2022-03-23 22:12:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001