Bug 2083245 - Unable to reliably create just one resource
Summary: Unable to reliably create just one resource
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Eran Kuris
URL:
Whiteboard:
Duplicates: 2083244
Depends On:
Blocks:
 
Reported: 2022-05-09 14:43 UTC by Michał Dulko
Modified: 2025-01-31 10:12 UTC (History)
CC List: 10 users

Fixed In Version: puppet-tripleo-11.7.0-2.20220804135307.941ad1a.el8osttrunk
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-01-31 10:02:46 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 843863 | 0 | None | NEW | Change neutron backend's timeout in haproxy to 20 minutes | 2022-05-30 13:29:53 UTC
OpenStack gerrit | 851700 | 0 | None | NEW | Change neutron backend's timeout in haproxy to 10 minutes | 2022-08-01 08:08:40 UTC
Red Hat Issue Tracker | OSP-15082 | 0 | None | None | None | 2022-05-09 14:45:48 UTC

Description Michał Dulko 2022-05-09 14:43:01 UTC
Description of problem:
This is more or less an iteration of [1]. We've run into similar problems with HAProxy returning 504 after neutron-server has spent 2 minutes processing a request, but this time on calls that create networks or subnets. The main issue is that Kuryr then attempts to retry the creation. To do that, it queries Neutron to check whether a network or subnet with the specified name already exists. If it does not, Kuryr proceeds to recreate it. It has happened multiple times that no network or subnet appeared on the list, yet we ended up with a duplicate. The only explanation is that the request that got the 504 was still being processed in the background.
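
The race described above can be reproduced in a small simulation. This is only a sketch: FakeNeutron, create_via_proxy and create_with_retry are hypothetical names, not Kuryr or Neutron code, the 150-second processing time is an assumed value, and a logical clock stands in for real waiting. Only the 2-minute (120-second) proxy timeout comes from the report.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNeutron:
    """Toy stand-in for neutron-server; not the real Neutron API."""
    processing_time: int                           # seconds one create takes (assumed)
    now: int = 0                                   # logical clock, in seconds
    networks: list = field(default_factory=list)   # committed (name, id) pairs
    pending: list = field(default_factory=list)    # (finish_time, name) still running
    next_id: int = 1

    def advance(self, seconds):
        """Move the clock forward and commit every create that has finished."""
        self.now += seconds
        still_pending = []
        for finish_time, name in self.pending:
            if finish_time <= self.now:
                self.networks.append((name, self.next_id))
                self.next_id += 1
            else:
                still_pending.append((finish_time, name))
        self.pending = still_pending

    def list_networks(self, name):
        return [n for n in self.networks if n[0] == name]


def create_via_proxy(neutron, name, proxy_timeout):
    """Return 201 on success, or 504 if the proxy gives up first."""
    neutron.pending.append((neutron.now + neutron.processing_time, name))
    if neutron.processing_time > proxy_timeout:
        neutron.advance(proxy_timeout)  # the client only waits this long
        return 504                      # the backend keeps working regardless
    neutron.advance(neutron.processing_time)
    return 201


def create_with_retry(neutron, name, proxy_timeout, max_attempts=2):
    """Retry on 504 unless a resource with that name is already listed."""
    for _ in range(max_attempts):
        if create_via_proxy(neutron, name, proxy_timeout) == 201:
            return
        if neutron.list_networks(name):
            return  # an earlier request finished in the meantime


neutron = FakeNeutron(processing_time=150)
create_with_retry(neutron, "ns-net", proxy_timeout=120)
neutron.advance(600)  # let all background work finish
duplicates = neutron.list_networks("ns-net")
print(len(duplicates))
```

With these numbers the name lookup after the first 504 comes back empty, the retry is sent, and both requests eventually commit, so two networks with the same name exist. With a proxy timeout longer than the processing time, only one is created.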

At this point I think request timeouts should be synchronized between the HAProxy and the neutron-servers behind it. Otherwise we can't really know if the request is being processed or not.
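
The synchronization argument can be stated as a one-line check. This is a sketch under a loud assumption: backend_deadline_s is a hypothetical upper bound after which neutron-server would abandon a request, and the report does not say that such a setting exists. The 120-second value is the 2-minute proxy timeout from the description, 600 seconds is the 10-minute value from the linked gerrit change, and 300 seconds is made up for illustration.

```python
def is_504_conclusive(proxy_timeout_s: int, backend_deadline_s: int) -> bool:
    """Return True if a proxy 504 implies the backend has already stopped.

    backend_deadline_s is a hypothetical bound on how long the backend
    keeps working on one request; it is an assumption of this sketch.
    """
    return proxy_timeout_s > backend_deadline_s


# Proxy gives up first: the request may still complete in the background.
print(is_504_conclusive(120, 300))
# Proxy outlasts the backend: a 504 means nothing is still running.
print(is_504_conclusive(600, 300))
```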

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2024690


Version-Release number of selected component (if applicable):


How reproducible:
Fairly easily, once Neutron is under some load.

Steps to Reproduce:
1. Install OpenShift with Kuryr.
2. Create several namespaces with several pods in each, so that Kuryr starts making multiple concurrent requests to the Neutron API.

Actual results:
Some calls end up with 504, yet they're still being processed internally by neutron-server. Kuryr will never get back the IDs of the created resources.

Expected results:
If we get an error from the API, we should be guaranteed that no resource was created.

Additional info:
I'm happy to assist with reproducing this; it should be fairly easy.

Comment 1 Michał Dulko 2022-05-09 15:06:42 UTC
*** Bug 2083244 has been marked as a duplicate of this bug. ***

Comment 5 Slawek Kaplonski 2022-06-29 13:28:37 UTC
Hi Gabriel,

There is already an upstream patch proposed: https://review.opendev.org/c/openstack/puppet-tripleo/+/843863. I just addressed the comments there. As soon as it is merged, I will propose backports to the stable branches.

Comment 9 Pierre Prinetti 2025-01-31 09:58:24 UTC
This can be closed, with TripleO being EoL.

