Description of problem:
This is more or less an iteration of [1]. We've run into similar problems with HAProxy returning a 504 after neutron-server has spent 2 minutes processing a request, but now on calls to create networks or subnets.

The main issue is that Kuryr then attempts to retry the creation. To do that, it queries Neutron to check whether a network or subnet with the specified name already exists. If it does not, Kuryr proceeds with recreation. It has happened multiple times that no network or subnet appeared on the list, yet we ended up with a duplicate. The only possibility is that the 504'd request was still being processed in the background.

At this point I think request timeouts should be synchronized between HAProxy and the neutron-servers behind it. Otherwise we can't really know whether the request is still being processed or not.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2024690

Version-Release number of selected component (if applicable):

How reproducible:
Fairly easy, when some scale is applied to Neutron.

Steps to Reproduce:
1. Install OpenShift with Kuryr.
2. Create several namespaces with several pods in each, so that Kuryr starts making multiple concurrent requests to the Neutron API.

Actual results:
Some calls end with a 504, yet they're still being processed internally by neutron-server. Kuryr never gets back the IDs of the created resources.

Expected results:
If we get an error from the API, we should be guaranteed that no resource was created.

Additional info:
I'm happy to assist with reproducing this, it should be fairly easy.
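To make the race easier to follow, here is a minimal sketch of it in Python. It is only a toy model under stated assumptions: `FakeNeutron`, `GatewayTimeout`, `ensure_network` and `finish_in_flight` are hypothetical names made up for illustration, and none of this is Kuryr's actual code or the real Neutron client API. The timed-out request is modelled as "in flight" on the server side while the client already sees an error.

```python
import itertools


class GatewayTimeout(Exception):
    """Stands in for the 504 that the proxy returns to the client."""


class FakeNeutron:
    """Toy in-memory model (hypothetical, not the real Neutron API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.networks = []    # resources already visible to list calls
        self._in_flight = []  # requests the server is still processing

    def create_network(self, name, slow=False):
        net = {"id": next(self._ids), "name": name}
        if slow:
            # The proxy gives up, but the server keeps working on it.
            self._in_flight.append(net)
            raise GatewayTimeout(name)
        self.networks.append(net)
        return net

    def list_networks(self, name):
        return [n for n in self.networks if n["name"] == name]

    def finish_in_flight(self):
        # The timed-out requests complete in the background.
        self.networks.extend(self._in_flight)
        self._in_flight.clear()


def ensure_network(neutron, name):
    """Create a network; after a timeout, check by name and retry."""
    try:
        return neutron.create_network(name, slow=True)
    except GatewayTimeout:
        existing = neutron.list_networks(name)
        if existing:
            return existing[0]
        # Nothing is listed yet, so the retry creates a second network.
        return neutron.create_network(name)


neutron = FakeNeutron()
ensure_network(neutron, "ns-demo")
neutron.finish_in_flight()
duplicates = neutron.list_networks("ns-demo")
print(len(duplicates))
```

In this model the check by name cannot help, because the first network only becomes visible after the retry has already gone through. That is why an error from the API would need to guarantee that nothing is still being processed.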
*** Bug 2083244 has been marked as a duplicate of this bug. ***
Hi Gabriel, There is already an upstream patch proposed at https://review.opendev.org/c/openstack/puppet-tripleo/+/843863 and I just addressed the comments there. As soon as it is merged, I will propose backports to the stable branches.
This can be closed, with TripleO being EoL.