Description of problem:

Running OpenShift on OpenStack with Kuryr, we rely on Neutron. Getting a subnet sometimes takes almost 2 minutes, which has an impact on creating OpenShift resources.

From the Kuryr logs:

```
2023-07-26 14:55:23.955 1 ERROR kuryr_kubernetes.utils [-] get_subnet(): test-846c6ffb69-n6dbc
2023-07-26 14:55:23.956 1 ERROR kuryr_kubernetes.utils [-] Looking for subnet test-846c6ffb69-n6dbc
2023-07-26 14:57:32.582 1 ERROR kuryr_kubernetes.utils [-] Looking for network test-846c6ffb69-n6dbc
2023-07-26 14:57:32.762 1 ERROR kuryr_kubernetes.utils [-] Got network test-846c6ffb69-n6dbc
```

Kuryr code: https://github.com/dulek/kuryr-kubernetes/blob/ae47fd58522700c1541ac6c5155c8bf36bf71d39/kuryr_kubernetes/utils.py#L300-L318

Version-Release number of selected component (if applicable):
OSP RHOS-17.1-RHEL-9-20230719.n.1
OCP 4.13.0-0.nightly-2023-07-25-104746

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
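For illustration, a minimal sketch of the lookup sequence the log lines above describe, written against openstacksdk. This is not the actual Kuryr code linked above; the cloud name is a placeholder and error handling is omitted.

```python
import openstack

# Placeholder cloud name; Kuryr builds its connection from its own config.
conn = openstack.connect(cloud='openstack')

def get_subnet_and_network(conn, subnet_id):
    # "Looking for subnet <pod>": fetch the subnet by its ID.
    subnet = conn.network.get_subnet(subnet_id)
    # "Looking for network <pod>": fetch the network the subnet belongs to.
    # In the reported run this second step only starts ~2 minutes after the
    # first log line, which is the delay being investigated.
    network = conn.network.get_network(subnet.network_id)
    return subnet, network
```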
This is the list of all REST GET requests for subnets handled by neutron-server that took more than 1 second:

```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 'INFO neutron.wsgi' {} \; | grep 'time: [1-9]' | grep GET | grep subnets
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.8.gz:2023-07-26 06:46:53.660 22 INFO neutron.wsgi [req-328a80da-a24c-45b3-a156-438f224cdc62 f781795994a64b2ba865228a8ceae943 ea7bc1e64616447f8dfdc00f96f465f2 - default default] 10.46.43.130,172.17.1.135,::1 "GET /v2.0/subnets?tags-any=openshiftClusterID%3Dostest-fp9fx HTTP/1.1" status: 200 len: 884 time: 2.2028453
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.5.gz:2023-07-26 09:13:05.646 22 INFO neutron.wsgi [req-6f934d77-c907-478d-bd18-7fa19643d81d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?name=ostest-pgmsp-nodes&tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 884 time: 1.0180192
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.4.gz:2023-07-26 09:13:05.638 17 INFO neutron.wsgi [req-4a42f6cf-c921-43e1-8db8-420dd82026d3 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets HTTP/1.1" status: 200 len: 35461 time: 1.0277364
```

There is no slow request at the time specified in the original description of the bug (14:55:23.956), and there is not a single case of a request taking more than 3 seconds.
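The same scan can also be expressed in Python if the shell pipeline gets unwieldy; a minimal sketch, assuming it is run from the top of the extracted sosreports and that the duration is the last field of each wsgi access-log line:

```python
import gzip
import pathlib
import re

# Match wsgi access-log lines for subnet GETs and capture the duration,
# which is the last field of the line ("time: <seconds>").
PATTERN = re.compile(r'INFO neutron\.wsgi .* "GET /v2\.0/subnets.* time: ([0-9.]+)$')

for path in pathlib.Path('.').rglob('server.log*'):
    opener = gzip.open if path.suffix == '.gz' else open
    with opener(path, 'rt', errors='replace') as fh:
        for line in fh:
            match = PATTERN.search(line.rstrip())
            if match and float(match.group(1)) > 1.0:
                print(f'{path}:{line.rstrip()}')
```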
The list of requests for subnets at the listed timestamp:

```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 14:55:23 {} \; | grep subnet
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.443 22 INFO neutron.wsgi [req-75c786fd-bd9a-4b8e-85a7-a1a21089b2c1 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=efac5648-06a5-4ef6-8932-e9d81f7b1f51&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0325458
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.177 17 INFO neutron.wsgi [req-4ea0526e-8519-46ee-aded-73513ec5443f ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=ec1f4833-6781-4e2c-b2ca-6b38b6188b99&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0327132
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0683410
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.650 19 INFO neutron.wsgi [req-76c6cb1a-6c35-4bfc-bc0a-1860b0be8d67 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=f8b6305e-9996-4fa0-ad09-bcdc56d526ba&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0383773
```

All of them were processed almost instantly.
These are the subnet requests at the moment when Kuryr moves on to fetching the network:

```
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0905654
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.961 17 INFO neutron.wsgi [req-333ab594-882e-4718-bad1-13033e47538d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?tenant_id=f9a91d8507154559b2d2f8a354159323 HTTP/1.1" status: 200 len: 34877 time: 0.3402753
```

These were also processed immediately.
The get_subnet call in openstacksdk (the one supposedly taking 2 minutes) fetches the subnet by its ID, so the candidate requests are:

```
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0683410
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0905654
```

Note that both fetch the same subnet. I wonder if perhaps the client fails to receive the reply to the first request and repeats it after 2 minutes?
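If that hypothesis is right, an explicit client-side HTTP timeout would surface the hang instead of silently waiting on the socket. A minimal sketch of the knobs involved, using keystoneauth1 directly with placeholder credentials; this is not how Kuryr builds its session, only an illustration:

```python
from keystoneauth1 import session
from keystoneauth1.identity import v3

# Placeholder credentials.
auth = v3.Password(auth_url='https://keystone.example:5000/v3',
                   username='demo', password='secret',
                   project_name='demo',
                   user_domain_id='default', project_domain_id='default')

# timeout: raise an error if no reply arrives within 30 seconds instead of
# blocking for minutes; connect_retries: retry connection-level failures.
sess = session.Session(auth=auth, timeout=30, connect_retries=2)

# The subnet ID is the one seen in the duplicated requests above.
resp = sess.get('/v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5',
                endpoint_filter={'service_type': 'network'})
print(resp.status_code)
```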
From the Neutron API perspective, it looks like the requests were received and processed immediately. You haven't posted the request-ID you used to fetch the subnet, so it's inconclusive whether what I see in the server log is what you sent. Please add logging for HTTP headers, including the request-id. I think you can configure your client accordingly by setting http_debug=True, as described in https://docs.openstack.org/openstacksdk/latest/user/guides/logging.html#simple-usage. For now, I consider this not a bug in Neutron but in the client.
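For example, a minimal sketch of what enabling that could look like on the client side (the cloud name is a placeholder; the subnet ID is the one from the requests above):

```python
import openstack

# Log HTTP requests/responses, including headers such as the request ID,
# so client-side calls can be correlated with the neutron-server log.
openstack.enable_logging(debug=True, http_debug=True)

# Placeholder cloud name; Kuryr builds its connection from its own config.
conn = openstack.connect(cloud='openstack')
subnet = conn.network.get_subnet('bf15ca6b-9a3c-4a02-a849-20a586bd04a5')
```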
Moving the component to kuryr to allow the Shift-On-Stack team to continue investigating their client's behavior.