Bug 2226806
| Summary: | Getting a subnet takes a long time | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Itzik Brown <itbrown> |
| Component: | openstack-kuryr-kubernetes | Assignee: | Antoni Segura Puimedon <asegurap> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | GenadiC <gcheresh> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 17.1 (Wallaby) | CC: | asegurap, chrisw, ihrachys, mdulko, scohen |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-07-31 07:26:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Itzik Brown
2023-07-26 15:28:01 UTC
This is the list of all ReST GET requests for subnets that were handled by neutron-server that took more than 1 second:
```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 'INFO neutron.wsgi' {} \; | grep 'time: [1-9]' | grep GET | grep subnets
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.8.gz:2023-07-26 06:46:53.660 22 INFO neutron.wsgi [req-328a80da-a24c-45b3-a156-438f224cdc62 f781795994a64b2ba865228a8ceae943 ea7bc1e64616447f8dfdc00f96f465f2 - default default] 10.46.43.130,172.17.1.135,::1 "GET /v2.0/subnets?tags-any=openshiftClusterID%3Dostest-fp9fx HTTP/1.1" status: 200 len: 884 time: 2.2028453
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.5.gz:2023-07-26 09:13:05.646 22 INFO neutron.wsgi [req-6f934d77-c907-478d-bd18-7fa19643d81d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?name=ostest-pgmsp-nodes&tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 884 time: 1.0180192
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.4.gz:2023-07-26 09:13:05.638 17 INFO neutron.wsgi [req-4a42f6cf-c921-43e1-8db8-420dd82026d3 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets HTTP/1.1" status: 200 len: 35461 time: 1.0277364
```
There's no slow request in the time specified in the original description for the bug (14:55:23.956), and there's not a single case of a request taking more than 3 seconds.
The list of requests for subnets at the timestamp listed:
```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 14:55:23 {} \; | grep subnet
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.443 22 INFO neutron.wsgi [req-75c786fd-bd9a-4b8e-85a7-a1a21089b2c1 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=efac5648-06a5-4ef6-8932-e9d81f7b1f51¬-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0325458
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.177 17 INFO neutron.wsgi [req-4ea0526e-8519-46ee-aded-73513ec5443f ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=ec1f4833-6781-4e2c-b2ca-6b38b6188b99¬-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0327132
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0683410
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.650 19 INFO neutron.wsgi [req-76c6cb1a-6c35-4bfc-bc0a-1860b0be8d67 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=f8b6305e-9996-4fa0-ad09-bcdc56d526ba¬-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200 len: 212 time: 0.0383773
```
All were processed near instantly.
These are subnet requests at the moment when kuryr moves to fetching network: ``` ./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0905654 ./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.961 17 INFO neutron.wsgi [req-333ab594-882e-4718-bad1-13033e47538d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?tenant_id=f9a91d8507154559b2d2f8a354159323 HTTP/1.1" status: 200 len: 34877 time: 0.3402753 ``` Also processed immediately. get_subnet call in openstacksdk (that is supposedly taking 2 minutes) fetches the subnet by its ID, so the candidates for the requests are: ``` ./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0683410 ./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200 len: 934 time: 0.0905654 ``` Note that both fetch the same subnet. I wonder if perhaps client fails to receive a reply, so it repeated after 2 minutes? From Neutron API perspective, it looks like we receives the request and immediately processed. You haven't posted request-ID you used to fetch the subnet, so it's inconclusive whether what I see in the server log is what you sent. Please add logging for HTTP headers, including request-id. I think you can configure your client accordingly, as per https://docs.openstack.org/openstacksdk/latest/user/guides/logging.html#simple-usage setting http_debug=True. For now, I consider it is not a bug in neutron but in client. Moving component to kuryr to allow Shift-On-Stack team continue investigation of their client behavior. |