Bug 2226806 - Getting a subnet takes a long time
Summary: Getting a subnet takes a long time
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-kuryr-kubernetes
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Antoni Segura Puimedon
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-07-26 15:28 UTC by Itzik Brown
Modified: 2023-08-07 10:30 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-31 07:26:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-26925 0 None None None 2023-07-26 15:45:38 UTC

Description Itzik Brown 2023-07-26 15:28:01 UTC
Description of problem:
Running Openshift on Openstack with Kuryr we rely on Neutron.
Getting a subnet sometimes takes almost 2 minutes which have an impact for creating Openshift resources.

From Kuryr logs:
2023-07-26 14:55:23.955 1 ERROR kuryr_kubernetes.utils [-] get_subnet(): test-846c6ffb69-n6dbc
2023-07-26 14:55:23.956 1 ERROR kuryr_kubernetes.utils [-] Looking for subnet test-846c6ffb69-n6dbc
2023-07-26 14:57:32.582 1 ERROR kuryr_kubernetes.utils [-] Looking for network test-846c6ffb69-n6dbc
2023-07-26 14:57:32.762 1 ERROR kuryr_kubernetes.utils [-] Got network test-846c6ffb69-n6dbc

Kuryr Code:
https://github.com/dulek/kuryr-kubernetes/blob/ae47fd58522700c1541ac6c5155c8bf36bf71d39/kuryr_kubernetes/utils.py#L300-L318

Version-Release number of selected component (if applicable):
OSP RHOS-17.1-RHEL-9-20230719.n.1
OCP 4.13.0-0.nightly-2023-07-25-104746


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ihar Hrachyshka 2023-07-28 13:27:57 UTC
This is the list of all ReST GET requests for subnets that were handled by neutron-server that took more than 1 second:

```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 'INFO neutron.wsgi' {} \; | grep 'time: [1-9]' | grep GET | grep subnets
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.8.gz:2023-07-26 06:46:53.660 22 INFO neutron.wsgi [req-328a80da-a24c-45b3-a156-438f224cdc62 f781795994a64b2ba865228a8ceae943 ea7bc1e64616447f8dfdc00f96f465f2 - default default] 10.46.43.130,172.17.1.135,::1 "GET /v2.0/subnets?tags-any=openshiftClusterID%3Dostest-fp9fx HTTP/1.1" status: 200  len: 884 time: 2.2028453
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.5.gz:2023-07-26 09:13:05.646 22 INFO neutron.wsgi [req-6f934d77-c907-478d-bd18-7fa19643d81d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?name=ostest-pgmsp-nodes&tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200  len: 884 time: 1.0180192
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.4.gz:2023-07-26 09:13:05.638 17 INFO neutron.wsgi [req-4a42f6cf-c921-43e1-8db8-420dd82026d3 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets HTTP/1.1" status: 200  len: 35461 time: 1.0277364
```

There's no slow request in the time specified in the original description for the bug (14:55:23.956), and there's not a single case of a request taking more than 3 seconds.

Comment 3 Ihar Hrachyshka 2023-07-28 13:29:24 UTC
The list of requests for subnets at the timestamp listed:

```
ihrachys@ihrachys-mac sos-report-kuryr-slow-subnet-fetch % find . -name 'server.log*' -exec zgrep -H 14:55:23 {} \; | grep subnet
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.443 22 INFO neutron.wsgi [req-75c786fd-bd9a-4b8e-85a7-a1a21089b2c1 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=efac5648-06a5-4ef6-8932-e9d81f7b1f51&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200  len: 212 time: 0.0325458
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.177 17 INFO neutron.wsgi [req-4ea0526e-8519-46ee-aded-73513ec5443f ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=ec1f4833-6781-4e2c-b2ca-6b38b6188b99&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200  len: 212 time: 0.0327132
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200  len: 934 time: 0.0683410
./sosreport-controller-0-2023-07-26-oqznozp/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.650 19 INFO neutron.wsgi [req-76c6cb1a-6c35-4bfc-bc0a-1860b0be8d67 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?network_id=f8b6305e-9996-4fa0-ad09-bcdc56d526ba&not-tags=openshiftClusterID%3Dostest-pgmsp HTTP/1.1" status: 200  len: 212 time: 0.0383773
```

All were processed near instantly.

Comment 4 Ihar Hrachyshka 2023-07-28 13:30:44 UTC
These are subnet requests at the moment when kuryr moves to fetching network:

```
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200  len: 934 time: 0.0905654
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.961 17 INFO neutron.wsgi [req-333ab594-882e-4718-bad1-13033e47538d ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets?tenant_id=f9a91d8507154559b2d2f8a354159323 HTTP/1.1" status: 200  len: 34877 time: 0.3402753
```

Also processed immediately.

Comment 5 Ihar Hrachyshka 2023-07-28 13:34:17 UTC
get_subnet call in openstacksdk (that is supposedly taking 2 minutes) fetches the subnet by its ID, so the candidates for the requests are:

```
./sosreport-controller-2-2023-07-26-ibynqrd/var/log/containers/neutron/server.log.1:2023-07-26 14:55:23.982 18 INFO neutron.wsgi [req-2afb71f6-a747-47e1-80e0-c502f7eea6fa ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200  len: 934 time: 0.0683410
./sosreport-controller-1-2023-07-26-wxyppvr/var/log/containers/neutron/server.log.1:2023-07-26 14:57:32.576 21 INFO neutron.wsgi [req-f81a7bc0-ab72-4d95-9eba-673c4abf2498 ec6f813a24c148d1998e01e614d08727 f9a91d8507154559b2d2f8a354159323 - default default] 10.46.43.178,172.17.1.135,::1 "GET /v2.0/subnets/bf15ca6b-9a3c-4a02-a849-20a586bd04a5 HTTP/1.1" status: 200  len: 934 time: 0.0905654
```

Note that both fetch the same subnet. I wonder if perhaps client fails to receive a reply, so it repeated after 2 minutes?

Comment 6 Ihar Hrachyshka 2023-07-28 13:47:33 UTC
From Neutron API perspective, it looks like we receives the request and immediately processed. You haven't posted request-ID you used to fetch the subnet, so it's inconclusive whether what I see in the server log is what you sent. Please add logging for HTTP headers, including request-id.

I think you can configure your client accordingly, as per https://docs.openstack.org/openstacksdk/latest/user/guides/logging.html#simple-usage setting http_debug=True.

For now, I consider it is not a bug in neutron but in client.

Comment 7 Ihar Hrachyshka 2023-07-28 19:41:49 UTC
Moving component to kuryr to allow Shift-On-Stack team continue investigation of their client behavior.


Note You need to log in before you can comment on or make changes to this bug.