Bug 2050230

Summary: Implement LIST call chunking in openshift-sdn
Product: OpenShift Container Platform Reporter: Amit Kesarkar <akesarka>
Component: Networking Assignee: Jaime Caamaño Ruiz <jcaamano>
Networking sub component: openshift-sdn QA Contact: Qiujie Li <qili>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: ancollin, gvaughn, jcaamano, jpradhan, qili, rravaiol, trozet, zzhao
Version: 4.8   
Target Milestone: ---   
Target Release: 4.13.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-17 22:46:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Amit Kesarkar 2022-02-03 13:58:07 UTC
Description of problem:
In a large cluster, the sdn daemonset can DoS the kube-apiserver with un-paginated LIST calls on high-count resources.


Version-Release number of selected component (if applicable):


How reproducible:
NA 

Steps to Reproduce:
NA

Actual results:
The Kube API Server and OpenShift API Server in one of the clusters keep restarting without surfacing a proper error. The cluster is not accessible.

Expected results:
The Kube API Server and OpenShift API Server should remain stable.

Additional info:

Comment 2 Andrew Collins 2022-02-14 17:56:04 UTC
Filling in more information on the bug description.
Please let me know if I can provide anything else here, I am happy to assist.

Description of problem:
In a large cluster, the sdn daemonset can DoS the kube-apiserver with un-paginated LIST calls on high-count resources.


Version-Release number of selected component (if applicable):
4.8.23


How reproducible:
100%

Steps to Reproduce:
1. Create more than 500 pods, networkpolicies, services, endpoints, netnamespaces, or projects in a cluster.
2. Restart one or more SDN pods.


Actual results:
Kube-apiserver audit events show that LIST calls on these resources are executed without paging, and thus query >500 resources in a single LIST request.
Repeated, very large list requests (>15k items) can cause the kube-apiserver, openshift-apiserver, and etcd to consume extremely large amounts of memory, which can lead to other issues.

Expected results:
SDN should make fixed-size LIST calls using pagination so as to limit memory ballooning on the control plane (see the sketch at the end of this comment).

Additional info:
These are the counts of the resources whose LIST calls are being executed at high frequency in a cluster environment when the control plane becomes unstable; the calls only contribute further to the control-plane instability.
```
$ oc get --raw '/api/v1/endpoints?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=480361449' | jq -s ' .[].items[].metadata.name'  | wc
   10694   10694  251354

$ oc get --raw '/apis/network.openshift.io/v1/netnamespaces?resourceVersion=480360525' | jq -s ' .[].items[].metadata.name'  | wc
    4984    4984  106907

$ oc get --raw '/apis/network.openshift.io/v1/hostsubnets?resourceVersion=480361230' | jq -s ' .[].items[].metadata.name'  | wc
     256     256   10365

$ oc get --raw '/api/v1/pods?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=480361631' | jq -s ' .[].items[].metadata.name'  | wc
   18134   18134  586113

$ oc get --raw '/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=480361112' | jq -s ' .[].items[].metadata.name'  | wc
   11012   11012  260087

$ oc get --raw '/apis/networking.k8s.io/v1/networkpolicies?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=480360489' | jq -s ' .[].items[].metadata.name'  | wc
   15438   15438  456408
```
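
For reference, here is a minimal sketch (my illustration, not the actual openshift-sdn change) of what fixed-size, chunked LIST calls look like with client-go; the resource and page size are only examples, and it assumes in-cluster credentials:
```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Hypothetical standalone example: list all endpoints in 500-item pages
	// instead of one unbounded LIST request.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	opts := metav1.ListOptions{Limit: 500}
	total := 0
	for {
		page, err := client.CoreV1().Endpoints(metav1.NamespaceAll).List(context.TODO(), opts)
		if err != nil {
			panic(err)
		}
		total += len(page.Items)
		// An empty continue token means this was the last page.
		if page.Continue == "" {
			break
		}
		opts.Continue = page.Continue
	}
	fmt.Printf("listed %d endpoints in chunks of 500\n", total)
}
```
In the audit log each page then shows up as a separate request carrying `limit=500` plus a `continue` token, instead of one request returning the full 10k+ item list.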

Comment 4 zhaozhanqi 2022-03-09 11:24:22 UTC
@qili Hi Qiujie, could you take a look and help verify this bug during your testing? Thanks.

Comment 38 Andrew Collins 2022-04-27 19:36:08 UTC
> @ancollin is there any way to verify if this change has actually helped at all?

I took a look through the audit logs (Thank you Qiujie for uploading).

I do still see list calls with `limit=500&resourceVersion=0`, but they appear to be followed by a watch request.
These look like a consequence of the ListWatch pattern that we cannot get around.

I looked for the request parameters that I originally filed on (i.e. "labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&resourceVersion=480361631") and I do see fewer occurrences of these.
Only two list calls: services and endpointslices.
Unfortunately both of these also have the page-negating `resourceVersion=0`, so I gather these are also the initial List of the ListWatch.
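
For context, a minimal sketch (my illustration, not sdn code) of the request shape behind those audit entries; as I understand it, a list with `resourceVersion=0` may be served from the apiserver's watch cache, which does not honor limit/continue, so the whole result can come back in one response:
```
package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// initialList mimics the limit=500&resourceVersion=0 requests seen in the
// audit log. With ResourceVersion "0" the apiserver is free to answer from
// its watch cache, where the Limit is not applied, so the response can
// contain every item and no continue token is returned.
func initialList(ctx context.Context, client kubernetes.Interface) error {
	list, err := client.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		Limit:           500,
		ResourceVersion: "0",
	})
	if err != nil {
		return err
	}
	fmt.Printf("items: %d, continue token: %q\n", len(list.Items), list.Continue)
	return nil
}
```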

To restate the problem: the large un-paginated list calls only come after the API has already become unstable, and they only generate additional load on already-burdened API servers.
Even though these un-paginated calls happen only under certain conditions, I believe those conditions are precisely when pagination is needed most.

I will not know whether these changes make a difference in the customer environment until they are released in a Z stream, but based on these results I do expect the remaining unpaginated list calls to continue to be disruptive.

I do see pagination being used in two calls, evidenced by the subsequent "continue" calls (resources: namespaces and netnamespaces, time: 2022-04-26T09:18:03), so I believe you have done as much as you can from the sdn side, and the rest is chasing down client-go (as you said).
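
As an aside, those follow-up "continue" requests are the pattern you get from a paginated lister such as client-go's k8s.io/client-go/tools/pager helper; a minimal sketch of that pattern (my illustration, not necessarily how the sdn patch is wired up):
```
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/pager"
)

// listNamespacesPaged walks all namespaces in 500-item pages. The pager
// sends the first request with limit=500 and then follow-up requests that
// carry the continue token, which is the request sequence visible in the
// audit log for paginated lists.
func listNamespacesPaged(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	p := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
		return client.CoreV1().Namespaces().List(ctx, opts)
	}))
	p.PageSize = 500

	var names []string
	err := p.EachListItem(ctx, metav1.ListOptions{}, func(obj runtime.Object) error {
		names = append(names, obj.(*corev1.Namespace).Name)
		return nil
	})
	return names, err
}
```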

Thank you for your help to bring this to a close.

If there is some way to add this as supporting data to bugs about removing unpaginated ListWatch calls, to other efforts to improve API stability on large clusters, or to similar client-go bugs, I am all for that; please let me know how you think it is best to approach those maintainers.

Comment 56 Qiujie Li 2022-12-06 05:42:04 UTC
@jcaamano I didn't see where I can remove FailedQA.

Comment 60 errata-xmlrpc 2023-05-17 22:46:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326