This is a continued attempt to get a feature in via a bug. The current supported limit is 250 pods/node. However, Robert, could you try to find an explanation? Don't spend too much time. It might actually be the watch-based secret manager from back before we disabled it.
Created attachment 1571714 [details]
oc_get_events_oc_describe_terminating_pods

Added oc get events, oc describe pod output, and oc logs for the terminating pods.
Can you also get me kubelet logs? I was hoping there would be enough in oc describe pod but there isn't.
Created attachment 1592947 [details]
Pause test

Run this:

$ oc apply -f minipause.yaml
$ for f in $(seq 0 649) ; do oc apply -f - <<< $(oc process minipause -p SERIAL=$f -p PODS=1 -p NAMESPACE=minipause-$f); done

If you want to use just one namespace:

$ for f in $(seq 0 649) ; do oc apply -f - <<< $(oc process minipause -p SERIAL=$f -p PODS=1); done
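The minipause.yaml attachment itself isn't reproduced in this comment; purely for context, a template roughly like the sketch below would drive the loop above. Everything in it beyond the SERIAL/PODS/NAMESPACE parameters (the use of a Deployment, the object names, the pause image) is an assumption, not the actual attachment:

# Hypothetical sketch of the minipause template; the real one is attachment 1592947.
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: minipause
parameters:
- name: SERIAL
  required: true
- name: PODS
  value: "1"
- name: NAMESPACE
  value: minipause
objects:
- apiVersion: v1
  kind: Namespace
  metadata:
    name: ${NAMESPACE}
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: pause-${SERIAL}
    namespace: ${NAMESPACE}
  spec:
    replicas: ${{PODS}}            # ${{...}} substitutes a non-string value
    selector:
      matchLabels:
        app: pause-${SERIAL}
    template:
      metadata:
        labels:
          app: pause-${SERIAL}
      spec:
        containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1   # placeholder pause image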
One kubelet-related thought occurs to me. Kubelets have to set up watches on a per-secret basis. Each pod comes with a secret, but in a single namespace it would all be the same secret, so you have one get/list/watch. With multiple namespaces, you have many. Maybe you're using up your rate limit on secrets and your client is rate-limiting the patch. You could test this by setting pod.spec.automountServiceAccountToken to false (see the snippet below). If you still experience the problem, then it's worth it for us to build an unratelimited `oc` to push patches through. If we see patches as slow on the server side, then this can come to the apiserver team for a weird scaling problem.
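For reference, the test change is a single field on the pod spec; a minimal sketch (the pod name and image are placeholders, not taken from the reproducer):

apiVersion: v1
kind: Pod
metadata:
  name: pause-example                     # placeholder name
spec:
  automountServiceAccountToken: false     # kubelet no longer needs the SA token secret for this pod
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1           # placeholder image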
Adding spec.automountServiceAccountToken: false to the pod definition allows things to work correctly.
Alright, this suggests that the kubelet is running out of QPS for its clients in these cases. I'm not completely sure what you want to do about that. You could increase QPS, you could change the test harness, you could consider the watch-based secret/configmap refresh (a sketch of that knob is below), or you could do something else, but the API server appears to be functioning.
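If the watch-based refresh route is worth trying, the upstream KubeletConfiguration knob is configMapAndSecretChangeDetectionStrategy; a minimal sketch of passing it through a KubeletConfig, assuming the KubeletConfig controller forwards this field in this release (not verified here):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: watch-secrets
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    # Watch secrets/configmaps instead of polling them (allowed values: Get, Cache, Watch)
    configMapAndSecretChangeDetectionStrategy: Watch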
With the following kubelet config:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 750
    kubeAPIBurst: 200
    kubeAPIQPS: 100

I ran 2000 pods on 3 nodes without anything getting hung up (not using spec.automountServiceAccountToken: false).
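For anyone reproducing this: the KubeletConfig above only takes effect once the target MachineConfigPool carries the matching label. Roughly (the file name is just an example):

$ oc label machineconfigpool worker custom-kubelet=large-pods
$ oc apply -f set-max-pods.yaml   # the KubeletConfig above, saved locally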
Also worked fine at a burst/QPS of 50/25.
At a burst/QPS of 30/15, this started happening around 1250 total pods (~400/node on average).
xref https://github.com/kubernetes/kubernetes/issues/80647 xref https://github.com/kubernetes/kubernetes/pull/80649
At a burst/QPS of 30/5, the issue started, if anything, even earlier than with the default 10/5; raising the burst alone does not appear to help.
I re-ran the initial 500 pods per node scale test with the nodejs and mongodb quickstart apps and verified that the bottlenecks preventing us from getting to 500 pods per node were resolved by increasing kubeAPIQPS to 20 and kubeAPIBurst to 40, using this kubelet config:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 500
    kubeAPIBurst: 40
    kubeAPIQPS: 20

I was able to deploy up to 483 pods per node (maxPods 500) before we ran out of IP addresses (hostPrefix was 23) on each of 2 worker nodes with instance type m5.24xlarge.

We may want to update our docs for increasing maxPods and mention the need to also increase kubeAPIQPS and kubeAPIBurst to achieve the desired pod density when working with a large number of namespaces. It would also be helpful to mention specific log messages or metrics we could track that would indicate when we are approaching the QPS limit; see the sketch below.
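As a rough starting point for that, client-go logs a throttling message when a request waits on its rate limiter (typically only visible at raised kubelet log verbosity, and the exact wording varies by version), and the kubelet exports the standard client-go request metrics. A sketch, assuming the cluster monitoring stack scrapes the kubelet under job="kubelet":

# Client-side throttling messages in the kubelet journal (wording/verbosity vary by client-go version)
$ oc adm node-logs <node-name> -u kubelet | grep -i "Throttling request took"

# Rough PromQL: per-node kubelet API client request rate, to compare against the configured kubeAPIQPS
sum by (instance) (rate(rest_client_requests_total{job="kubelet"}[5m]))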
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062