Description of problem: The customer wants to run client applications with more than 1024 threads. He followed the documentation describing a similar process for configuring the maximum number of pods per node [1]. He created a KubeletConfig custom resource named "set-max-pids" specifying podPidsLimit: 4096 and applied it, but pods are still capped at 1024 threads instead of the specified 4096. Please find all relevant information below:

[mqperf@mqperfx1 ~]$ oc get kubeletconfig set-max-pids -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  creationTimestamp: "2020-05-07T11:33:56Z"
  finalizers:
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  generation: 4
  name: set-max-pids
  resourceVersion: "124402074"
  selfLink: /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/set-max-pids
  uid: a2f22cad-9056-11ea-b677-000af7e9cc10
spec:
  kubeletConfig:
    maxPods: 506
    podPidsLimit: 4096
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: pids-4K
status:
  conditions:
  - lastTransitionTime: "2020-05-07T11:33:56Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-12T16:03:11Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-14T11:27:05Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:15:09Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:18:37Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:44:25Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-20T07:56:30Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T16:41:00Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T16:58:40Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T17:06:38Z"
    message: Success
    status: "True"
    type: Success

[core@worker1 ~]$ ulimit -u
384096

[mqperf@mqperfx1 ~]$ oc describe machineconfigpool worker
Name:         worker
Namespace:
Labels:       custom-kubelet=pids-4K
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-01-17T12:16:30Z
  Generation:          16
  Resource Version:    124442909
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 3177eb7a-3923-11ea-bec6-000af7e9b1e0
Spec:
  Configuration:
    Name:  rendered-worker-dce6ce5440633143fe9d129e893134f3
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
  Paused:  false
Status:
  Conditions:
    Last Transition Time:  2020-01-17T12:16:46Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-05-13T15:07:29Z
    Message:
    Reason:
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2020-05-13T15:07:29Z
    Message:
    Reason:
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-05-22T18:00:22Z
    Message:               All nodes are updated with rendered-worker-dce6ce5440633143fe9d129e893134f3
    Reason:
    Status:                True
    Type:                  Updated
    Last Transition Time:  2020-05-22T18:00:22Z
    Message:
    Reason:
    Status:                False
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-dce6ce5440633143fe9d129e893134f3
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Degraded Machine Count:     0
  Machine Count:              9
  Observed Generation:        16
  Ready Machine Count:        9
  Unavailable Machine Count:  0
  Updated Machine Count:      9
Events:  <none>

[mqperf@mqperfx1 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-1b805237bbad455f9915ca8a95d36d69   True      False      False
worker   rendered-worker-dce6ce5440633143fe9d129e893134f3   True      False      False

[1] - https://docs.openshift.com/container-platform/4.2/nodes/nodes/nodes-nodes-managing-max-pods.html
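For quick triage, the effective limit can be compared at three points: what the kubelet was asked to apply (podPidsLimit), what CRI-O enforces per container (pids_limit, which defaults to 1024), and what the kernel actually applies in the pod's pids cgroup. A sketch of those checks follows; the node and pod names are placeholders, and the pids.max path assumes cgroup v1 (on cgroup v2 it is /sys/fs/cgroup/pids.max inside the container):

# What the kubelet was told to apply per pod
$ oc debug node/<worker-node> -- cat /host/etc/kubernetes/kubelet.conf | jq .podPidsLimit

# What CRI-O enforces per container (defaults to 1024 if not raised)
$ oc debug node/<worker-node> -- chroot /host crio config | grep pids_limit

# What the kernel actually enforces for the affected pod (cgroup v1 path)
$ oc exec <pod> -- cat /sys/fs/cgroup/pids/pids.max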
Linux does not limit the number of threads per process. It appears everything is working ok on this BZ.
Please reopen if the problem persists. Note: the duplicate finalizers were fixed in 4.3 and above; 4.2 is only getting critical (i.e., security) bug fixes.
I'm reopening this with what I think the original reporter meant: the PID limits do not seem to be picked up by the container runtime. I have a reproducer here:

1) Create a new OCP 4.5 cluster
2) Configure podPidsLimit to something high
3) Create a Pod that generates a ton of processes; it fails at 1024 regardless of the podPidsLimit configuration

Demonstration of the issue using https://github.com/bostrt/spew-procs

$ oc debug node/ip-10-0-221-49.us-west-2.compute.internal -- cat /host/etc/kubernetes/kubelet.conf | jq '.podPidsLimit, .featureGates'
Starting pod/ip-10-0-221-49us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
2048
{
  "LegacyNodeRoleBehavior": false,
  "NodeDisruptionExclusion": true,
  "RotateKubeletServerCertificate": true,
  "SCTPSupport": true,
  "ServiceNodeExclusion": true,
  "SupportPodPidsLimit": true
}

$ oc get pod spew-procs-2-tlm2d -o wide
NAME                 READY   STATUS             RESTARTS   AGE     IP           NODE                                        NOMINATED NODE   READINESS GATES
spew-procs-2-tlm2d   0/1     CrashLoopBackOff   5          6m26s   10.128.2.6   ip-10-0-221-49.us-west-2.compute.internal   <none>           <none>

$ oc logs spew-procs-2-tlm2d
yup 1
yup 2
yup 3
yup 4
...
...
...
yup 1021
yup 1022
yup 1023
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: Resource temporarily unavailable
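(For the record, in case the spew-procs image is unavailable: a minimal stand-in fork loop could look like the sketch below. This is my own sketch, not the contents of that repo; the counter output mirrors the "yup N" lines above, and the last successful fork shows the effective PID limit.)

#!/bin/bash
# run.sh - spawn long-lived background processes until fork() starts failing,
# printing a counter so the last successful fork reveals the effective PID limit.
i=0
while true; do
  i=$((i + 1))
  sleep 3600 &      # each background sleep holds one PID in the pod's pids cgroup
  echo "yup $i"
done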
Interesting. I believe this config variable is clashing with the one in ContainerRuntimeConfig/crio.conf, pids_limit; 1024 is the default in CRI-O. Perhaps setting the kubelet config variable should also configure it in CRI-O, or maybe one of the knobs could be dropped.
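If the clash is with CRI-O's pids_limit, one way to line the two knobs up (and essentially what the KCS referenced later in this bug does) is a ContainerRuntimeConfig CR against the same MCP. A sketch follows; the name, label, and value are only examples and should be adapted to the cluster:

cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: pids-4K        # reuse the label already on the worker MCP
  containerRuntimeConfig:
    pidsLimit: 4096                  # keep in sync with podPidsLimit in the KubeletConfig
EOF

The MCO should then roll a new rendered config out to the worker pool, the same way the KubeletConfig change did.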
Hello Team,

I have a customer who wants to know the official and supported way to increase the PID limit per container from 1024 to a higher value in OpenShift 4.x.

As written in our official OpenShift documentation [1], the feature to increase the pod PID limit is part of the "TechPreviewNoUpgrade" feature set. The documentation also states that enabling the TechPreviewNoUpgrade feature set cannot be undone and prevents upgrades, and that these feature sets are not recommended on production clusters.

[1] https://docs.openshift.com/container-platform/4.7/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling

The customer has also shared the KCS [2], which states the above requirement. So, is the workaround mentioned in the KCS supported by Red Hat?

[2] https://access.redhat.com/solutions/5366631

As the customer is facing a huge business impact and the issue is urgent, a proactive response would be highly appreciated.

Best Regards,
Mridul Markandey
(In reply to Mridul Markandey from comment #18)
> The customer has also shared the KCS [2], which states the above requirement.
> So, is the workaround mentioned in the KCS supported by Red Hat?
>
> [2] https://access.redhat.com/solutions/5366631
>
> As the customer is facing a huge business impact and the issue is urgent, a
> proactive response would be highly appreciated.
>
> Best Regards,
> Mridul Markandey

KCS articles are supported solutions.
Hello Team,

Thank you for your response. Can you please tell us in which 4.7 minor release this fix is expected to land? Also, after that, will the "TechPreviewNoUpgrade" feature set mentioned in the documentation [1] be officially supported?

[1] https://docs.openshift.com/container-platform/4.7/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling

Even though the shared KCS article has a workaround, the private comments mention that the solution is not yet supported. Customers will only apply the workaround if it is officially supported by Red Hat. So, this is the ask.

Regards,
Mridul Markandey
Verified by the method in https://access.redhat.com/solutions/5366631

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-10-29-212209   True        False         4h55m   Cluster version is 4.7.0-0.nightly-2021-10-29-212209

sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf
{
...
  "podPidsLimit": 4096,

sh-4.4# crio config | grep pid
INFO[0000] Starting CRI-O, version: 1.20.5-7.rhaos4.7.gite80c8db.el8, git: ()
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL
pids_limit = -1

$ oc create deployment --image=quay.io/harpatil/spew-process:latest spew-process
deployment.apps/spew-process created

$ oc logs spew-process-868bc96f7f-szpkm
....
yup 4077
yup 4078
yup 4079
yup 4080
yup 4081
yup 4082
yup 4083
yup 4084
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: Resource temporarily unavailable
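As an extra sanity check (not part of the verification above), the limit can also be read back from the pod's pids cgroup while the pod is running; the path below assumes cgroup v1 (on cgroup v2 it is /sys/fs/cgroup/pids.max):

$ oc exec spew-process-868bc96f7f-szpkm -- cat /sys/fs/cgroup/pids/pids.max
# expected to report the configured podPidsLimit (4096) rather than the old 1024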
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.37 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4572
Hello, Where is this fix in 4.8.z and 4.9.z?
The comment at https://bugzilla.redhat.com/show_bug.cgi?id=1844447#c8 describes the correct way to work around this situation.
(In reply to Peter Hunt from comment #33)
> The comment at https://bugzilla.redhat.com/show_bug.cgi?id=1844447#c8
> describes the correct way to work around this situation.

Was that a reply to my comment? I don't understand how that answers my request. Please explain.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days