Bug 1844447 - increasing the thread limits within a pod - PodPidsLimit [NEEDINFO]
Summary: increasing the thread limits within a pod - PodPidsLimit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Harshal Patil
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks: 2039187
 
Reported: 2020-06-05 12:38 UTC by mchebbi@redhat.com
Modified: 2022-01-11 07:35 UTC
CC: 16 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2039187 (view as bug list)
Environment:
Last Closed: 2021-11-17 21:27:56 UTC
Target Upstream Version:
harpatil: needinfo? (shujadha)




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5366631 0 None None None 2020-09-02 21:44:07 UTC
Red Hat Product Errata RHBA-2021:4572 0 None None None 2021-11-17 21:28:18 UTC

Description mchebbi@redhat.com 2020-06-05 12:38:48 UTC
Description of problem:
The customer wants to run client applications with more than 1024 threads. He followed the documentation describing a similar procedure for configuring the maximum number of pods per node [1].

He created a KubeletConfig custom resource "set-max-pids" specifying podPidsLimit: 4096 and applied it, but pods remained limited to 1024 threads instead of the configured 4096.

Please find all relevant information below:

[mqperf@mqperfx1 ~]$ oc get kubeletconfig set-max-pids -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  creationTimestamp: "2020-05-07T11:33:56Z"
  finalizers:
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  - 99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
  generation: 4
  name: set-max-pids
  resourceVersion: "124402074"
  selfLink: /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/set-max-pids
  uid: a2f22cad-9056-11ea-b677-000af7e9cc10
spec:
  kubeletConfig:
    maxPods: 506
    podPidsLimit: 4096
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: pids-4K
status:
  conditions:
  - lastTransitionTime: "2020-05-07T11:33:56Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-12T16:03:11Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-14T11:27:05Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:15:09Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:18:37Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-19T10:44:25Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-20T07:56:30Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T16:41:00Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T16:58:40Z"
    message: Success
    status: "True"
    type: Success
  - lastTransitionTime: "2020-05-22T17:06:38Z"
    message: Success
    status: "True"
    type: Success


[core@worker1 ~]$ ulimit -u
384096


[mqperf@mqperfx1 ~]$ oc describe machineconfigpool worker
Name:         worker
Namespace:
Labels:       custom-kubelet=pids-4K
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-01-17T12:16:30Z
  Generation:          16
  Resource Version:    124442909
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 3177eb7a-3923-11ea-bec6-000af7e9b1e0
Spec:
  Configuration:
    Name:  rendered-worker-dce6ce5440633143fe9d129e893134f3
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-01-17T12:16:46Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-05-13T15:07:29Z
    Message:
    Reason:
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2020-05-13T15:07:29Z
    Message:
    Reason:
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-05-22T18:00:22Z
    Message:               All nodes are updated with rendered-worker-dce6ce5440633143fe9d129e893134f3
    Reason:
    Status:                True
    Type:                  Updated
    Last Transition Time:  2020-05-22T18:00:22Z
    Message:
    Reason:
    Status:                False
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-dce6ce5440633143fe9d129e893134f3
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-3177eb7a-3923-11ea-bec6-000af7e9b1e0-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     0
  Machine Count:              9
  Observed Generation:        16
  Ready Machine Count:        9
  Unavailable Machine Count:  0
  Updated Machine Count:      9
Events:                       <none>


[mqperf@mqperfx1 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-1b805237bbad455f9915ca8a95d36d69   True      False      False
worker   rendered-worker-dce6ce5440633143fe9d129e893134f3   True      False      False


[1] - https://docs.openshift.com/container-platform/4.2/nodes/nodes/nodes-nodes-managing-max-pods.html

Comment 1 Ryan Phillips 2020-06-05 18:20:01 UTC
Linux does not limit the number of threads per process. It appears everything is working ok on this BZ.

Comment 2 Ryan Phillips 2020-06-15 18:00:45 UTC
Please reopen if a problem persists. 

Note, the duplicate finalizers were fixed in 4.3 and above. 4.2 is only getting critical (i.e., security) bug fixes.

Comment 3 Robert Bost 2020-08-27 23:29:57 UTC
I'm reopening this with what I think the original reporter meant:

The PID limits do not seem to be picked up by the container runtime. I have a reproducer here:

1) Create a new OCP 4.5 cluster
2) Configure podPidsLimit to something high
3) Create a Pod that generates a large number of processes; it fails at 1024 regardless of the podPidsLimit configuration
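
The reproducer can be sketched as a tiny shell loop (a sketch, assuming spew-procs works roughly like this, modeled on the "yup N" output shown below; the linked repository is the actual reproducer):

```shell
# Sketch of a reproducer: fork one long-lived process per iteration so each
# pins a PID in the pod's pids cgroup. Inside a pod the loop stalls with
# "fork: retry: Resource temporarily unavailable" at the enforced limit.
spew() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    i=$((i + 1))
    echo "yup  $i"
    sleep 3600 >/dev/null 2>&1 &   # one long-lived process per iteration
  done
}
# In the pod: spew 2000   (should get past 1024 if podPidsLimit is honored)
```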

Demonstration of issue using https://github.com/bostrt/spew-procs

$ oc debug node/ip-10-0-221-49.us-west-2.compute.internal -- cat /host/etc/kubernetes/kubelet.conf | jq '.podPidsLimit, .featureGates'
Starting pod/ip-10-0-221-49us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...

2048
{
  "LegacyNodeRoleBehavior": false,
  "NodeDisruptionExclusion": true,
  "RotateKubeletServerCertificate": true,
  "SCTPSupport": true,
  "ServiceNodeExclusion": true,
  "SupportPodPidsLimit": true
}

$ oc get pod spew-procs-2-tlm2d -o wide 
NAME                 READY   STATUS             RESTARTS   AGE     IP           NODE                                        NOMINATED NODE   READINESS GATES
spew-procs-2-tlm2d   0/1     CrashLoopBackOff   5          6m26s   10.128.2.6   ip-10-0-221-49.us-west-2.compute.internal   <none>           <none>

$ oc logs spew-procs-2-tlm2d
yup  1
yup  2
yup  3
yup  4
...
...
...
yup  1021
yup  1022
yup  1023
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: Resource temporarily unavailable

Comment 5 Peter Hunt 2020-08-31 13:54:29 UTC
Interesting. I believe this config variable is clashing with the pids_limit setting in ContainerRuntimeConfig/crio.conf; 1024 is the default in CRI-O. Perhaps setting the kubelet config variable should also configure it in CRI-O, or one of the two knobs could be dropped.
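
The CRI-O side of that clash can be adjusted with a ContainerRuntimeConfig. This is a sketch only: the resource name and pool label below are assumptions, and the KCS article linked in this bug documents the supported procedure.

```yaml
# Hypothetical ContainerRuntimeConfig raising CRI-O's pids_limit to match
# the kubelet's podPidsLimit (name and pool label are assumptions).
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: pids-4K
  containerRuntimeConfig:
    pidsLimit: 4096
```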

Comment 18 Mridul Markandey 2021-10-13 11:20:46 UTC
Hello Team,

I have a customer who wants to know the official, supported way to increase the per-container PID limit from 1024 to a higher value in OpenShift 4.x.

As written in the official OpenShift documentation [1], the feature to increase the pod PID limit is part of the "TechPreviewNoUpgrade" feature set. The documentation also states that enabling TechPreviewNoUpgrade feature sets cannot be undone and prevents upgrades, and that these feature sets are not recommended on production clusters.

[1] https://docs.openshift.com/container-platform/4.7/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling

The customer has also shared the KCS[2], which states the above requirement. So, is the workaround mentioned in the KCS supported by Red Hat?

[2] https://access.redhat.com/solutions/5366631

As the customer is facing a huge business impact, and the issue is urgent, a proactive response will be highly appreciated.

Best Regards,
Mridul Markandey

Comment 20 Harshal Patil 2021-10-13 14:03:00 UTC
(In reply to Mridul Markandey from comment #18)

> The customer has also shared the KCS[2], which states the above requirement.
> So, is the workaround mentioned in the KCS supported by Red Hat?
> 
> [2] https://access.redhat.com/solutions/5366631
> 
> As the customer is facing a huge business impact, and the issue is urgent, a
> proactive response will be highly appreciated.
> 
> Best Regards,
> Mridul Markandey

KCS articles are supported solutions.

Comment 21 Mridul Markandey 2021-10-13 14:11:58 UTC
Hello Team,

Thank you for your response.

Can you please tell us in which 4.7 minor release this fix is expected to land? Also, after that, will we officially support the "TechPreviewNoUpgrade" feature set as mentioned in the documentation [1]?

[1] https://docs.openshift.com/container-platform/4.7/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling

Although the shared KCS article has a workaround, the private comments mention that the solution is not yet supported. Customers will only apply the workaround if it is officially supported by Red Hat, so this is the ask.

Regards,
Mridul Markandey

Comment 28 MinLi 2021-11-03 10:43:46 UTC
Verified by the method in https://access.redhat.com/solutions/5366631

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-10-29-212209   True        False         4h55m   Cluster version is 4.7.0-0.nightly-2021-10-29-212209

sh-4.4# chroot /host 
sh-4.4# cat /etc/kubernetes/kubelet.conf 
{
  ...
  "podPidsLimit": 4096,

sh-4.4# crio config | grep pid
INFO[0000] Starting CRI-O, version: 1.20.5-7.rhaos4.7.gite80c8db.el8, git: () 
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL 
pids_limit = -1

$ oc create deployment --image=quay.io/harpatil/spew-process:latest spew-process
deployment.apps/spew-process created

$ oc logs spew-process-868bc96f7f-szpkm
....
yup  4077
yup  4078
yup  4079
yup  4080
yup  4081
yup  4082
yup  4083
yup  4084
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: retry: Resource temporarily unavailable
./run.sh: fork: Resource temporarily unavailable
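
A complementary check (a sketch: the cgroup layout differs between cgroup v1 and v2 and by pod QoS class, so the example path below is an assumption) is to read the limit the kernel actually enforces for the container's cgroup:

```shell
# Read the pids limit the kernel enforces for a given cgroup directory.
# On a node, the pod's cgroup lives somewhere under /sys/fs/cgroup
# (exact path varies with cgroup version and QoS class).
pids_limit() {
  cat "$1/pids.max"
}
# Example on a worker node (path is an assumption):
# pids_limit /sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/<pod slice>
```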

Comment 31 errata-xmlrpc 2021-11-17 21:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.37 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4572

Comment 32 David Johnston 2021-11-22 15:59:42 UTC
Hello,

Where is this fix in 4.8.z and 4.9.z?

Comment 33 Peter Hunt 2021-11-22 18:38:59 UTC
The comment at https://bugzilla.redhat.com/show_bug.cgi?id=1844447#c8 describes the correct way to work around this situation

Comment 34 David Johnston 2021-11-22 19:16:08 UTC
(In reply to Peter Hunt from comment #33)
> the comment described in
> https://bugzilla.redhat.com/show_bug.cgi?id=1844447#c8 describes the correct
> way to work around this situation

Was that a reply to my comment?

I don't understand how that answers my request. Please explain.

