Description of problem:
During a pod density test, pods are finishing in the Error state (29 out of 4000 pods).

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-26-215024

How reproducible:
100%

install-config.yaml:
---
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.xlarge
  replicas: 20
metadata:
  name: skordas
platform:
  aws:
    region: us-east-2
pullSecret: ***
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External
fips: true
baseDomain: qe.devcluster.openshift.com
sshKey: ***

Steps to Reproduce:
1. Scale the cluster up to 20 worker nodes.
2. Create 2000 projects (200 per node):
- git clone https://github.com/openshift/svt.git
- cd svt/openshift_scalability
- touch test.yaml
- vim test.yaml
```yaml
projects:
  - num: 2000
    basename: svt-
    templates:
      - num: 1
        file: ./content/deployment-config-1rep-pause-template.json
```
- cp $KUBECONFIG ~/.kube/config
- python cluster-loader.py -f test.yaml -p 5
3. Delete the projects:
- oc delete project -l purpose=test
4. Change the number of projects to 4000:
- vim test.yaml
5. Create 4000 projects:
- python cluster-loader.py -f test.yaml -p 5

Actual results:

$ oc logs deploymentconfig0-1-deploy -n svt-3620
error: couldn't get deployment deploymentconfig0-1: replicationcontrollers "deploymentconfig0-1" is forbidden: User "system:serviceaccount:svt-3620:deployer" cannot get resource "replicationcontrollers" in API group "" in the namespace "svt-3620"

$ oc get replicationcontrollers -n svt-3620
NAME                  DESIRED   CURRENT   READY   AGE
deploymentconfig0-1   0         0         0       54m

$ oc describe replicationcontrollers deploymentconfig0-1 -n svt-3620
Name:         deploymentconfig0-1
Namespace:    svt-3620
Selector:     deployment=deploymentconfig0-1,deploymentconfig=deploymentconfig0,name=replicationcontroller0
Labels:       openshift.io/deployment-config.name=deploymentconfig0
              template=deploymentConfigTemplate
Annotations:  kubectl.kubernetes.io/desired-replicas: 1
              openshift.io/deployer-pod.completed-at: 2020-06-29 16:28:04 +0000 UTC
              openshift.io/deployer-pod.created-at: 2020-06-29 16:28:00 +0000 UTC
              openshift.io/deployer-pod.name: deploymentconfig0-1-deploy
              openshift.io/deployment-config.latest-version: 1
              openshift.io/deployment-config.name: deploymentconfig0
              openshift.io/deployment.phase: Failed
              openshift.io/deployment.replicas: 0
              openshift.io/deployment.status-reason: config change
              openshift.io/encoded-deployment-config: {"kind":"DeploymentConfig","apiVersion":"apps.openshift.io/v1","metadata":{"name":"deploymentconfig0","namespace":"svt-3620","selfLink":"/...
Replicas:     0 current / 0 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       deployment=deploymentconfig0-1
                deploymentconfig=deploymentconfig0
                name=replicationcontroller0
  Annotations:  openshift.io/deployment-config.latest-version: 1
                openshift.io/deployment-config.name: deploymentconfig0
                openshift.io/deployment.name: deploymentconfig0-1
  Containers:
   pause0:
    Image:      gcr.io/google-containers/pause-amd64:3.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Environment:
      ENVVAR1_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR2_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR3_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR4_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
    Mounts:     <none>
  Volumes:      <none>
Events:         <none>

$ oc describe pod deploymentconfig0-1-deploy -n svt-3620
Name:         deploymentconfig0-1-deploy
Namespace:    svt-3620
Priority:     0
Node:         ip-10-0-176-141.us-east-2.compute.internal/10.0.176.141
Start Time:   Mon, 29 Jun 2020 12:28:01 -0400
Labels:       openshift.io/deployer-pod-for.name=deploymentconfig0-1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.8.214" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.131.8.214" ], "default": true, "dns": {} }]
              openshift.io/deployment-config.name: deploymentconfig0
              openshift.io/deployment.name: deploymentconfig0-1
              openshift.io/scc: restricted
Status:       Failed
IP:           10.131.8.214
IPs:
  IP:  10.131.8.214
Containers:
  deployment:
    Container ID:   cri-o://f78de2ebfdebfd8e7ca0825064e7eddba79aa4cd13b7281fd813b35d4608c56b
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 29 Jun 2020 12:28:04 -0400
      Finished:     Mon, 29 Jun 2020 12:28:04 -0400
    Ready:          False
    Restart Count:  0
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       deploymentconfig0-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  svt-3620
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-qqm22 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  deployer-token-qqm22:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-qqm22
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason          Age        From                                                 Message
  ----    ------          ----       ----                                                 -------
  Normal  Scheduled       <unknown>  default-scheduler                                    Successfully assigned svt-3620/deploymentconfig0-1-deploy to ip-10-0-176-141.us-east-2.compute.internal
  Normal  AddedInterface  69m        multus                                               Add eth0 [10.131.8.214/23]
  Normal  Pulled          69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa" already present on machine
  Normal  Created         69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Created container deployment
  Normal  Started         69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Started container deployment

Expected results:
All pods will be created with no problems.
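Additional info:
For triage, the commands below are a quick way to confirm the scope of the failure and check whether the deployer service account's RBAC is in place in an affected namespace. This is a sketch: the rolebinding name system:deployers is the default that the OpenShift controller manager normally creates in new namespaces, not something taken from this report; svt-3620 is the namespace from the output above.

```bash
# Count pods cluster-wide that ended in a failed phase
# (the 29 Error pods reported above should show up here).
oc get pods --all-namespaces --field-selector=status.phase=Failed --no-headers | wc -l

# Check the exact permission the deployer pod failed on.
oc auth can-i get replicationcontrollers \
  --as=system:serviceaccount:svt-3620:deployer -n svt-3620

# Verify the default deployer rolebinding and service account exist in
# the namespace (the rolebinding name is an assumption, see above).
oc get rolebinding system:deployers -n svt-3620 -o yaml
oc get sa deployer -n svt-3620
```

If the rolebinding or service account is missing in only some of the 4000 namespaces, that would point at the namespace controllers lagging under load rather than at the deployer pods themselves.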
You claim "Failed to create pod due forbidden user for replicationcontrollers", yet I don't see that message in your failure description. I see a failed pod, but you don't attach the YAML needed to actually understand what's going on.

> All pods will be created with no problems.

Creation in the API (or a failure of that, like a forbidden user) is not the same as a failed pod (= a pod failing to start). Please attach the svt-3620 namespace objects (including RBAC). Must-gather only contains system namespaces.
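For reference, one way to gather what is being asked for here, as a sketch: oc adm inspect exists in 4.5 and dumps the objects in a namespace, though the exact set of resources it collects may vary by version, and the output file names below are only illustrative.

```bash
# Dump the whole namespace, including its namespaced objects, for attachment.
oc adm inspect ns/svt-3620 --dest-dir=svt-3620-inspect

# Or collect the RBAC and the failed objects explicitly.
oc get rolebindings,roles,serviceaccounts -n svt-3620 -o yaml > svt-3620-rbac.yaml
oc get pod deploymentconfig0-1-deploy -n svt-3620 -o yaml > deployer-pod.yaml
```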
Per a Slack conversation with Simon, we agreed to move this to 4.5.z and off the blocker list.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
This bug hasn't had any activity in the 7 days since it was marked as LifecycleStale, so we are closing this bug as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.
The LifecycleStale keyword was removed because the bug got commented on recently. The bug assignee was notified.