Bug 1852103 - Failed to create pod due to forbidden user for replicationcontrollers [NEEDINFO]
Summary: Failed to create pod due to forbidden user for replicationcontrollers
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
URL:
Whiteboard: LifecycleReset
Depends On:
Blocks:
 
Reported: 2020-06-29 18:06 UTC by Simon
Modified: 2020-08-31 14:59 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-31 13:59:54 UTC
Target Upstream Version:
Embargoed:
Flags: mfojtik: needinfo?



Description Simon 2020-06-29 18:06:35 UTC
Description of problem:
During a pod-density test, pods are finishing in the Error state (29 of 4000 pods).

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-26-215024

How reproducible:
100%

install-config.yaml:
---
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.xlarge
  replicas: 20
metadata:
  name: skordas
platform:
  aws:
    region: us-east-2
pullSecret: ***
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: External
fips: true
baseDomain: qe.devcluster.openshift.com
sshKey: ***

Steps to Reproduce:
1. Scale up cluster to 20 working nodes.
2. Create 2000 projects (100 per node):
  - git clone https://github.com/openshift/svt.git
  - cd svt/openshift_scalability
  - create test.yaml with the following content:

```yaml
projects:
  - num: 2000
    basename: svt-
    templates:
      - num: 1
        file: ./content/deployment-config-1rep-pause-template.json
```

  - cp $KUBECONFIG ~/.kube/config
  - python cluster-loader.py -f test.yaml -p 5

3. Delete projects: oc delete project -l purpose=test
4. Change number of projects to 4000: vim test.yaml
5. Create 4000 projects: python cluster-loader.py -f test.yaml -p 5
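
A quick way to count how many pods ended up failed afterwards (a sketch; assumes the failed deployer pods are the only pods in phase Failed):

$ oc get pods --all-namespaces --field-selector=status.phase=Failed --no-headers | wc -l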

Actual results:
$ oc logs deploymentconfig0-1-deploy -n svt-3620
error: couldn't get deployment deploymentconfig0-1: replicationcontrollers "deploymentconfig0-1" is forbidden: User "system:serviceaccount:svt-3620:deployer" cannot get resource "replicationcontrollers" in API group "" in the namespace "svt-3620"

$ oc get replicationcontrollers -n svt-3620
NAME                  DESIRED   CURRENT   READY   AGE
deploymentconfig0-1   0         0         0       54m
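
The RC exists, but the deployer service account cannot read it. A direct way to verify the permission gap (a sketch using standard oc impersonation and the who-can helper):

$ oc auth can-i get replicationcontrollers --as=system:serviceaccount:svt-3620:deployer -n svt-3620
$ oc policy who-can get replicationcontrollers -n svt-3620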

$ oc describe replicationcontrollers deploymentconfig0-1 -n svt-3620
Name:         deploymentconfig0-1
Namespace:    svt-3620
Selector:     deployment=deploymentconfig0-1,deploymentconfig=deploymentconfig0,name=replicationcontroller0
Labels:       openshift.io/deployment-config.name=deploymentconfig0
              template=deploymentConfigTemplate
Annotations:  kubectl.kubernetes.io/desired-replicas: 1
              openshift.io/deployer-pod.completed-at: 2020-06-29 16:28:04 +0000 UTC
              openshift.io/deployer-pod.created-at: 2020-06-29 16:28:00 +0000 UTC
              openshift.io/deployer-pod.name: deploymentconfig0-1-deploy
              openshift.io/deployment-config.latest-version: 1
              openshift.io/deployment-config.name: deploymentconfig0
              openshift.io/deployment.phase: Failed
              openshift.io/deployment.replicas: 0
              openshift.io/deployment.status-reason: config change
              openshift.io/encoded-deployment-config:
                {"kind":"DeploymentConfig","apiVersion":"apps.openshift.io/v1","metadata":{"name":"deploymentconfig0","namespace":"svt-3620","selfLink":"/...
Replicas:     0 current / 0 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       deployment=deploymentconfig0-1
                deploymentconfig=deploymentconfig0
                name=replicationcontroller0
  Annotations:  openshift.io/deployment-config.latest-version: 1
                openshift.io/deployment-config.name: deploymentconfig0
                openshift.io/deployment.name: deploymentconfig0-1
  Containers:
   pause0:
    Image:      gcr.io/google-containers/pause-amd64:3.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Environment:
      ENVVAR1_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR2_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR3_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
      ENVVAR4_0:  BdcklwgETdYUyEgCAFrFwmd2qYKRG3yH7UH2LNDY2SMusBeSX4gHj0OxOTsXDqe0RhYKdJMd14yIsiHVhiwKvSxqSL2wcrv52jxSMqfTowqp8DtJ6WRYO8qTRH0Rx0PJleyIs6itCFHB5eEl8nk0Q5re3us25TW042RAXrYfqao4J46Nnd3sJw3ekgN1b2NyAc2pI447vdr3Pw3jQjxl5sCoSM37uxV616AWeAluYGBHvJ0xFWG5OXyMSpYhPvU
    Mounts:       <none>
  Volumes:        <none>
Events:           <none>

$ oc describe pod deploymentconfig0-1-deploy -n svt-3620
Name:         deploymentconfig0-1-deploy
Namespace:    svt-3620
Priority:     0
Node:         ip-10-0-176-141.us-east-2.compute.internal/10.0.176.141
Start Time:   Mon, 29 Jun 2020 12:28:01 -0400
Labels:       openshift.io/deployer-pod-for.name=deploymentconfig0-1
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.131.8.214"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.131.8.214"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/deployment-config.name: deploymentconfig0
              openshift.io/deployment.name: deploymentconfig0-1
              openshift.io/scc: restricted
Status:       Failed
IP:           10.131.8.214
IPs:
  IP:  10.131.8.214
Containers:
  deployment:
    Container ID:   cri-o://f78de2ebfdebfd8e7ca0825064e7eddba79aa4cd13b7281fd813b35d4608c56b
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 29 Jun 2020 12:28:04 -0400
      Finished:     Mon, 29 Jun 2020 12:28:04 -0400
    Ready:          False
    Restart Count:  0
    Environment:
      OPENSHIFT_DEPLOYMENT_NAME:       deploymentconfig0-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:  svt-3620
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from deployer-token-qqm22 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  deployer-token-qqm22:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  deployer-token-qqm22
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason          Age        From                                                 Message
  ----    ------          ----       ----                                                 -------
  Normal  Scheduled       <unknown>  default-scheduler                                    Successfully assigned svt-3620/deploymentconfig0-1-deploy to ip-10-0-176-141.us-east-2.compute.internal
  Normal  AddedInterface  69m        multus                                               Add eth0 [10.131.8.214/23]
  Normal  Pulled          69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:54ffefca329af4c95b8e17000fdf952d0bf2963f46108588fb03708e8861f5aa" already present on machine
  Normal  Created         69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Created container deployment
  Normal  Started         69m        kubelet, ip-10-0-176-141.us-east-2.compute.internal  Started container deployment

Expected results:
All pods are created and run without errors.
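
Additional info:
In OpenShift, the per-namespace "deployer" service account gets its permissions from the system:deployers rolebinding, which the default rolebindings controller creates in every new namespace. If that rolebinding is missing, or is created late under this kind of namespace-creation load, the deployer pod fails with exactly this forbidden error. A check worth capturing (rolebinding name assumed from standard OpenShift defaults):

$ oc get rolebinding system:deployers -n svt-3620 -o yaml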

Comment 2 Stefan Schimanski 2020-06-30 07:47:31 UTC
You claim "Failed to create pod due to forbidden user for replicationcontrollers", yet I don't see that message in your failure description.

I see a failed pod, but you didn't attach the YAML needed to actually understand what's going on.

> All pods will be created with no problems.

Creation in the API (or a failure of that, like a forbidden user) is not the same as a failed pod (= a pod that failed to start).

Please attach the svt-3620 namespace objects (including RBAC). Must-gather only contains system namespaces.
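
For example (a sketch; exact flags depend on the oc client version):

$ oc adm inspect ns/svt-3620 --dest-dir=svt-3620-inspect
$ oc get rolebindings,roles,serviceaccounts -n svt-3620 -o yaml > svt-3620-rbac.yaml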

Comment 4 Stefan Schimanski 2020-07-01 12:49:34 UTC
From a Slack conversation with Simon: we agreed to move this to 4.5.z and off the blocker list.

Comment 9 Michal Fojtik 2020-08-24 13:11:49 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Comment 10 Michal Fojtik 2020-08-31 13:59:54 UTC
This bug hasn't had any activity in the 7 days since it was marked LifecycleStale, so we are closing it as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.

Comment 11 Michal Fojtik 2020-08-31 14:59:55 UTC
The LifecycleStale keyword was removed because the bug got commented on recently.
The bug assignee was notified.

