Bug 1704201

Summary: SCC record for ServiceAccount lost when upgrading from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010
Product: OpenShift Container Platform
Reporter: Qin Ping <piqin>
Component: apiserver-auth
Assignee: Standa Laznicka <slaznick>
Status: CLOSED DUPLICATE
QA Contact: Chuan Yu <chuyu>
Severity: high
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, eparis, nagrawal, scheng
Target Milestone: ---
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-05-02 15:48:48 UTC
Type: Bug

Description Qin Ping 2019-04-29 10:16:33 UTC
Description of problem:
SCC record for ServiceAccount lost when upgrading from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010

Version-Release number of selected component (if applicable):
from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010

How reproducible:


Steps to Reproduce:
1. Create a namespace named federation-system
2. Add the service account system:serviceaccount:federation-system:deployer to the privileged SCC with:
oc adm policy add-scc-to-user privileged system:serviceaccount:federation-system:deployer
3. Create a deployment from the following YAML:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-deployment
  namespace: federation-system
  labels:
    app: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      serviceAccountName: deployer
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
4. Check that there are 5 pods with the app=nginx label running in the federation-system namespace
5. Upgrade the OCP cluster
6. After the upgrade completes successfully, check the deployment (a command sketch for these steps follows below).
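For reference, a minimal command sketch of the steps above, assuming the deployer service account is created explicitly (its creation is not listed in the steps) and the YAML is saved as test-deployment.yaml (a hypothetical file name):

$ oc create namespace federation-system
$ oc create serviceaccount deployer -n federation-system      # assumption: the deployer SA must exist in the namespace
$ oc adm policy add-scc-to-user privileged system:serviceaccount:federation-system:deployer
$ oc create -f test-deployment.yaml
$ oc get pods -n federation-system -l app=nginx               # expect 5 Running pods before the upgrade
$ oc get deployment test-deployment -n federation-system      # recheck after the upgrade completes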

Actual results:
No pods were running with the app=nginx label:
$ oc get deployment test-deployment -n federation-system  -ojson |jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "Created new replica set \"test-deployment-694b4dbc8c\"",
      "reason": "NewReplicaSetCreated",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "Deployment does not have minimum availability.",
      "reason": "MinimumReplicasUnavailable",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "pods \"test-deployment-694b4dbc8c-\" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]",
      "reason": "FailedCreate",
      "status": "True",
      "type": "ReplicaFailure"
    }
  ],
  "observedGeneration": 7,
  "unavailableReplicas": 5
}
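
The same SCC rejection can also be surfaced on the ReplicaSet itself, for example (the ReplicaSet name is taken from the status message above):

$ oc describe rs test-deployment-694b4dbc8c -n federation-system
# the Events section should show FailedCreate with the same "unable to validate against any security context constraint" message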

Checking the privileged SCC, there is no record for system:serviceaccount:federation-system:deployer.
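For example, a sketch of how to check (jsonpath output formatting may vary):

$ oc get scc privileged -o jsonpath='{.users}'
# system:serviceaccount:federation-system:deployer is no longer present in the users list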

Expected results:
The system:serviceaccount:federation-system:deployer record still exists after the upgrade.

Additional info:

Comment 1 Standa Laznicka 2019-05-02 09:57:55 UTC
SCCs are currently on fire, and even a simple restart of the OAS pods leads to them crashlooping; I won't be able to reproduce this until that is fixed.

Comment 2 Standa Laznicka 2019-05-02 15:48:48 UTC
I tried the upgrade with versions 4.1.0-0.nightly-2019-04-22-005054 --> 4.1.0-0.nightly-2019-04-22-192604, as they were the only two green nightly versions in CI with green upgrades and did not have SCCs wedged.

However, note that April 22nd is the day that https://github.com/openshift/origin/pull/22606 merged, which might have caused the issue. That could mean the still-ongoing work affects upgrades between betas in this manner.

I'm going to close this bug since the other work is still in progress (https://bugzilla.redhat.com/show_bug.cgi?id=1698625), and I will point out the issue to the developers working on the other bug.

*** This bug has been marked as a duplicate of bug 1698625 ***