Bug 1704201 - scc for ServiceAccount record lost when upgrading from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010
Summary: scc for ServiceAccount record lost when upgrading from 4.1.0-0.nightly-2019-0...
Keywords:
Status: CLOSED DUPLICATE of bug 1698625
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Standa Laznicka
QA Contact: Chuan Yu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-29 10:16 UTC by Qin Ping
Modified: 2019-05-02 18:04 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-02 15:48:48 UTC
Target Upstream Version:
Embargoed:



Description Qin Ping 2019-04-29 10:16:33 UTC
Description of problem:
scc for ServiceAccount record lost when upgrading from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010

Version-Release number of selected component (if applicable):
from 4.1.0-0.nightly-2019-04-25-121505 to 4.1.0-0.nightly-2019-04-28-064010

How reproducible:


Steps to Reproduce:
1. Create a ns federation-system
2. Add sa system:serviceaccount:federation-system:deployer to privileged scc with cmd:
oc adm policy add-scc-to-user privileged system:serviceaccount:federation-system:deployer
3. Create a deployment with the following yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-deployment
  namespace: federation-system
  labels:
    app: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      serviceAccountName: deployer
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
4. Check that there are 5 pods with the app=nginx label running under the federation-system namespace (see the command sketch after these steps)
5. Upgrade OCP cluster
6. After upgrading successfully, check the deployment.
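
For reference, a minimal command sketch for steps 1, 4 and 6 (the namespace, label and deployment names are taken from the steps above; output will vary by cluster):

$ oc create namespace federation-system
$ oc get pods -n federation-system -l app=nginx
$ oc get deployment test-deployment -n federation-system -o json | jq .status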

Actual results:
No pod was running with the app=nginx label:
$ oc get deployment test-deployment -n federation-system  -ojson |jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "Created new replica set \"test-deployment-694b4dbc8c\"",
      "reason": "NewReplicaSetCreated",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "Deployment does not have minimum availability.",
      "reason": "MinimumReplicasUnavailable",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-29T10:06:49Z",
      "lastUpdateTime": "2019-04-29T10:06:49Z",
      "message": "pods \"test-deployment-694b4dbc8c-\" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]",
      "reason": "FailedCreate",
      "status": "True",
      "type": "ReplicaFailure"
    }
  ],
  "observedGeneration": 7,
  "unavailableReplicas": 5
}
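
As a side note, which SCC admitted a pod can be read from its openshift.io/scc annotation; a sketch (only meaningful before the upgrade, while the nginx pods are still running):

$ oc describe pod -n federation-system -l app=nginx | grep openshift.io/scc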

Checking the privileged SCC, there is no record for system:serviceaccount:federation-system:deployer.
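
A quick way to verify this (sketch; the privileged SCC's users list should contain the service account if the record survived the upgrade):

$ oc get scc privileged -o jsonpath='{.users}'
$ oc describe scc privileged | grep -A5 -i users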

Expected results:
The system:serviceaccount:federation-system:deployer record still exists after the upgrade.

Additional info:

Comment 1 Standa Laznicka 2019-05-02 09:57:55 UTC
SCCs are currently on fire and even a simple restart of OAS pods leads to them crashlooping; I won't be able to reproduce this until that is fixed.

Comment 2 Standa Laznicka 2019-05-02 15:48:48 UTC
I tried the upgrade with versions 4.1.0-0.nightly-2019-04-22-005054 --> 4.1.0-0.nightly-2019-04-22-192604, as they were the only two green nightly versions in CI with green upgrades that did not have SCCs wedged.

However, note that April 22nd is the day that https://github.com/openshift/origin/pull/22606 merged, which might have caused the issue. This could mean that the still-ongoing work affects upgrades between betas in such a manner.

I'm going to close this bug since the other work is still in progress (https://bugzilla.redhat.com/show_bug.cgi?id=1698625), and I will point out the issue to the developers working on the other bug.

*** This bug has been marked as a duplicate of bug 1698625 ***

