Description of problem:

While downgrading OCP 4.6 to 4.5, the cloud-credential-operator pod goes into CrashLoopBackOff, blocking the downgrade.

$ oc get pods -n openshift-cloud-credential-operator
NAME                                         READY   STATUS             RESTARTS   AGE
cloud-credential-operator-5898d86997-zqh5s   0/1     CrashLoopBackOff   76         6h8m
pod-identity-webhook-596ff668d-sc96x         1/1     Running            0          6h55m

$ oc get pods cloud-credential-operator-5898d86997-zqh5s -n openshift-cloud-credential-operator -oyaml
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-08-12T06:35:34Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-08-12T12:44:02Z"
    message: 'containers with unready status: [cloud-credential-operator]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-08-12T12:44:02Z"
    message: 'containers with unready status: [cloud-credential-operator]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-08-12T06:35:34Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://f7eb2f196fc3b720b47c84afc94af5397f6cd8d2b77b680e37ea9ecc3a270b24
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:534642e97f55406840394474970a39f2828732c6b2d98870da8734d7aadca2a4
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:534642e97f55406840394474970a39f2828732c6b2d98870da8734d7aadca2a4
    lastState:
      terminated:
        containerID: cri-o://f7eb2f196fc3b720b47c84afc94af5397f6cd8d2b77b680e37ea9ecc3a270b24
        exitCode: 1
        finishedAt: "2020-08-12T12:44:01Z"
        message: |
          Copying system trust bundle
          time="2020-08-12T12:43:59Z" level=debug msg="debug logging enabled"
          time="2020-08-12T12:43:59Z" level=info msg="setting up client for manager"
          time="2020-08-12T12:43:59Z" level=info msg="setting up manager"
          time="2020-08-12T12:44:01Z" level=info msg="registering components"
          time="2020-08-12T12:44:01Z" level=info msg="setting up scheme"
          time="2020-08-12T12:44:01Z" level=info msg="setting up controller"
          time="2020-08-12T12:44:01Z" level=fatal msg="infrastructures.config.openshift.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator\" cannot get resource \"infrastructures\" in API group \"config.openshift.io\" at the cluster scope"
        reason: Error
        startedAt: "2020-08-12T12:43:59Z"
    name: cloud-credential-operator
    ready: false
    restartCount: 76
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=cloud-credential-operator pod=cloud-credential-operator-5898d86997-zqh5s_openshift-cloud-credential-operator(04e00acd-c083-4b9c-9c70-a159fb05851e)
        reason: CrashLoopBackOff
  hostIP: 10.x.x.x
  phase: Running
  podIP: 10.x.x.x
  podIPs:
  - ip: 10.x.x.x
  qosClass: Burstable
  startTime: "2020-08-12T06:35:34Z"

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-12-003456
4.5.0-0.nightly-2020-08-08-162221

How reproducible:
Always

Steps to Reproduce:
1. Downgrade from 4.6.0-0.nightly-2020-08-12-003456 to 4.5.0-0.nightly-2020-08-08-162221
2.
3.

Actual results:
The cloud-credential-operator pod is in CrashLoopBackOff.

Expected results:
The downgrade should complete without the cloud-credential-operator pod crash-looping.

Additional info:
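Not part of the original report, but the fatal log line points at an RBAC denial for the operator's ServiceAccount, which can be confirmed with standard oc/kubectl impersonation checks (a sketch only; both commands are stock oc usage, not taken from this bug):

# Ask the API server whether the ServiceAccount named in the fatal log can read
# the cluster-scoped Infrastructure config object; while the bug reproduces
# this is expected to print "no".
$ oc auth can-i get infrastructures.config.openshift.io \
    --as=system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator

# List cluster role bindings that mention the ServiceAccount, to see whether
# any 4.6-era bindings for it survived the downgrade.
$ oc get clusterrolebindings -o wide | grep cloud-credential-operator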
will investigate next sprint
Moving to 4.5. The issue appears to be that the cloud-cred-operator Deployment in the release-4.5 branch doesn't specify a ServiceAccount, while the 4.6 Deployment specifies one that is new in 4.6. After downgrading to 4.5, the cloud-cred-operator Deployment still references that now-orphaned ServiceAccount (named "cloud-credential-operator") instead of the ServiceAccount named "default" (the one we actually use in 4.5), which matches the RBAC "forbidden" error in the pod logs above.
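Not from the original comment, but for illustration: the following sketch shows how to inspect the stale reference after the downgrade, and a hypothetical manual workaround that points the pod template back at the "default" ServiceAccount (standard oc commands; the cluster-version operator may reconcile the patch away, so this is not a supported fix):

# Show which ServiceAccount the downgraded Deployment is using
# (expected: "cloud-credential-operator", the orphaned 4.6 ServiceAccount).
$ oc get deployment cloud-credential-operator \
    -n openshift-cloud-credential-operator \
    -o jsonpath='{.spec.template.spec.serviceAccountName}'

# Hypothetical workaround: switch the pod template back to the "default"
# ServiceAccount that the 4.5 RBAC manifests actually bind.
$ oc patch deployment cloud-credential-operator \
    -n openshift-cloud-credential-operator \
    --type merge \
    -p '{"spec":{"template":{"spec":{"serviceAccountName":"default"}}}}'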
Can you please confirm that this is a test case where you started with 4.5, upgraded to 4.6, then downgraded back to 4.5?
To keep Eric's bot happy, we'll probably want to move this bug to MODIFIED so we can VERIFY that a "4.6 -> 4.6" downgrade does not crash-loop the cred operator. Then we can clone back to a bug targeting 4.5.z and actually fix it there.
Downgrading 4.6 -> 4.6, CCO won't crash.

Downgrade from 4.6.0-0.nightly-2020-08-26-215737 to 4.6.0-0.nightly-2020-08-21-084833:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-26-215737   True        True          9s      Working towards registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-21-084833: downloading update

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-21-084833   True        False         6m23s   Cluster version is 4.6.0-0.nightly-2020-08-21-084833

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-08-21-084833   True        False         False      8m56s
cloud-credential                           4.6.0-0.nightly-2020-08-21-084833   True        False         False      139m
cluster-autoscaler                         4.6.0-0.nightly-2020-08-21-084833   True        False         False      127m
config-operator                            4.6.0-0.nightly-2020-08-21-084833   True        False         False      131m
console                                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      11m
csi-snapshot-controller                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      11m
dns                                        4.6.0-0.nightly-2020-08-21-084833   True        False         False      22m
etcd                                       4.6.0-0.nightly-2020-08-21-084833   True        False         False      129m
image-registry                             4.6.0-0.nightly-2020-08-21-084833   True        False         False      122m
ingress                                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      122m
insights                                   4.6.0-0.nightly-2020-08-21-084833   True        False         False      127m
kube-apiserver                             4.6.0-0.nightly-2020-08-21-084833   True        False         False      129m
kube-controller-manager                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      128m
kube-scheduler                             4.6.0-0.nightly-2020-08-21-084833   True        False         False      128m
kube-storage-version-migrator              4.6.0-0.nightly-2020-08-21-084833   True        False         False      11m
machine-api                                4.6.0-0.nightly-2020-08-21-084833   True        False         False      123m
machine-approver                           4.6.0-0.nightly-2020-08-21-084833   True        False         False      127m
machine-config                             4.6.0-0.nightly-2020-08-21-084833   True        False         False      8m16s
marketplace                                4.6.0-0.nightly-2020-08-21-084833   True        False         False      12m
monitoring                                 4.6.0-0.nightly-2020-08-21-084833   True        False         False      121m
network                                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      122m
node-tuning                                4.6.0-0.nightly-2020-08-21-084833   True        False         False      32m
openshift-apiserver                        4.6.0-0.nightly-2020-08-21-084833   True        False         False      9m5s
openshift-controller-manager               4.6.0-0.nightly-2020-08-21-084833   True        False         False      125m
openshift-samples                          4.6.0-0.nightly-2020-08-21-084833   True        False         False      32m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-08-21-084833   True        False         False      130m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-08-21-084833   True        False         False      130m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-08-21-084833   True        False         False      9m7s
service-ca                                 4.6.0-0.nightly-2020-08-21-084833   True        False         False      131m
storage                                    4.6.0-0.nightly-2020-08-21-084833   True        False         False      11m

$ oc get pods -n openshift-cloud-credential-operator
NAME                                         READY   STATUS    RESTARTS   AGE
cloud-credential-operator-869b565fc5-gcws4   2/2     Running   0          14m
pod-identity-webhook-7f99757f4c-nj7tq        1/1     Running   0          14m
(In reply to Scott Dodson from comment #5)
> Can you please confirm that this is a test case where you started with 4.5, upgraded to 4.6, then downgraded back to 4.5?

Yes. I retried today and still hit it: I launched a 4.5.0-0.nightly-2020-08-31-101523 IPI GCP cluster, upgraded successfully to 4.6.0-0.nightly-2020-09-01-042030, then attempted the downgrade back to 4.5.0-0.nightly-2020-08-31-101523 and hit the crash-loop again.
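For context (not quoted from the comment above): downgrades between specific nightlies are typically forced with an explicit release image, roughly as sketched below. The exact pullspec is an assumption based on the registry.svc.ci.openshift.org release stream shown in the earlier verification output.

# Hypothetical example of forcing a downgrade to a specific 4.5 nightly image.
$ oc adm upgrade --allow-explicit-upgrade --force \
    --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-31-101523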
The fix hasn't been backported to 4.5, so downgrading from 4.6 to 4.5 still hits this issue. Isn't this bug verified only for the 4.6 -> 4.6 downgrade (see https://bugzilla.redhat.com/show_bug.cgi?id=1868376#c6), while https://bugzilla.redhat.com/show_bug.cgi?id=1873345 carries the actual fix for the 4.6 -> 4.5 downgrade? Is my understanding wrong?
Sorry, I didn't notice that there was already a 4.5 clone.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196