With CCO configured in manual mode, upgrading the cluster from 4.5.6 to a 4.6 nightly build failed because the Secret for openshift-cluster-csi-drivers is missing:

./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.6     True        True          3h39m   Unable to apply 4.6.0-0.nightly-2020-08-18-165040: the cluster operator storage has not yet successfully rolled out

./oc get co
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
storage   4.6.0-0.nightly-2020-08-18-165040   False       True          False      22h

./oc get pod -n openshift-cluster-csi-drivers
NAME                                             READY   STATUS                       RESTARTS   AGE
aws-ebs-csi-driver-controller-6c679787fd-4th6m   4/5     CreateContainerConfigError   0          22h
aws-ebs-csi-driver-node-cl7rz                    3/3     Running                      0          22h
aws-ebs-csi-driver-node-jc22w                    3/3     Running                      0          22h
aws-ebs-csi-driver-node-pcmwz                    3/3     Running                      0          22h
aws-ebs-csi-driver-node-r7g75                    3/3     Running                      0          22h
aws-ebs-csi-driver-node-rkhn5                    3/3     Running                      0          22h
aws-ebs-csi-driver-node-zdbnc                    3/3     Running                      0          22h
aws-ebs-csi-driver-operator-f574b4569-w2tmc      1/1     Running                      2          22h

./oc describe pod/aws-ebs-csi-driver-controller-6c679787fd-4th6m -n openshift-cluster-csi-drivers
Events:
  Type     Reason  Age                     From                                                 Message
  ----     ------  ----                    ----                                                 -------
  Warning  Failed  7m32s (x6145 over 22h)  kubelet, ip-10-0-51-226.us-east-2.compute.internal  Error: secret "aws-cloud-credentials" not found

Comparing the CredentialsRequests (CR) between 4.5 and 4.6:

CredentialsRequest for 4.6
cloud-credential-operator-iam-ro    24h
cloud-credential-operator-s3        23h   <-- new in 4.6, but it doesn't impact the upgrade
openshift-cluster-csi-drivers       23h   <-- new in 4.6; missing this Secret causes the upgrade to fail
openshift-image-registry            24h
openshift-image-registry-azure      24h
openshift-image-registry-gcs        24h
openshift-image-registry-openstack  24h
openshift-ingress                   24h
openshift-ingress-azure             24h
openshift-ingress-gcp               24h
openshift-machine-api-aws           24h
openshift-machine-api-azure         24h
openshift-machine-api-gcp           24h
openshift-machine-api-openstack     24h
openshift-machine-api-ovirt         24h
openshift-machine-api-vsphere       24h
openshift-network                   24h

CredentialsRequest for 4.5
cloud-credential-operator-iam-ro    53m
openshift-image-registry            64m
openshift-image-registry-azure      65m
openshift-image-registry-gcs        65m
openshift-image-registry-openstack  64m
openshift-ingress                   65m
openshift-ingress-azure             65m
openshift-ingress-gcp               64m
openshift-machine-api-aws           65m
openshift-machine-api-azure         65m
openshift-machine-api-gcp           64m
openshift-machine-api-openstack     65m
openshift-machine-api-ovirt         64m
openshift-machine-api-vsphere       64m
openshift-network                   65m

After providing the Secret for openshift-cluster-csi-drivers, the cluster was upgraded to 4.6.0-0.nightly-2020-08-18-165040 successfully.

cat <<EOF >csi.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-cloud-credentials
  namespace: openshift-cluster-csi-drivers
data:
  aws_access_key_id: <HIDDEN>
  aws_secret_access_key: <HIDDEN>
EOF
./oc create -f csi.yaml

./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         5m4s    Cluster version is 4.6.0-0.nightly-2020-08-18-165040

./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-08-18-165040   True        False         False      6m10s
cloud-credential                           4.6.0-0.nightly-2020-08-18-165040   True        False         False      23h
cluster-autoscaler                         4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
config-operator                            4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
console                                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      17m
csi-snapshot-controller                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      23m
dns                                        4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
etcd                                       4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
image-registry                             4.6.0-0.nightly-2020-08-18-165040   True        False         False      25h
ingress                                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      23h
insights                                   4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
kube-apiserver                             4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
kube-controller-manager                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
kube-scheduler                             4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
kube-storage-version-migrator              4.6.0-0.nightly-2020-08-18-165040   True        False         False      18m
machine-api                                4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
machine-approver                           4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
machine-config                             4.6.0-0.nightly-2020-08-18-165040   True        False         False      6m37s
marketplace                                4.6.0-0.nightly-2020-08-18-165040   True        False         False      17m
monitoring                                 4.6.0-0.nightly-2020-08-18-165040   True        False         False      9m39s
network                                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
node-tuning                                4.6.0-0.nightly-2020-08-18-165040   True        False         False      23h
openshift-apiserver                        4.6.0-0.nightly-2020-08-18-165040   True        False         False      165m
openshift-controller-manager               4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
openshift-samples                          4.6.0-0.nightly-2020-08-18-165040   True        False         False      2m25s
operator-lifecycle-manager                 4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-08-18-165040   True        False         False      10m
service-ca                                 4.6.0-0.nightly-2020-08-18-165040   True        False         False      26h
storage                                    4.6.0-0.nightly-2020-08-18-165040   True        False         False      13m

Version-Release number of the following components:
4.6.0-0.nightly-2020-08-18-165040

How reproducible:
Always

Steps to Reproduce:
1. Create a 4.5 cluster with CCO in manual mode; refer to https://github.com/openshift/cloud-credential-operator/blob/master/docs/mode-manual-creds.md
2. Upgrade to a 4.6 nightly build.

Actual results:
The storage operator upgrade failed with: Error: secret "aws-cloud-credentials" not found

Expected results:
The storage operator upgrades successfully.

Additional info:
It would be better to notify the user before upgrading the cluster to 4.6 while CCO is in manual mode.
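The comparison of the 4.5 and 4.6 CredentialsRequest lists above can be done mechanically. The following is a minimal sketch, not part of the original report: the function name new_credentials_requests and the two input files are hypothetical, and it assumes each file holds one CredentialsRequest name per line.

```shell
# Print CredentialsRequest names that appear in the new list but not in the old one.
# $1 = file with names from the old release, $2 = file with names from the new release
# (hypothetical inputs, one name per line).
new_credentials_requests() {
  old_list="$1"
  new_list="$2"
  # -F: fixed strings, -x: whole-line match, -v: invert the match,
  # so only lines of new_list that are absent from old_list are kept.
  grep -Fxv -f "$old_list" "$new_list" | sort
}
```

One way to produce the input files would be to run `oc get credentialsrequest -n openshift-cloud-credential-operator -o name` against a cluster of each version and save the output. With the lists in this report, the function would print cloud-credential-operator-s3 and openshift-cluster-csi-drivers.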
Nice catch, Yunfei. It looks like storage is incorrectly using the root/admin cloud credential, which is often admin level and will not be present in manual mode. In-cluster components that need cloud credentials should create a CredentialsRequest in the openshift-cloud-credential-operator namespace containing the exact permissions the component needs; this CredentialsRequest would be included in the release image. Documentation here: https://github.com/openshift/cloud-credential-operator Moving to storage.
I misread. This looks more like what happens when you are in manual mode and upgrade without pre-creating the credentials needed by the new release image. This should be covered by the documentation. Yunfei, did you perform the documented steps for a manual mode cluster upgrade? This would entail pre-creating namespaces and secrets from the release image you intend to upgrade to. However, after talking with Hemant, this isn't even going to work, because they are dynamically creating the CredentialsRequest in the operator itself; it is not carried in the release payload. This means that our auditability story of all CredentialsRequests being in the payload is not accurate, and upgrades will break for users in manual mode. We are keeping this bug open to address this issue. We hope to have tooling to automate this in the future, but it's not there yet; users opting into manual mode must perform these steps themselves.
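For reference, the "pre-creating secrets from the release image" step starts by pulling the CredentialsRequest manifests out of the target release. This is a minimal sketch, assuming an oc client that supports the --credentials-requests and --cloud flags of `oc adm release extract`; the function name and both parameters are hypothetical. As noted in this comment, a request that an operator creates dynamically is not in the payload and would not show up in the extracted files.

```shell
# Extract the AWS CredentialsRequest manifests from a release image into a directory.
# $1 = release image pull spec (hypothetical parameter), $2 = output directory.
extract_credentials_requests() {
  release_image="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  # Writes one manifest per CredentialsRequest carried in the release payload.
  oc adm release extract --credentials-requests --cloud=aws \
    --to="$out_dir" "$release_image"
}
```

Each extracted manifest names the Secret and namespace that the corresponding component expects, which is what has to exist before the upgrade starts.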
Hello Devan,

>> Yunfei did you perform the documented steps for a manual mode cluster upgrade?

I'm not sure whether [1] and [2] are the documents you mentioned above. And as you mentioned, the CR is not in the release payload, so users cannot extract it and do not know that the cluster needs a new Secret.

I tried to create a Secret for openshift-cluster-csi-drivers in the 4.5 cluster before the upgrade, but it failed:
`Error from server (NotFound): error when creating "csi.yaml": namespaces "openshift-cluster-csi-drivers" not found`

What I expect is:
1. Document the detailed upgrade process (the important step is providing the corresponding Secret before upgrading) in a more visible location, e.g. the Upgrade chapter of the official documentation or the Release Notes, instead of a component document such as [1] or [2].
2. Perform a pre-check before upgrading, such as Secret validation, and stop the upgrade command if the cluster does not meet the upgrade requirements.

[1] https://github.com/openshift/cloud-credential-operator/blob/master/docs/mode-manual-creds.md#upgrades
[2] https://github.com/openshift/cloud-credential-operator/blob/master/README.md
Yes, you likely need to create the namespace as well, and then the secret. For (1), I will work with the docs team if the org agrees this is the best path forward; I think it's our only option. For (2), there is a plan for upgrade preflight checks, but it missed 4.6. Hopefully it will be there in 4.7 and we can use it for 4.8 and beyond.
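The namespace-then-secret order can be sketched as follows. This is an illustration under assumptions, not a documented procedure: the function name precreate_csi_secret is hypothetical, and the namespace, Secret name, and key names are taken from the report above. It uses `oc create secret generic --from-literal`, which base64-encodes the values, so no hand-written `data:` block is needed.

```shell
# Create the namespace (if missing) and the Secret that the 4.6 AWS EBS CSI driver expects.
# $1 = AWS access key ID, $2 = AWS secret access key (plain text).
precreate_csi_secret() {
  ns="openshift-cluster-csi-drivers"
  # On a 4.5 cluster the namespace does not exist yet, so create it first.
  oc get namespace "$ns" >/dev/null 2>&1 || oc create namespace "$ns" || return 1
  oc create secret generic aws-cloud-credentials -n "$ns" \
    --from-literal=aws_access_key_id="$1" \
    --from-literal=aws_secret_access_key="$2"
}
```

Usage would look like `precreate_csi_secret "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"`, run against the 4.5 cluster before starting the upgrade.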
It has come to our attention that we can protect against this by pushing code to 4.5 which, when in manual mode, checks for the existence of the new credential secrets we know are coming in 4.6 and sets Upgradeable=False if they are missing, indicating that the user needs to take action. Upgradeable=False only blocks minor-version (4.y) upgrades; it will not block a 4.5.z upgrade. Re-opening the bugzilla to use for this purpose.
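An administrator could read the resulting condition before attempting the minor upgrade. This is a minimal sketch with hypothetical function names; it assumes the condition is reported on the cloud-credential ClusterOperator under the type Upgradeable, and it treats only an explicit False as blocking.

```shell
# Print the Upgradeable condition status of the cloud-credential operator
# ("True", "False", or empty if the condition is not set).
cco_upgradeable_status() {
  oc get clusteroperator cloud-credential \
    -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}'
}

# Return 0 only when the operator explicitly reports Upgradeable=False.
cco_blocks_minor_upgrade() {
  [ "$(cco_upgradeable_status)" = "False" ]
}
```

For example, `cco_blocks_minor_upgrade && echo "create the missing secrets before upgrading to 4.6"` would print the reminder only while the operator is blocking the upgrade.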
Fix for this is actively underway, PR coming soon.
I'm bumping this to urgent because I want to ensure that this makes the current merge window so that we can tie the minimum 4.5 to 4.6 version to this release.
Verification result: FAILED.
Version: 4.5.0-0.nightly-2020-09-26-194704

Following https://bugzilla.redhat.com/show_bug.cgi?id=1879628#c2, I ran the following tests:

>> case 1 [PASS]
1. Installed 4.5 with credentials mode Manual
2. Checked cloud-credential => Upgradeable=False
3. Created secrets for s3 and csi, checked cloud-credential => Upgradeable=True
4. Deleted the s3 or csi secret, checked cloud-credential => Upgradeable=False
5. Re-created the s3 or csi secret, checked cloud-credential => Upgradeable=True
6. Upgraded to 4.6 successfully

>> case 2 [FAILED]
1. Installed 4.5 with default credentials (no `credentialsMode` in install-config.yaml)
2. Checked cloud-credential => Upgradeable=True
3. Checked the annotations of aws-creds => "mint"
4. Removed aws-creds
5. Checked cloud-credential => Upgradeable=True (should be Upgradeable=False)
In version 4.5, the Upgradeable status does not change to False immediately when the root creds are removed. We need to wait a long time until CCO's next reconcile, or force a reconcile by adding an annotation to the CloudCredential object.
I spoke with our pillar lead, Scott Dodson, and he feels we should file a new bug for the missing immediate update when the root cred is removed or restored. I request that we consider this one verified, and I will work on the other issue separately. Does this sound OK?
The upgrade path for manual mode has been fixed, so I can mark this "VERIFIED" on my side. I will wait for Yunfei's input.
Hello @Devan, I agree with you. The original problem was that we could not upgrade the cluster smoothly when CCO was configured in manual mode. We have now tested this feature/process and it works well. For case 2 in Comment 12, we will file a new bz to track the issue. Thanks. Marking this bug as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4228