Bug 1871713
Summary: | [AWS 4.6 upgrade] Upgrade failed while CCO in manual mode, Error: secret "aws-cloud-credentials" not found | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Yunfei Jiang <yunjiang> | |
Component: | Cloud Credential Operator | Assignee: | Devan Goodwin <dgoodwin> | |
Status: | CLOSED ERRATA | QA Contact: | Yunfei Jiang <yunjiang> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 4.6 | CC: | aos-bugs, dgoodwin, gshereme, hekumar, lwan, sdodson, wking | |
Target Milestone: | --- | Keywords: | Reopened, Upgrades | |
Target Release: | 4.5.z | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
Clones: | 1879628 | Environment: | |
Last Closed: | 2020-10-19 14:54:24 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1879628 | |||
Bug Blocks: |
Description
Yunfei Jiang
2020-08-24 05:43:53 UTC
Nice catch Yunfei. It looks like storage is incorrectly using the root/admin cloud credential, which is often admin-level and will not be present in manual mode. In-cluster components that need cloud credentials should create a CredentialsRequest in the openshift-cloud-credential-operator namespace containing the exact permissions the component needs; this CredentialsRequest would be included in the release image. Documentation here: https://github.com/openshift/cloud-credential-operator

Moving to storage.

I misread. This looks more like what happens when you are in manual mode and upgrade without pre-baking the credential needed for the new release image. This should be covered by the documentation. Yunfei, did you perform the documented steps for a manual mode cluster upgrade? This would entail pre-creating namespaces and secrets from the release image you intend to upgrade to.

However, in talking with Hemant, this isn't even going to work, because they are dynamically creating the CredentialsRequest in the operator itself; it is not carried in the release payload. This means that our auditability story (all CredRequests are in the payload) is not accurate, and it will break for users in manual mode attempting to upgrade. We are keeping this bug open to address this issue.

We hope to have tooling to automate this in the future, but it's not there yet; users opting into manual mode must perform these steps themselves.

hello Devan,

> Yunfei did you perform the documented steps for a manual mode cluster upgrade?

I'm not sure whether [1] and [2] are the documents you mentioned above. And as you mentioned, the CR is not in the release payload, so it cannot be extracted by the user, and users have no way of knowing that the cluster needs a new Secret. I tried to create a Secret for openshift-cluster-csi-drivers in a 4.5 cluster before upgrading, but it failed:

`Error from server (NotFound): error when creating "csi.yaml": namespaces "openshift-cluster-csi-drivers" not found`

What I expect is:

1. Document the detailed upgrade process (the important step is providing the corresponding Secret before upgrading) in a more visible location, e.g. the Upgrade chapter of the official documentation or the Release Notes, instead of a component document such as [1] or [2].
2. Perform a pre-check before upgrading, such as Secret validation, and stop the upgrade command if the cluster does not meet the upgrade requirements.

[1] https://github.com/openshift/cloud-credential-operator/blob/master/docs/mode-manual-creds.md#upgrades
[2] https://github.com/openshift/cloud-credential-operator/blob/master/README.md

Yes, you likely need to create the namespace as well, and then the secret. For (1), I will work with the docs team if the org approves this as the best path forward; I think it's our only option. For (2), there is a plan for upgrade preflight checks, but it missed 4.6. Hopefully it lands in 4.7 and we can use it for 4.8 and beyond.

It has come to our attention that we can protect against this by pushing code to 4.5 that, in manual mode, checks for the existence of the new credential secrets we know are coming in 4.6, and sets Upgradeable=False to indicate that the user needs to take action.

Upgradeable=False only blocks 4.y upgrades; it will not block a 4.5.z upgrade. Re-opening the bugzilla to use for this purpose.

Fix for this is actively underway, PR coming soon.

I'm bumping this to urgent because I want to ensure that this makes the current merge window, so that we can make this release the minimum 4.5 version for upgrades to 4.6.

Verification: FAILED.
version: 4.5.0-0.nightly-2020-09-26-194704

According to https://bugzilla.redhat.com/show_bug.cgi?id=1879628#c2, I did the following tests:

>> case 1 [PASS]
1. Installed 4.5 with credentials mode Manual.
2. Checked cloud-credential => Upgradeable=False.
3. Created secrets for s3 and csi, checked cloud-credential => Upgradeable=True.
4. Deleted the s3 or csi secret, checked cloud-credential => Upgradeable=False.
5. Re-created the s3 or csi secret, checked cloud-credential => Upgradeable=True.
6. Upgraded to 4.6 successfully.

>> case 2 [FAILED]
1. Installed 4.5 with default credentials (no `credentialsMode` in install-config.yaml).
2. Checked cloud-credential => Upgradeable=True.
3. Checked annotations of aws-creds => "mint".
4. Removed aws-creds.
5. Checked cloud-credential => Upgradeable=True (should be Upgradeable=False).

In version 4.5, the Upgradeable status does not immediately change to False when the root credentials are removed. We need to wait a long time for the CCO's next reconcile, or force a reconcile by adding an annotation to the CloudCredential object.

I spoke with our pillar lead Scott Dodson; he feels we should file a new bug for the missing immediate update when the root credential is removed or restored. I request that we consider this one verified, and I will work on the other issue separately. Does this sound OK?

The upgrade issue for manual mode has been fixed, so I can mark this "VERIFIED" on my side; I will wait for Yunfei's input.

Hello @Devan,
Agreed. The original problem we hit was that we could not upgrade the cluster smoothly with the CCO configured in manual mode; we have now tested this feature/process and it works well. For case 2 in Comment 12, we will file a new bz to track that issue. Thanks.
Marking this bug as VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.15 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4228
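The manual-mode workaround discussed in this thread (create the namespace first, then the Secret, for each credential the target release needs) could be scripted along these lines. This is a minimal Python sketch, not CCO tooling. It assumes the CredentialsRequest has already been extracted from the target release image and parsed into a dict, that its `spec.secretRef` names the Secret to create, and that AWS secrets use the `aws_access_key_id`/`aws_secret_access_key` keys. The helper name `manifests_for_credentials_request` and all example values are hypothetical.

```python
import base64
import json


def _b64(value: str) -> str:
    # Secret.data values must be base64-encoded strings.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def manifests_for_credentials_request(cred_request: dict,
                                      access_key_id: str,
                                      secret_access_key: str) -> dict:
    """Build a v1 List holding the Namespace and Secret to pre-create.

    Hypothetical helper: reads spec.secretRef from an already-parsed
    CredentialsRequest. The Namespace is listed before the Secret because
    creating a Secret in a namespace that does not exist yet fails with
    NotFound, as seen in this bug.
    """
    ref = cred_request["spec"]["secretRef"]
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ref["namespace"]},
    }
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": ref["name"], "namespace": ref["namespace"]},
        # Assumed AWS key names; check against the CCO docs for your version.
        "data": {
            "aws_access_key_id": _b64(access_key_id),
            "aws_secret_access_key": _b64(secret_access_key),
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, secret]}


if __name__ == "__main__":
    # Placeholder CredentialsRequest; real ones come from the target release image.
    example = {
        "apiVersion": "cloudcredential.openshift.io/v1",
        "kind": "CredentialsRequest",
        "metadata": {"name": "example-component",
                     "namespace": "openshift-cloud-credential-operator"},
        "spec": {"secretRef": {"name": "example-cloud-credentials",
                               "namespace": "example-target-namespace"}},
    }
    print(json.dumps(manifests_for_credentials_request(
        example, "EXAMPLE_KEY_ID", "EXAMPLE_SECRET"), indent=2))
```

The printed JSON could be reviewed and then applied with `oc apply -f <file>`. As pointed out in the thread, this only helps for CredentialsRequests that actually ship in the release payload; one created dynamically by an operator cannot be extracted this way.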