Installed on bare-metal (VMs) with assisted installer.

$ oc get clusteroperator -oyaml cloud-credential
...
status:
  conditions:
  ...
  - lastTransitionTime: "2020-10-07T18:28:25Z"
    message: Parent credential secret kube-system/aws-creds must be restored prior to upgrade
    reason: CredentialsRootSecretMissing
    status: "False"
    type: Upgradeable

which blocks upgrades and results in a ClusterNotUpgradeable alert.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Every time

Steps to Reproduce:
1. Install a cluster with assisted installer on bare-metal

Actual results:
Cluster is not upgradeable

Expected results:
Cluster is upgradeable, as cloud provider creds are not a thing on bare-metal

Additional info:

$ oc get infrastructures.config.openshift.io -oyaml cluster
...
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: BareMetal
status:
  ...
  platform: BareMetal
  platformStatus:
    baremetal:
      apiServerInternalIP: 10.42.11.10
      ingressIP: 10.42.11.11
    type: BareMetal

$ oc get cloudcredentials.operator.openshift.io -oyaml cluster
...
spec:
  credentialsMode: ""

Seems like this was done in response to this:
https://bugzilla.redhat.com/show_bug.cgi?id=1879628
https://github.com/openshift/cloud-credential-operator/pull/248
If a workaround is needed, just create an empty dummy secret; we don't check the contents. This will only block an upgrade to 4.7, and now 4.6.1. I will try to determine how the AWS code is being triggered despite a non-AWS infrastructure.
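A minimal sketch of that workaround, assuming the secret the operator is looking for is the kube-system/aws-creds named in the Upgradeable condition message. The type and function names (secret, dummySecret, render) are hypothetical and only for illustration; the program just prints a Secret manifest with no data, which could then be applied to the cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata and secret model only the fields needed for an empty Secret.
// They are illustrative types, not the Kubernetes API types.
type metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type secret struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   metadata `json:"metadata"`
}

// dummySecret builds a Secret with no data, since the comment above says
// the contents are not checked.
func dummySecret(namespace, name string) secret {
	return secret{
		APIVersion: "v1",
		Kind:       "Secret",
		Metadata:   metadata{Name: name, Namespace: namespace},
	}
}

// render serializes the Secret as indented JSON.
func render(s secret) (string, error) {
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Assumption: namespace and name are taken from the condition message
	// "Parent credential secret kube-system/aws-creds must be restored".
	out, err := render(dummySecret("kube-system", "aws-creds"))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The output could be piped into the cluster, for example with go run main.go | oc apply -f -.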
Discussed with Devan: that was a typo (s/now/not/). The conclusion is that this is not a 4.6 GA blocker, but it will need to be resolved in a 4.6.z release before 4.7 upgrades are expected to be supported. Moving to 4.7 and cloning for resolution in 4.6.z.
test payload: registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-10-27-051128

I see a similar issue when provisioning in Manual mode on GCP. Provisioning on Azure is OK. (Found when testing bug https://bugzilla.redhat.com/show_bug.cgi?id=1884691)

$ oc get co cloud-credential -o json | jq -r ".status.conditions"
[
  {
    "lastTransitionTime": "2020-11-06T02:54:08Z",
    "message": "Credential minting is disabled by cluster admin",
    "reason": "OperatorDisabledByAdmin",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2020-11-06T02:54:08Z",
    "status": "False",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2020-11-06T02:54:08Z",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2020-11-06T02:54:08Z",
    "message": "Parent credential secret must be restored prior to upgrade: kube-system/gcp-credentials",
    "reason": "MissingRootCredential",
    "status": "False",
    "type": "Upgradeable"
  }
]

$ oc get cloudcredential cluster -o json | jq -r ".spec"
{
  "credentialsMode": "Manual",
  "logLevel": "Normal"
}

$ oc get infrastructure cluster -o json | jq -r ".status"
{
  "apiServerInternalURI": "https://api-int.lwan-jk-manual-gcp.qe.gcp.devcluster.openshift.com:6443",
  "apiServerURL": "https://api.lwan-jk-manual-gcp.qe.gcp.devcluster.openshift.com:6443",
  "etcdDiscoveryDomain": "lwan-jk-manual-gcp.qe.gcp.devcluster.openshift.com",
  "infrastructureName": "lwan-jk-manual-gcp-wkwnt",
  "platform": "GCP",
  "platformStatus": {
    "gcp": {
      "projectID": "openshift-qe",
      "region": "us-central1"
    },
    "type": "GCP"
  }
}
I've looked through the code and can't immediately see how this is possible. We fork the actuator implementation based on the infrastructure status platform (https://github.com/openshift/cloud-credential-operator/blob/master/pkg/operator/controller.go#L84). The bare metal platform type should fall through to the default case, which adds a dummy actuator that contains no logic and does nothing; it is explicitly coded to always set Upgradeable to true. I can't yet see how a bare metal cluster would be instantiating the AWS actuator, but that appears to be what is happening in this case.

It would help a lot to get onto a metal cluster to debug. I'd like to see the infrastructure CR in full (though we have most of that), but most importantly the cloud-credential-operator pod logs. Unfortunately we do not have access to such a cluster or hardware. I will try to track down someone who can get us onto one.
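For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the expected behaviour described above. It is not the operator's actual code: the Actuator interface, the awsActuator and dummyActuator types, and selectActuator are hypothetical names, and only two branches are modelled.

```go
package main

import "fmt"

// Actuator is a simplified, hypothetical stand-in for the operator's
// per-platform actuator.
type Actuator interface {
	Name() string
	// Upgradeable reports whether upgrades are allowed and, if not, why.
	Upgradeable() (bool, string)
}

// awsActuator blocks upgrades when the parent credential secret is missing.
type awsActuator struct{ rootSecretPresent bool }

func (a awsActuator) Name() string { return "aws" }

func (a awsActuator) Upgradeable() (bool, string) {
	if !a.rootSecretPresent {
		return false, "Parent credential secret kube-system/aws-creds must be restored prior to upgrade"
	}
	return true, ""
}

// dummyActuator contains no logic and always reports Upgradeable=true.
type dummyActuator struct{}

func (dummyActuator) Name() string { return "dummy" }

func (dummyActuator) Upgradeable() (bool, string) { return true, "" }

// selectActuator forks on the infrastructure status platform. Any platform
// without a dedicated actuator falls through to the dummy actuator.
func selectActuator(platform string, rootSecretPresent bool) Actuator {
	switch platform {
	case "AWS":
		return awsActuator{rootSecretPresent: rootSecretPresent}
	default:
		return dummyActuator{}
	}
}

func main() {
	for _, p := range []string{"AWS", "BareMetal"} {
		a := selectActuator(p, false)
		ok, msg := a.Upgradeable()
		fmt.Printf("platform=%s actuator=%s upgradeable=%t %s\n", p, a.Name(), ok, msg)
	}
}
```

Under this model a BareMetal cluster can never report the aws-creds message, which is why the reported condition suggests the AWS actuator is somehow being instantiated.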
oc adm must-gather would be ideal.
I'm no longer seeing this issue in 4.6 for platform: BareMetal (4.6.6 and higher, to be specific). It could have been fixed before 4.6.6 as well. Wang Lin, is this still an issue for you? If not, feel free to close.
FWIW, I think Lin's issue is probably a separate bug. In that case it looks like GCP is correctly running the GCP actuator, but the logic may be incorrectly assuming it needs an admin credential.