Description of problem:

An OSD/ROSA cluster failed to complete installation due to insufficient vCPU quota in the AWS account. From the .status.errorMessage of the machine that failed to provision:

> error launching instance: You have requested more vCPU capacity than your current vCPU limit of 32 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

This should ideally be exposed in the cluster's install logs. Instead, the install logs only hint at a few operators being degraded (like monitoring, ingress, image-registry).

Version-Release number of selected component (if applicable):

ROSA (OCP 4.8.4 on AWS)

How reproducible:

Seems easily reproducible with an AWS account lacking the vCPU quota required for the cluster to provision. I have not tried reproducing it myself, however.

Steps to Reproduce:
1. Use an AWS account with insufficient vCPU quota
2. Provision a ROSA cluster on that AWS account
3. The cluster provision should fail with degraded operators

Actual results:

The cluster provision fails, with the logs only highlighting a few cluster operators being degraded.

Expected results:

The cluster provision fails, with the installer logs pointing at the real root cause - in this case, the vCPU quota being exceeded. These error messages should be available from the install logs.

Additional info:

This is not an outlier; we have seen the same kind of clusterProvision failure occur a few times now. This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1943376, although that was related to a different component and a different quota error.
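For context, the root-cause detail is already captured on the failed Machine object; the gap is only in surfacing it. Below is a minimal, hypothetical Go sketch of how machine-level .status.errorMessage fields could be aggregated into a single installer-visible summary. The types and function names here are illustrative stand-ins, not the actual machine-api types:

```go
package main

import (
	"fmt"
	"strings"
)

// Machine models only the fields of a Machine API object relevant here;
// the real type lives in openshift/api. Names are illustrative.
type Machine struct {
	Name         string
	Phase        string // e.g. "Running", "Provisioning", "Failed"
	ErrorMessage string // .status.errorMessage, set by the cloud actuator
}

// summarizeFailures builds the kind of message that would be useful in the
// install log: which machines failed and why (e.g. the vCPU quota error).
func summarizeFailures(machines []Machine) (string, bool) {
	var parts []string
	for _, m := range machines {
		if m.Phase == "Failed" {
			parts = append(parts, fmt.Sprintf("%s: %s", m.Name, m.ErrorMessage))
		}
	}
	if len(parts) == 0 {
		return "", false
	}
	return fmt.Sprintf("found %d failed machine(s): %s",
		len(parts), strings.Join(parts, "; ")), true
}

func main() {
	machines := []Machine{
		{Name: "worker-a", Phase: "Running"},
		{Name: "worker-b", Phase: "Failed",
			ErrorMessage: "error launching instance: You have requested more vCPU capacity than your current vCPU limit of 32 allows"},
	}
	msg, degraded := summarizeFailures(machines)
	fmt.Println(degraded)
	fmt.Println(msg)
}
```

A summary like this, placed in a cluster operator condition message, would propagate into the installer's "Cluster operator ... Degraded is True with ..." lines rather than being buried on the Machine object.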
1. Is there any ETA on this?

2. We see a similar problem when machine-api doesn't have permission to ec2:CreateInstance (because of a bad STS role passed in). The 403 that machine-api is encountering is also not being surfaced to the install log, so the only clue we have that something is wrong is a vague "0 workers created" message. Do you consider that to be this same problem (generically, "we aren't setting cluster operator status"), or do you want me to open a new bug about it?
I seem to have had a browser malfunction. @jspeed, is there any update from our side?
> 1. Is there any ETA on this?

Not presently; we need to think about a good way to surface this. It's an install-time problem IMO. We haven't historically reported broken machines as a cluster operator issue because they don't affect the general running of a cluster. If we are going to report on the cluster operator, it needs to be a soft failure that doesn't block upgrades once the cluster is up and running.

> 2. We see a similar problem when machine-api doesn't have permission to ec2:CreateInstance (because of a bad STS role passed in). The 403 that machine-api is encountering is also not being surfaced to the install log, so the only clue we have that something is wrong is a vague "0 workers created" message. Do you consider that to be this same problem (generically, "we aren't setting cluster operator status") or do you want me to open a new bug about it?

As far as I understand the issue, this sounds the same to me.
As far as I know, we do not have an update on this bug. We still need to answer the issues that Joel raised in comment 15, specifically about reporting broken machines as a cluster operator issue during installation.
This was discussed on the Cluster Lifecycle architecture call yesterday. We are going to add an intermediate step of keeping the Machine controller "Progressing" until it has observed that all of the Machines from the initial set are running. We have set up a WG to define what a degraded operator means, and will move forward with marking the Machine API Operator Degraded once we have a clearer direction on that.
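The intermediate step described above can be sketched roughly as follows. This is an illustrative model only, using plain structs as stand-ins for the real Machine API and ClusterOperator types; it shows how Available could be gated on every Machine in the initial set reaching the Running phase, with the condition reason naming the outstanding machines:

```go
package main

import "fmt"

// Machine is a minimal stand-in for the Machine API type; only the fields
// needed for this sketch are modeled.
type Machine struct {
	Name  string
	Phase string // "Provisioning", "Running", "Failed", ...
}

// initialSetStatus reports whether the operator can go Available: it stays
// Progressing until every Machine from the initial set has been observed
// Running, and the reason names the machines still outstanding.
func initialSetStatus(initial []Machine) (available bool, reason string) {
	var notRunning []string
	for _, m := range initial {
		if m.Phase != "Running" {
			notRunning = append(notRunning, m.Name)
		}
	}
	if len(notRunning) > 0 {
		return false, fmt.Sprintf("found %d non running machine(s): %v",
			len(notRunning), notRunning)
	}
	return true, "all machines in the initial set are running"
}

func main() {
	initial := []Machine{
		{Name: "worker-us-east-2a", Phase: "Failed"},
		{Name: "worker-us-east-2b", Phase: "Running"},
	}
	available, reason := initialSetStatus(initial)
	fmt.Println(available, reason)
}
```

Because the installer waits on cluster operator conditions, keeping the operator not-Available with a reason like this is what makes the failed machine show up directly in the install log.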
Verified before PR merge:

1. Build an image with PR openshift/machine-api-operator/pull/1019.
2. Create manifests and update one MachineSet YAML file, such as 99_openshift-cluster-api_worker-machineset-0.yaml, changing instanceType to an invalid value.
3. Set up an IPI cluster. Cluster setup fails; if we delete the failed MachineSet, the cluster returns to normal.

Tested similar steps on a UPI cluster: without removing the MachineSet YAML file, the machines fail to create and cluster setup fails; after removing the failed MachineSet, the cluster returns to normal. The same steps with payload 4.11.0-0.nightly-2022-05-20-213928 result in a successful cluster installation.

05-24 16:53:18.383 level=debug msg=Still waiting for the cluster to initialize: Cluster operator machine-api is not available
05-24 17:22:13.376 level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
05-24 17:22:13.376 level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
05-24 17:22:13.376 level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
05-24 17:22:13.377 level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
05-24 17:22:13.377 level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
05-24 17:22:13.377 level=error msg=Cluster operator cluster-autoscaler Degraded is True with MissingDependency: machine-api not ready
05-24 17:22:13.377 level=error msg=Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 08:32:47 +0000 UTC <nil> 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest registry.build01.ci.openshift.org/ci-ln-h7pstxb/release@sha256:a570f4b607377f7fe9e09157ce4c08d6f07aed81a86ca9856c051997ac300527 false }]
05-24 17:22:13.377 level=info msg=Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
05-24 17:22:13.377 level=info msg=Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1V6IJrh1cNmDxgNlAAWZRfupr3B) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"adf1f8ad-fa1e-4833-a8b8-aa7a77459db7"}
05-24 17:22:13.377 level=info msg=Cluster operator insights Disabled is False with AsExpected:
05-24 17:22:13.378 level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest
05-24 17:22:13.378 level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest because found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t
05-24 17:22:13.378 level=info msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
05-24 17:22:13.378 level=info msg=Cluster operator network ManagementStateDegraded is False with :
05-24 17:22:13.378 level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
05-24 17:22:13.378 level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
05-24 17:22:13.378 level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
05-24 17:22:13.378 level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
05-24 17:22:13.378 level=error msg=failed to initialize the cluster: Cluster operator machine-api is not available

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          64m     Unable to apply 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest: the cluster operator machine-api has not yet successfully rolled out

$ oc get machine
NAME                                        PHASE     TYPE         REGION      ZONE         AGE
zhsunaws222-6p2ww-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   63m
zhsunaws222-6p2ww-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   63m
zhsunaws222-6p2ww-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   63m
zhsunaws222-6p2ww-worker-us-east-2a-ps77t   Failed                                          60m
zhsunaws222-6p2ww-worker-us-east-2b-lbfb7   Running   m6i.xlarge   us-east-2   us-east-2b   60m
zhsunaws222-6p2ww-worker-us-east-2c-r7qr2   Running   m6i.xlarge   us-east-2   us-east-2c   60m

$ oc get co
NAME                            VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   48m
baremetal                       4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   60m
cloud-controller-manager        4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   63m
cloud-credential                4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   62m
cluster-autoscaler                                                                        True   False   True    60m   machine-api not ready
config-operator                 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   62m
console                         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   51m
csi-snapshot-controller         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   61m
dns                             4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   60m
etcd                            4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   True    60m   UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 08:32:47 +0000 UTC <nil> 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest registry.build01.ci.openshift.org/ci-ln-h7pstxb/release@sha256:a570f4b607377f7fe9e09157ce4c08d6f07aed81a86ca9856c051997ac300527 false }]
image-registry                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   55m
ingress                         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   55m
insights                        4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   55m
kube-apiserver                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   56m
kube-controller-manager         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   59m
kube-scheduler                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   58m
kube-storage-version-migrator   4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   61m
machine-api                                                                               False   True   True    61m   Operator is initializing

$ oc edit co machine-api
status:
  conditions:
  - lastTransitionTime: "2022-05-24T08:35:53Z"
    message: 'Progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest'
    reason: SyncingResources
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-05-24T08:39:33Z"
    message: 'Failed when progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest
      because found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t'
    reason: SyncingFailed
    status: "True"
    type: Degraded
  - lastTransitionTime: "2022-05-24T08:35:53Z"
    message: Operator is initializing
    reason: Initializing
    status: "False"
    type: Available
  - lastTransitionTime: "2022-05-24T08:35:53Z"
    status: "True"
    type: Upgradeable

$ oc delete machineset zhsunaws222-6p2ww-worker-us-east-2a
machineset.machine.openshift.io "zhsunaws222-6p2ww-worker-us-east-2a" deleted

$ oc get machine
NAME                                        PHASE     TYPE         REGION      ZONE         AGE
zhsunaws222-6p2ww-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   111m
zhsunaws222-6p2ww-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   111m
zhsunaws222-6p2ww-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   111m
zhsunaws222-6p2ww-worker-us-east-2b-lbfb7   Running   m6i.xlarge   us-east-2   us-east-2b   108m
zhsunaws222-6p2ww-worker-us-east-2c-r7qr2   Running   m6i.xlarge   us-east-2   us-east-2c   108m

$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True        False         6s      Cluster version is 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest

$ oc get co
NAME                            VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   96m
baremetal                       4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
cloud-controller-manager        4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   111m
cloud-credential                4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   111m
cluster-autoscaler              4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
config-operator                 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   110m
console                         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   99m
csi-snapshot-controller         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
dns                             4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
etcd                            4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   108m
image-registry                  4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   103m
ingress                                    4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   104m
insights                                   4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   104m
kube-apiserver                             4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   104m
kube-controller-manager                    4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   107m
kube-scheduler                             4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   106m
kube-storage-version-migrator              4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   110m
machine-api                                4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   24s
machine-approver                           4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
machine-config                             4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   108m
marketplace                                4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
monitoring                                 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   100m
network                                    4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   111m
node-tuning                                4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
openshift-apiserver                        4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   104m
openshift-controller-manager               4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
openshift-samples                          4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   103m
operator-lifecycle-manager                 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
operator-lifecycle-manager-catalog         4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
operator-lifecycle-manager-packageserver   4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   104m
service-ca                                 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   110m
storage                                    4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest   True   False   False   109m
I also tried to do this without using the Degraded condition, only Progressing; the output of the installer is pretty much the same:

INFO Waiting up to 30m0s (until 2:13PM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 2:31PM) for the cluster at https://api.jspeed-test-2.devcluster.openshift.com:6443 to initialize...
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
ERROR Cluster operator cluster-autoscaler Degraded is True with MissingDependency: machine-api not ready
ERROR Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 12:44:16 +0000 UTC <nil> 4.11.0-0.nightly-2022-05-24-062131 quay.io/jspeed/release@sha256:3e84ce1004b7312c8bedcde5c7f63521c1a7fc89fd8cc4564135acfd65f8562b false }]
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1W4cVqx5p9Ty1StMSTk4reQMa07) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"c99eae58-342a-406b-9e36-13d0f9fa503c"}
INFO Cluster operator machine-api Progressing is True with Initializing: found 1 non running machine(s): jspeed-test-2-wk64h-worker-us-east-2c-scxz7
INFO Cluster operator machine-api Available is False with Initializing: Operator is initializing
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Cluster operator machine-api is not available
If we look at using Progressing with Available=True (the previous post was Available=False), then the error from MAPI is much harder to find: it is still only an info log, and not the headline problem in a cluster that is having other issues (i.e. other operators aren't up because MAPI hasn't created enough machines):

INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 5:59PM) for the Kubernetes API at https://api.jspeed-test-2.devcluster.openshift.com:6443...
INFO API v1.23.3+ad897c4 up
INFO Waiting up to 30m0s (until 6:10PM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 6:39PM) for the cluster at https://api.jspeed-test-2.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.jspeed-test-2.devcluster.openshift.com in route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded:
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: route "openshift-authentication/oauth-openshift": status does not have a valid host address
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.119.56:443/healthz": dial tcp 172.30.119.56:443: connect: connection refused
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.119.56:443/healthz": dial tcp 172.30.119.56:443: connect: connection refused
INFO OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
INFO WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
ERROR Cluster operator console Degraded is True with DefaultRouteSync_FailedAdmitDefaultRoute::RouteHealth_RouteNotAdmitted::SyncLoopRefresh_FailedIngress: DefaultRouteSyncDegraded: no ingress for host console-openshift-console.apps.jspeed-test-2.devcluster.openshift.com in route console in namespace openshift-console
ERROR RouteHealthDegraded: console route is not admitted
ERROR SyncLoopRefreshDegraded: no ingress for host console-openshift-console.apps.jspeed-test-2.devcluster.openshift.com in route console in namespace openshift-console
INFO Cluster operator console Available is False with RouteHealth_RouteNotAdmitted: RouteHealthAvailable: console route is not admitted
ERROR Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 16:41:17 +0000 UTC <nil> 4.11.0-0.nightly-2022-05-24-062131 quay.io/jspeed/release@sha256:3e84ce1004b7312c8bedcde5c7f63521c1a7fc89fd8cc4564135acfd65f8562b false }]
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
INFO Cluster operator image-registry Available is False with NoReplicasAvailable: Available: The deployment does not have available replicas
INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas
INFO ImagePrunerAvailable: Pruner CronJob has been created
INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: Progressing: The deployment has not completed
ERROR Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not have available replicas
INFO Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-85b5cccc4c-gpfqt" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Pod "router-default-85b5cccc4c-xjcm8" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
INFO Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1W4cVqx5p9Ty1StMSTk4reQMa07) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"5f4a53be-16cf-45f7-82f4-9284646b9937"}
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator machine-api Progressing is True with Initializing: found 1 non running machine(s): jspeed-test-2-gsbjg-worker-us-east-2c-bbjlj
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
INFO Cluster operator network ManagementStateDegraded is False with :
INFO Cluster operator network Progressing is True with Deploying: Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
*** Bug 2090780 has been marked as a duplicate of this bug. ***
Moving to VERIFIED; this was verified before the PR merge.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069