Bug 1994820
| Summary: | machine controller doesn't send vCPU quota failed messages to cluster install logs | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Karthik Perumal <kramraja> |
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | cblecker, dgoodwin, jspeed, mimccune, pbalogh, wking |
| Version: | 4.8 | Keywords: | ServiceDeliveryImpact |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | Feature: Machine API now reports Degraded if an insufficient number of worker machines start when the cluster is installed.<br>Reason: Previously, only operators such as auth and ingress showed as degraded in this scenario, hiding the fact that the Machine API was the real issue.<br>Result: Machine API is now included in the list of failed operators, giving users a hint that they should look at the state of Machines. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:36:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Karthik Perumal
2021-08-17 22:19:05 UTC
1. Is there any ETA on this?

2. We see a similar problem when machine-api doesn't have permission to ec2:CreateInstance (because of a bad STS role passed in). The 403 that machine-api is encountering is also not being surfaced to the install log, so the only clue we have that something is wrong is a vague "0 workers created" message. Do you consider that to be this same problem (generically, "we aren't setting cluster operator status") or do you want me to open a new bug about it?

(I seem to have had a browser malfunction.) @jspeed, is there any update from our side?

> 1. Is there any ETA on this?

Not presently; we need to think about a good way to surface this. It's an install-time problem, in my opinion: we haven't historically reported broken machines as a cluster operator issue because they don't affect the general running of a cluster. If we are going to report on the cluster operator, it needs to be a soft failure that doesn't cause upgrades to block once the cluster is up and running.

> 2. We see a similar problem when machine-api doesn't have permission to ec2:CreateInstance (because of a bad STS role passed in). The 403 that machine-api is encountering is also not being surfaced to the install log, so the only clue we have that something is wrong is a vague "0 workers created" message. Do you consider that to be this same problem (generically, "we aren't setting cluster operator status") or do you want me to open a new bug about it?

As far as I understand the issue, this sounds the same to me.

As far as I know, we do not have an update on this bug. We still need to answer the issues that Joel raised in comment 15, specifically about reporting broken machines as a cluster operator issue during installation.

This was discussed on the Cluster Lifecycle architecture call yesterday. We are going to add an intermediate step of making the Machine controller report "Progressing" until it has observed that all of the Machines from the initial set are running.
We have set up a WG to define what a degraded operator means, and we will advance on making the Machine API Operator report Degraded once we have a clearer direction on that.

Verified before PR merge:
1. Build an image with PR openshift/machine-api-operator/pull/1019.
2. Create manifests and update one MachineSet YAML file, such as 99_openshift-cluster-api_worker-machineset-0.yaml, changing instanceType to an invalid value.
3. Set up an IPI cluster.
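For reference, step 2 amounts to editing the generated MachineSet manifest before install. A minimal, illustrative excerpt (the real file contains many more fields; the instanceType value here is deliberately invalid):

```yaml
# 99_openshift-cluster-api_worker-machineset-0.yaml (excerpt, illustrative)
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
spec:
  template:
    spec:
      providerSpec:
        value:
          instanceType: invalid   # was a valid type such as m6i.xlarge; broken on purpose
```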
Cluster setup failed. If we delete the failed MachineSet, the cluster returns to normal.
We tested similar steps on a UPI cluster: if the MachineSet YAML file is not removed, machine creation fails and cluster setup fails; if the failed MachineSet is removed, the cluster returns to normal.
With the same steps on payload 4.11.0-0.nightly-2022-05-20-213928, the cluster installation is successful.
05-24 16:53:18.383 level=debug msg=Still waiting for the cluster to initialize: Cluster operator machine-api is not available
05-24 17:22:13.376 level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
05-24 17:22:13.376 level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
05-24 17:22:13.376 level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
05-24 17:22:13.377 level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
05-24 17:22:13.377 level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
05-24 17:22:13.377 level=error msg=Cluster operator cluster-autoscaler Degraded is True with MissingDependency: machine-api not ready
05-24 17:22:13.377 level=error msg=Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 08:32:47 +0000 UTC <nil> 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest registry.build01.ci.openshift.org/ci-ln-h7pstxb/release@sha256:a570f4b607377f7fe9e09157ce4c08d6f07aed81a86ca9856c051997ac300527 false }]
05-24 17:22:13.377 level=info msg=Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
05-24 17:22:13.377 level=info msg=Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1V6IJrh1cNmDxgNlAAWZRfupr3B) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"adf1f8ad-fa1e-4833-a8b8-aa7a77459db7"}
05-24 17:22:13.377 level=info msg=Cluster operator insights Disabled is False with AsExpected:
05-24 17:22:13.378 level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest
05-24 17:22:13.378 level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest because found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t
05-24 17:22:13.378 level=info msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
05-24 17:22:13.378 level=info msg=Cluster operator network ManagementStateDegraded is False with :
05-24 17:22:13.378 level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
05-24 17:22:13.378 level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
05-24 17:22:13.378 level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
05-24 17:22:13.378 level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
05-24 17:22:13.378 level=error msg=failed to initialize the cluster: Cluster operator machine-api is not available
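When scanning installer output like the above for the root cause, the error-level lines can be filtered out programmatically. A minimal sketch, assuming the `level=<lvl> msg=<text>` line shape shown in this log (the sample lines are abbreviated copies from above; `error_messages` is an illustrative helper, not an OpenShift API):

```python
import re

# Abbreviated sample of the installer log lines shown above.
sample = """\
level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.11.0
level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t
level=error msg=failed to initialize the cluster: Cluster operator machine-api is not available
"""

LINE_RE = re.compile(r"level=(?P<level>\w+) msg=(?P<msg>.*)")

def error_messages(log_text):
    """Return the msg portion of every level=error line."""
    out = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)  # search, not match: real lines carry a timestamp prefix
        if m and m.group("level") == "error":
            out.append(m.group("msg"))
    return out

for msg in error_messages(sample):
    print(msg)
```

With the fix from this bug, the machine-api Degraded message surfaces directly in these error lines instead of being hidden behind downstream operators.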
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 64m Unable to apply 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest: the cluster operator machine-api has not yet successfully rolled out
$ oc get machine
NAME PHASE TYPE REGION ZONE AGE
zhsunaws222-6p2ww-master-0 Running m6i.xlarge us-east-2 us-east-2a 63m
zhsunaws222-6p2ww-master-1 Running m6i.xlarge us-east-2 us-east-2b 63m
zhsunaws222-6p2ww-master-2 Running m6i.xlarge us-east-2 us-east-2c 63m
zhsunaws222-6p2ww-worker-us-east-2a-ps77t Failed 60m
zhsunaws222-6p2ww-worker-us-east-2b-lbfb7 Running m6i.xlarge us-east-2 us-east-2b 60m
zhsunaws222-6p2ww-worker-us-east-2c-r7qr2 Running m6i.xlarge us-east-2 us-east-2c 60m
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 48m
baremetal 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 60m
cloud-controller-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 63m
cloud-credential 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 62m
cluster-autoscaler True False True 60m machine-api not ready
config-operator 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 62m
console 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 51m
csi-snapshot-controller 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 61m
dns 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 60m
etcd 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False True 60m UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 08:32:47 +0000 UTC <nil> 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest registry.build01.ci.openshift.org/ci-ln-h7pstxb/release@sha256:a570f4b607377f7fe9e09157ce4c08d6f07aed81a86ca9856c051997ac300527 false }]
image-registry 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 55m
ingress 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 55m
insights 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 55m
kube-apiserver 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 56m
kube-controller-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 59m
kube-scheduler 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 58m
kube-storage-version-migrator 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 61m
machine-api False True True 61m Operator is initializing
$ oc edit co machine-api
status:
conditions:
- lastTransitionTime: "2022-05-24T08:35:53Z"
message: 'Progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest'
reason: SyncingResources
status: "True"
type: Progressing
- lastTransitionTime: "2022-05-24T08:39:33Z"
message: 'Failed when progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest
because found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t'
reason: SyncingFailed
status: "True"
type: Degraded
- lastTransitionTime: "2022-05-24T08:35:53Z"
message: Operator is initializing
reason: Initializing
status: "False"
type: Available
- lastTransitionTime: "2022-05-24T08:35:53Z"
status: "True"
type: Upgradeable
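The conditions block above is what the installer summarizes. To pull the Degraded reason out of a ClusterOperator status programmatically (e.g. from `oc get co machine-api -o json`), a minimal sketch; the sample data mirrors the conditions shown here, and `degraded_message` is an illustrative helper, not part of any OpenShift library:

```python
# Sample status mirroring the `oc edit co machine-api` output above.
sample_status = {
    "conditions": [
        {"type": "Progressing", "status": "True", "reason": "SyncingResources",
         "message": "Progressing towards operator: 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest"},
        {"type": "Degraded", "status": "True", "reason": "SyncingFailed",
         "message": "found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t"},
        {"type": "Available", "status": "False", "reason": "Initializing",
         "message": "Operator is initializing"},
    ]
}

def degraded_message(status):
    """Return the Degraded condition's message when Degraded=True, else None."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond.get("message")
    return None

print(degraded_message(sample_status))
# → found 1 non running machine(s): zhsunaws222-6p2ww-worker-us-east-2a-ps77t
```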
$ oc delete machineset zhsunaws222-6p2ww-worker-us-east-2a
machineset.machine.openshift.io "zhsunaws222-6p2ww-worker-us-east-2a" deleted
$ oc get machine
NAME PHASE TYPE REGION ZONE AGE
zhsunaws222-6p2ww-master-0 Running m6i.xlarge us-east-2 us-east-2a 111m
zhsunaws222-6p2ww-master-1 Running m6i.xlarge us-east-2 us-east-2b 111m
zhsunaws222-6p2ww-master-2 Running m6i.xlarge us-east-2 us-east-2c 111m
zhsunaws222-6p2ww-worker-us-east-2b-lbfb7 Running m6i.xlarge us-east-2 us-east-2b 108m
zhsunaws222-6p2ww-worker-us-east-2c-r7qr2 Running m6i.xlarge us-east-2 us-east-2c 108m
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False 6s Cluster version is 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest
$ oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 96m
baremetal 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
cloud-controller-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 111m
cloud-credential 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 111m
cluster-autoscaler 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
config-operator 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 110m
console 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 99m
csi-snapshot-controller 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
dns 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
etcd 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 108m
image-registry 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 103m
ingress 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 104m
insights 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 104m
kube-apiserver 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 104m
kube-controller-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 107m
kube-scheduler 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 106m
kube-storage-version-migrator 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 110m
machine-api 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 24s
machine-approver 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
machine-config 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 108m
marketplace 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
monitoring 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 100m
network 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 111m
node-tuning 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
openshift-apiserver 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 104m
openshift-controller-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
openshift-samples 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 103m
operator-lifecycle-manager 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
operator-lifecycle-manager-catalog 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
operator-lifecycle-manager-packageserver 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 104m
service-ca 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 110m
storage 4.11.0-0.ci.test-2022-05-24-021537-ci-ln-h7pstxb-latest True False False 109m
I also tried to do this without using the degraded condition, only using progressing; the output of the installer is pretty much the same:

INFO Waiting up to 30m0s (until 2:13PM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 2:31PM) for the cluster at https://api.jspeed-test-2.devcluster.openshift.com:6443 to initialize...
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
ERROR Cluster operator cluster-autoscaler Degraded is True with MissingDependency: machine-api not ready
ERROR Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 12:44:16 +0000 UTC <nil> 4.11.0-0.nightly-2022-05-24-062131 quay.io/jspeed/release@sha256:3e84ce1004b7312c8bedcde5c7f63521c1a7fc89fd8cc4564135acfd65f8562b false }]
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1W4cVqx5p9Ty1StMSTk4reQMa07) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"c99eae58-342a-406b-9e36-13d0f9fa503c"}
INFO Cluster operator machine-api Progressing is True with Initializing: found 1 non running machine(s): jspeed-test-2-wk64h-worker-us-east-2c-scxz7
INFO Cluster operator machine-api Available is False with Initializing: Operator is initializing
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Cluster operator machine-api is not available

If we look at using progressing with available=true (the previous post was available=false), then the error from MAPI is much harder to find: it is still only an info log, and not the headline problem in a cluster that is having other issues (i.e. other operators aren't up because MAPI hasn't created enough machines):

INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 5:59PM) for the Kubernetes API at https://api.jspeed-test-2.devcluster.openshift.com:6443...
INFO API v1.23.3+ad897c4 up
INFO Waiting up to 30m0s (until 6:10PM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 6:39PM) for the cluster at https://api.jspeed-test-2.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.jspeed-test-2.devcluster.openshift.com in route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication
ERROR OAuthServerDeploymentDegraded:
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: route "openshift-authentication/oauth-openshift": status does not have a valid host address
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.119.56:443/healthz": dial tcp 172.30.119.56:443: connect: connection refused
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator authentication Available is False with OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.119.56:443/healthz": dial tcp 172.30.119.56:443: connect: connection refused
INFO OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
INFO WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
ERROR Cluster operator console Degraded is True with DefaultRouteSync_FailedAdmitDefaultRoute::RouteHealth_RouteNotAdmitted::SyncLoopRefresh_FailedIngress: DefaultRouteSyncDegraded: no ingress for host console-openshift-console.apps.jspeed-test-2.devcluster.openshift.com in route console in namespace openshift-console
ERROR RouteHealthDegraded: console route is not admitted
ERROR SyncLoopRefreshDegraded: no ingress for host console-openshift-console.apps.jspeed-test-2.devcluster.openshift.com in route console in namespace openshift-console
INFO Cluster operator console Available is False with RouteHealth_RouteNotAdmitted: RouteHealthAvailable: console route is not admitted
ERROR Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-05-24 16:41:17 +0000 UTC <nil> 4.11.0-0.nightly-2022-05-24-062131 quay.io/jspeed/release@sha256:3e84ce1004b7312c8bedcde5c7f63521c1a7fc89fd8cc4564135acfd65f8562b false }]
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
INFO Cluster operator image-registry Available is False with NoReplicasAvailable: Available: The deployment does not have available replicas
INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas
INFO ImagePrunerAvailable: Pruner CronJob has been created
INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: Progressing: The deployment has not completed
ERROR Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not have available replicas
INFO Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-85b5cccc4c-gpfqt" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Pod "router-default-85b5cccc4c-xjcm8" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
INFO Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"id":"7","kind":"Error","href":"/api/accounts_mgmt/v1/errors/7","code":"ACCT-MGMT-7","reason":"The organization (id= 1W4cVqx5p9Ty1StMSTk4reQMa07) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management.","operation_id":"5f4a53be-16cf-45f7-82f4-9284646b9937"}
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator machine-api Progressing is True with Initializing: found 1 non running machine(s): jspeed-test-2-gsbjg-worker-us-east-2c-bbjlj
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: Failed to rollout the stack. Error: updating prometheus operator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
INFO Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
INFO Cluster operator network ManagementStateDegraded is False with :
INFO Cluster operator network Progressing is True with Deploying: Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring

*** Bug 2090780 has been marked as a duplicate of this bug. ***

Moving to verified; this was verified before the PR merge.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069