Version:
$ openshift-install version
4.8.0-0.nightly-2021-06-11-192710

Platform: AWS - IPI

What happened?
The install failed when various operators couldn't deploy their replicas because only a single worker node existed:

INFO[2021-06-12T03:46:21Z] level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-7d49958b56-npnxn" cannot be scheduled: 0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.)

Full logs from the job run: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1403543544680943616

Must-gather: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1403543544680943616/artifacts/e2e-aws-canary/gather-must-gather/artifacts/must-gather.tar

Node status: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-canary/1403543544680943616/artifacts/e2e-aws-canary/gather-extra/artifacts/nodes.json

What did you expect to happen?
The install to complete successfully.

How to reproduce it (as minimally and precisely as possible)?
Unknown, but this job has hit it several times, and it uses the standard AWS IPI CI install flow. In fact, it looks like a lot of jobs are hitting it: https://search.ci.openshift.org/?search=The+%22default%22+ingress+controller+reports+Degraded%3DTrue&maxAge=48h&context=1&type=junit&name=4.8&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
errorMessage: 'error launching instance: Your requested instance type (m4.xlarge) is not supported in your requested Availability Zone (us-west-2d). Please retry your request by not specifying an Availability Zone or choosing us-west-2a, us-west-2b, us-west-2c.'
@esimard This may be related to the recent CI changes to select the availability zone dynamically. Could you take a look at this?
Hello, I confirm what you suggested. There is an edge case (at minimum) in this region where some instance types are not available in every availability zone. My assumption of simply looking for the largest instance type was not right. Expanding the instance type lookup to check each availability zone should fix this.
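The per-availability-zone lookup described above could be sketched roughly like this (an assumption about the approach, not the actual CI fix): EC2's `describe_instance_type_offerings` API reports which zones offer a given instance type, so the job can restrict itself to zones where the type actually exists. The `zones_offering` helper and `fetch_offerings` function are hypothetical names introduced here for illustration.

```python
def zones_offering(offerings, instance_type):
    """Return the AZ names whose offerings include instance_type.

    `offerings` is a list of dicts shaped like the InstanceTypeOfferings
    entries returned by ec2.describe_instance_type_offerings(
    LocationType="availability-zone").
    """
    return sorted(
        o["Location"] for o in offerings
        if o["InstanceType"] == instance_type
    )

def fetch_offerings(region, instance_type):
    """Live lookup; needs boto3 and AWS credentials (import deferred
    so the offline demo below runs without them)."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    offerings = []
    paginator = ec2.get_paginator("describe_instance_type_offerings")
    for page in paginator.paginate(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [instance_type]}],
    ):
        offerings.extend(page["InstanceTypeOfferings"])
    return offerings

# Offline sample mirroring the failure in this report: m4.xlarge is
# offered in us-west-2a/b/c but not in us-west-2d.
sample = [
    {"InstanceType": "m4.xlarge", "LocationType": "availability-zone", "Location": "us-west-2a"},
    {"InstanceType": "m4.xlarge", "LocationType": "availability-zone", "Location": "us-west-2b"},
    {"InstanceType": "m4.xlarge", "LocationType": "availability-zone", "Location": "us-west-2c"},
    {"InstanceType": "m5.xlarge", "LocationType": "availability-zone", "Location": "us-west-2d"},
]
print(zones_offering(sample, "m4.xlarge"))
```

Picking an AZ from `zones_offering(...)` instead of any zone in the region would avoid the "not supported in your requested Availability Zone" launch error quoted earlier.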
CI-only fix (verified in CI).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759