# operator install authentication Operator unavailable (OAuthServiceCheckEndpointAccessibleController_EndpointUnavailable::OAuthServiceEndpointsCheckEndpointAccessibleController_EndpointUnavailable::ReadyIngressNodes_NoReadyIngressNodes)

ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).

This test is failing frequently in CI; see the search results: https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=4.6&maxMatches=5&maxBytes=20971520&groupBy=job&search=operator+install+authentication

It fails in various jobs and is not limited to a single platform:

- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-serial-4.6/1308046682126028800
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-ovirt-4.6/1307958161860202496
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.6/1308010268730593280

Example log excerpt:

```
E0921 12:29:16.791380 36 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: Get "https://api.ci-op-f98i4h6b-93317.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusterversions?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dversion&resourceVersion=25744&timeoutSeconds=546&watch=true": dial tcp 44.240.39.73:6443: connect: connection refused
level=error msg="Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthRouteCheckEndpointAccessibleController_SyncError::OAuthServerDeployment_DeploymentAvailableReplicasCheckFailed::OAuthServerRoute_InvalidCanonicalHost::OAuthServiceCheckEndpointAccessibleController_SyncError::OAuthServiceEndpointsCheckEndpointAccessibleController_SyncError::OAuthVersionDeployment_GetFailed::Route_InvalidCanonicalHost::WellKnownReadyController_SyncError: OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: oauth service endpoints are not ready\nOAuthServiceCheckEndpointAccessibleControllerDegraded: Get \"https://172.30.0.10:443/healthz\": dial tcp 172.30.0.10:443: connect: connection refused\nIngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nOAuthRouteCheckEndpointAccessibleControllerDegraded: route status does not have host address\nRouteDegraded: Route is not available at canonical host oauth-openshift.apps.ci-op-f98i4h6b-93317.origin-ci-int-aws.dev.rhcloud.com: route status ingress is empty\nOAuthServerDeploymentDegraded: deployments.apps \"oauth-openshift\" not found\nOAuthServerRouteDegraded: Route is not available at canonical host oauth-openshift.apps.ci-op-f98i4h6b-93317.origin-ci-int-aws.dev.rhcloud.com: route status ingress is empty\nOAuthVersionDeploymentDegraded: Unable to get OAuth server deployment: deployment.apps \"oauth-openshift\" not found\nWellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap \"oauth-openshift\" not found (check authentication operator, it is supposed to create this)"
```

Sippy associated the failing tests with https://bugzilla.redhat.com/show_bug.cgi?id=1879633, but that bug looks like it should be limited to proxy jobs.
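The `::`-joined reason strings above are how the authentication operator aggregates the reasons of several independent status controllers into a single ClusterOperator condition. As an illustration only (this is not the operator's code), such an aggregated reason can be split back into its per-controller components:

```python
# Illustrative only: split an aggregated ClusterOperator condition reason
# (as produced by operators that join per-controller reasons with "::")
# back into the individual controller reasons.

def split_aggregated_reason(reason: str) -> list:
    """Return the per-controller reasons from a "::"-joined reason string."""
    return reason.split("::")

reason = (
    "OAuthServiceCheckEndpointAccessibleController_EndpointUnavailable::"
    "OAuthServiceEndpointsCheckEndpointAccessibleController_EndpointUnavailable::"
    "ReadyIngressNodes_NoReadyIngressNodes"
)
for part in split_aggregated_reason(reason):
    print(part)
```

This is why a single `Available=False` condition in the reports below can carry three distinct controller reasons at once.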
The plan is to use this Bugzilla to improve status reporting for network edge. There is a related Jira ticket: https://issues.redhat.com/browse/NE-392.
WIP PR: https://github.com/openshift/cluster-ingress-operator/pull/465
The target is set to the next release version while the investigation is ongoing or pending. This will be considered for earlier release versions once the issue is diagnosed and resolved.
Taking a stab at non-QE verification:

1. Install a 4.7 cluster via cluster-bot: `launch 4.7`

2. Confirm the installed version:

   ```
   $ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
   4.7.0-0.nightly-2020-10-27-051128
   ```

3. Look at our nodes:

   ```
   $ oc get nodes
   NAME                                       STATUS   ROLES    AGE   VERSION
   ci-ln-bxhlmm2-f76d1-btg2b-master-0         Ready    master   34m   v1.19.0+e67f5dc
   ci-ln-bxhlmm2-f76d1-btg2b-master-1         Ready    master   34m   v1.19.0+e67f5dc
   ci-ln-bxhlmm2-f76d1-btg2b-master-2         Ready    master   34m   v1.19.0+e67f5dc
   ci-ln-bxhlmm2-f76d1-btg2b-worker-b-qx9t6   Ready    worker   26m   v1.19.0+e67f5dc
   ci-ln-bxhlmm2-f76d1-btg2b-worker-c-rvqs5   Ready    worker   29m   v1.19.0+e67f5dc
   ci-ln-bxhlmm2-f76d1-btg2b-worker-d-kgvvz   Ready    worker   26m   v1.19.0+e67f5dc
   ```

4. Confirm that the two relevant operators are happy:

   ```
   $ oc get -o json clusteroperators | jq -r '.items[] | .metadata.name as $name | select($name == "authentication" or $name == "ingress").status.conditions[] | .lastTransitionTime + " " + $name + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")' | sort
   2020-11-02T19:17:15Z authentication Upgradeable True AsExpected All is well
   2020-11-02T19:22:13Z ingress Available True AsExpected desired and current number of IngressControllers are equal
   2020-11-02T19:22:13Z ingress Progressing False AsExpected desired and current number of IngressControllers are equal
   2020-11-02T19:24:26Z ingress Degraded False NoIngressControllersDegraded -
   2020-11-02T19:41:53Z authentication Progressing False AsExpected All is well
   2020-11-02T19:48:29Z authentication Available True AsExpected OAuthServerDeploymentAvailable: availableReplicas==2
   2020-11-02T19:48:29Z authentication Degraded False AsExpected All is well
   ```

5. Make all the compute nodes 'infra' [1]:

   ```
   $ oc get -o json nodes | jq -r '.items[].metadata.name' | grep worker | while read NODE; do oc label node "${NODE}" node-role.kubernetes.io/infra=; oc label node "${NODE}" node-role.kubernetes.io/worker-; done
   ```

6. Check the pods:

   ```
   $ oc -n openshift-ingress get pods
   NAME                             READY   STATUS    RESTARTS   AGE
   router-default-d6668cf74-phpg9   1/1     Running   0          34m
   router-default-d6668cf74-rfbc9   1/1     Running   0          34m
   ```

   The fact that new pods aren't schedulable yet is ok, as long as the existing pods stay running.

7. See that auth got mad (bug 1893386), despite the router still being happy:

   ```
   $ oc get -o json clusteroperators | jq -r '.items[] | .metadata.name as $name | select($name == "authentication" or $name == "ingress").status.conditions[] | .lastTransitionTime + " " + $name + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")' | sort
   2020-11-02T19:17:15Z authentication Upgradeable True AsExpected All is well
   2020-11-02T19:22:13Z ingress Available True AsExpected desired and current number of IngressControllers are equal
   2020-11-02T19:22:13Z ingress Progressing False AsExpected desired and current number of IngressControllers are equal
   2020-11-02T19:24:26Z ingress Degraded False NoIngressControllersDegraded -
   2020-11-02T19:41:53Z authentication Progressing False AsExpected All is well
   2020-11-02T19:48:29Z authentication Degraded False AsExpected All is well
   2020-11-02T19:51:34Z authentication Available False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).
   ```

8. Kill the router pods. The node-role changes should keep them from being rescheduled:

   ```
   $ oc -n openshift-ingress get -o json pods | jq -r '.items[].metadata.name' | while read POD; do oc -n openshift-ingress delete pod "${POD}"; done
   ```

9. Check the pods:

   ```
   $ oc -n openshift-ingress get pods
   NAME                             READY   STATUS    RESTARTS   AGE
   router-default-d6668cf74-9hxld   0/1     Pending   0          2m36s
   router-default-d6668cf74-jk85x   0/1     Pending   0          4m2s
   ```

10. Wait 10 minutes [2]:

    ```
    $ sleep 600
    ```

11. See that both auth and ingress are mad:

    ```
    $ oc get -o json clusteroperators | jq -r '.items[] | .metadata.name as $name | select($name == "authentication" or $name == "ingress").status.conditions[] | .lastTransitionTime + " " + $name + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")'
    2020-11-02T19:57:15Z authentication Degraded True OAuthRouteCheckEndpointAccessibleController_SyncError OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.ci-ln-bxhlmm2-f76d1.origin-ci-int-gce.dev.openshift.com/healthz": dial tcp 35.231.72.50:443: connect: connection refused
    2020-11-02T19:55:15Z authentication Progressing True OAuthVersionRoute_WaitingForRoute OAuthVersionRouteProgressing: Request to "https://oauth-openshift.apps.ci-ln-bxhlmm2-f76d1.origin-ci-int-gce.dev.openshift.com/healthz" not successfull yet
    2020-11-02T19:51:34Z authentication Available False OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable::OAuthVersionRoute_RequestFailed::ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods). OAuthVersionRouteAvailable: HTTP request to "https://oauth-openshift.apps.ci-ln-bxhlmm2-f76d1.origin-ci-int-gce.dev.openshift.com/healthz" failed: dial tcp 35.231.72.50:443: connect: connection refused OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.ci-ln-bxhlmm2-f76d1.origin-ci-int-gce.dev.openshift.com/healthz": dial tcp 35.231.72.50:443: connect: connection refused
    2020-11-02T19:17:15Z authentication Upgradeable True AsExpected All is well
    2020-11-02T19:54:48Z ingress Available False IngressUnavailable Not all ingress controllers are available.
    2020-11-02T19:54:48Z ingress Progressing True Reconciling Not all ingress controllers are available.
    2020-11-02T20:05:18Z ingress Degraded True IngressControllersDegraded Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-d6668cf74-jk85x" cannot be scheduled: 0/6 nodes are available: 6 node(s) didn't match node selector. Pod "router-default-d6668cf74-9hxld" cannot be scheduled: 0/6 nodes are available: 6 node(s) didn't match node selector. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1)
    ```

    You can see the "0/6 nodes are available: 6 node(s) didn't match node selector" in the ingress Degraded=True message.

12. Configure the IngressController [3] NodePlacement [4] to allow scheduling on infra nodes:

    ```
    $ oc -n openshift-ingress-operator patch ingresscontroller default --type json -p '[{"op": "add", "path": "/spec/nodePlacement", "value": {"nodeSelector": {"matchLabels": {"node-role.kubernetes.io/infra": ""}}}}]'
    ```

13. See that the pods are happy again:

    ```
    $ oc -n openshift-ingress get pods
    NAME                              READY   STATUS    RESTARTS   AGE
    router-default-65b7fc9b4f-59bw5   1/1     Running   0          23s
    router-default-65b7fc9b4f-pml2d   1/1     Running   0          23s
    ```

14. See that auth is still sad (bug 1893386) but that ingress is correctly happy again:

    ```
    $ oc get -o json clusteroperators | jq -r '.items[] | .metadata.name as $name | select($name == "authentication" or $name == "ingress").status.conditions[] | .lastTransitionTime + " " + $name + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")' | sort
    2020-11-02T19:17:15Z authentication Upgradeable True AsExpected All is well
    2020-11-02T19:51:34Z authentication Available False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).
    2020-11-02T20:10:33Z ingress Available True AsExpected desired and current number of IngressControllers are equal
    2020-11-02T20:10:33Z ingress Degraded False NoIngressControllersDegraded -
    2020-11-02T20:10:33Z ingress Progressing False AsExpected desired and current number of IngressControllers are equal
    2020-11-02T20:10:46Z authentication Degraded False AsExpected All is well
    2020-11-02T20:10:48Z authentication Progressing False AsExpected All is well
    ```

[1]: https://github.com/openshift/machine-config-operator/blob/0170e082a8b8228373bd841d17555fff2cfb51b7/docs/custom-pools.md#creating-a-custom-pool
[2]: https://github.com/openshift/cluster-ingress-operator/pull/465/files#diff-56b131774a926e7a0e30a9be7dac7bf5c5cec11ff709aa6604cecc9ef117ede2R360
[3]: https://docs.openshift.com/container-platform/4.6/networking/ingress-operator.html#configuring-ingress-controller
[4]: https://github.com/openshift/api/blob/9252afb032e11093b53406ae80e0acb3410603b2/operator/v1/types_ingress.go#L131-L137
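The jq pipeline used repeatedly in the walkthrough above can be mirrored in a few lines of Python, which may help anyone following along without jq handy. This is a convenience sketch operating on the parsed JSON that `oc get -o json clusteroperators` returns, with a small hand-made sample for illustration:

```python
# Sketch: reproduce the jq filter used in the verification steps above.
# For the authentication and ingress ClusterOperators it emits one line per
# condition: "<lastTransitionTime> <name> <type> <status> <reason> <message>",
# with "-" standing in for a missing reason or message, sorted like
# "... | sort" in the shell pipeline.

def format_conditions(clusteroperators):
    lines = []
    for item in clusteroperators.get("items", []):
        name = item["metadata"]["name"]
        if name not in ("authentication", "ingress"):
            continue  # the jq filter selects only these two operators
        for cond in item.get("status", {}).get("conditions", []):
            lines.append(" ".join([
                cond.get("lastTransitionTime", ""),
                name,
                cond.get("type", ""),
                cond.get("status", ""),
                cond.get("reason") or "-",
                cond.get("message") or "-",
            ]))
    return sorted(lines)

# Hand-made sample standing in for `oc get -o json clusteroperators` output.
sample = {
    "items": [
        {
            "metadata": {"name": "ingress"},
            "status": {"conditions": [
                {"lastTransitionTime": "2020-11-02T19:24:26Z",
                 "type": "Degraded", "status": "False",
                 "reason": "NoIngressControllersDegraded"},
            ]},
        },
        {"metadata": {"name": "etcd"}, "status": {"conditions": []}},
    ]
}
```

Note how a condition with no message renders with a trailing "-", matching the ingress Degraded line in the outputs above.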
With reference to the attached PR enhancing ingress status reporting: with "4.6.0-0.nightly-2020-11-07-035509", clearer deployment-degraded reasoning and the "PodsScheduled" state field are now displayed:

```
2020-11-09T09:13:25Z ingress Available False IngressUnavailable Not all ingress controllers are available.
2020-11-09T09:13:25Z ingress Progressing True Reconciling Not all ingress controllers are available.
2020-11-09T09:23:26Z ingress Degraded True IngressControllersDegraded Some ingresscontrollers are degraded: ingresscontroller "internalapps" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-internalapps-66bc4c5dc-hjqqn" cannot be scheduled: 0/6 nodes are available: 6 node(s) didn't match node selector. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/1 of replicas are available, max unavailable is 0)
```

Hence marking as "verified".
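The aggregated Degraded message in the output above strings each failing sub-condition together as `Type=Status (Reason: Message)`. A hypothetical sketch of that formatting (the function and sample data here are mine, not code from cluster-ingress-operator):

```python
# Hypothetical sketch of assembling an aggregated Degraded message like the
# one above from a list of failing sub-conditions. Names and data are
# illustrative only, not taken from cluster-ingress-operator.

def format_degraded(conditions):
    parts = [
        '{type}={status} ({reason}: {message})'.format(**c)
        for c in conditions
    ]
    return ("One or more other status conditions indicate a degraded state: "
            + ", ".join(parts))

# Sample failing sub-conditions (messages abbreviated).
conds = [
    {"type": "PodsScheduled", "status": "False",
     "reason": "PodsNotScheduled",
     "message": "Some pods are not scheduled."},
    {"type": "DeploymentAvailable", "status": "False",
     "reason": "DeploymentUnavailable",
     "message": "Deployment does not have minimum availability."},
]
msg = format_degraded(conds)
```

Formatting each sub-condition with its own reason and message is what makes the post-PR output readable: a human can see at a glance which checks failed and why, rather than a bare Degraded=True.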
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633