Description of problem:

In 4.6, the auth operator grew logic to check whether the router can schedule pods, but that logic assumes the router will be scheduled on "worker"-labeled nodes [1].

Version-Release number of selected component (if applicable): 4.6 and later.

How reproducible: 100%, when you have no vanilla 'worker' nodes.

Steps to Reproduce:
1. Have no vanilla 'worker' nodes, but have a bunch of custom compute pools [2].
2. Try to survive on 4.6+.

Actual results:

The authentication operator complains:

Available=False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication require functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).

Expected results:

The auth operator minds its own business, and the ingress operator complains when it is unscheduled (bug 1881155, [3]) ;).

Additional info:

The auth operator going Available=False on this can hang updates, e.g. updates from 4.5 into 4.6 for folks without vanilla compute nodes.

Workaround: scale up at least one node with a 'node-role.kubernetes.io/worker' label and an empty value.

[1]: https://github.com/openshift/cluster-authentication-operator/pull/344/files#diff-74035431d399f5431916d8624ce3080db323d3b4762cb875651311d703168425R66
[2]: https://github.com/openshift/machine-config-operator/blob/0170e082a8b8228373bd841d17555fff2cfb51b7/docs/custom-pools.md#custom-pools
[3]: https://github.com/openshift/cluster-ingress-operator/pull/465
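If scaling up a new machine is not convenient, the same effect can likely be achieved by applying the label to an existing compute node (a sketch; the node name is a placeholder, and whether this is acceptable depends on what else targets the 'worker' role in your cluster):

$ oc label node <existing-compute-node> node-role.kubernetes.io/worker=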
The "am I impacted by this?" check is an authentication operator reporting Available=False with reason ReadyIngressNodes_NoReadyIngressNodes:

$ oc get -o json clusteroperator authentication | jq -r '.status.conditions[] | select(.type == "Available") | .lastTransitionTime + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")'
2020-10-29T12:49:29Z Available False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication require functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).

combined with a lack of 'worker' nodes:

$ oc get -o json nodes | jq -r '.items[] | [.status.conditions[] | select(.type == "Ready")][0] as $ready | $ready.lastTransitionTime + " " + $ready.status + " " + .metadata.name + " " + (.metadata.labels | to_entries[] | select(.key | startswith("node-role.kubernetes.io/")).key | tostring)' | sort
2020-08-28T08:41:54Z True worker-0...local node-role.kubernetes.io/app
2020-08-28T09:45:56Z True worker-2...local node-role.kubernetes.io/app
2020-10-29T13:54:51Z True worker-1...local node-role.kubernetes.io/app
2020-10-30T09:59:20Z True infra-2...local node-role.kubernetes.io/app
2020-10-30T09:59:20Z True infra-2...local node-role.kubernetes.io/infra
2020-10-30T10:01:38Z True master-1...local node-role.kubernetes.io/master
2020-10-30T10:04:24Z True master-0...local node-role.kubernetes.io/master
2020-10-30T10:07:02Z True infra-1...local node-role.kubernetes.io/app
2020-10-30T10:07:02Z True infra-1...local node-role.kubernetes.io/infra
2020-10-30T10:07:27Z True master-2...local node-role.kubernetes.io/master
2020-10-30T10:10:10Z True infra-0...local node-role.kubernetes.io/app
2020-10-30T10:10:10Z True infra-0...local node-role.kubernetes.io/infra

This example cluster has nodes with "worker-..." names, but the roles are all app, infra, or master.
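A quicker, less detailed way to check the second half is a label selector; an empty result ("No resources found") means the cluster has no vanilla 'worker' nodes (a sketch, not part of the original report):

$ oc get nodes -l node-role.kubernetes.io/worker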
Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90 seconds of API downtime
  example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it's always been like this, we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z, or from 4.y.z to 4.y.z+1
Is this a reliable reproducer?

Steps to Reproduce:
1. Have no vanilla 'worker' nodes, but have a bunch of custom compute pools [2].
2. Try to survive on 4.6+.

@mfojtik mentioned: "...If I understand the scenario right, there must be something tweaked in the ingress config to make this work, right? Something that puts a nodeSelector for the router pods on "node-role.kubernetes.io/infra": ""..." So to reproduce, we need to tweak the config and set up the nodes so that scheduling can still succeed? (A sketch of that tweak follows.)
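A minimal sketch of the tweak @mfojtik describes, assuming the default ingresscontroller and nodes that already carry the 'infra' role (the verification comments below use this same patch):

$ oc -n openshift-ingress-operator patch ingresscontroller default --type json -p '[{"op": "add", "path": "/spec/nodePlacement", "value": {"nodeSelector": {"matchLabels": {"node-role.kubernetes.io/infra": ""}}}}]'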
No need for an update. Steps to reproduce are in [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1881155#c5
Launched a fresh 4.7.0-0.nightly-2020-11-04-013819 env; this payload includes the fix PR. But after following the steps in bug 1881155#c5, the issue is still reproduced:

$ oc get node
NAME                                              STATUS   ROLES    AGE     VERSION
ip-10-0-131-120.ap-southeast-2.compute.internal   Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-158-57.ap-southeast-2.compute.internal    Ready    infra    4h34m   v1.19.2+6bd0f34
ip-10-0-163-149.ap-southeast-2.compute.internal   Ready    infra    4h34m   v1.19.2+6bd0f34
ip-10-0-173-64.ap-southeast-2.compute.internal    Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-193-68.ap-southeast-2.compute.internal    Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-221-91.ap-southeast-2.compute.internal    Ready    infra    4h34m   v1.19.2+6bd0f34

$ oc get ingresscontroller default -o yaml -n openshift-ingress-operator
...
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""
  replicas: 2
...

$ oc -n openshift-ingress get po -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                                              NOMINATED NODE   READINESS GATES
router-default-564fcd4d9-5xxbc   1/1     Running   0          6m    10.128.2.88   ip-10-0-163-149.ap-southeast-2.compute.internal   <none>           <none>
router-default-564fcd4d9-nbsb8   1/1     Running   0          6m    10.129.2.10   ip-10-0-221-91.ap-southeast-2.compute.internal    <none>           <none>

$ oc get co | grep -v "4.7.0-0.nightly-2020-11-04-013819.*T.*F.*F"
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.7.0-0.nightly-2020-11-04-013819   False       False         False      8m9s

$ oc describe co authentication
Name:         authentication
...
    Last Transition Time:  2020-11-04T07:37:17Z
    Message:               ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
    Reason:                ReadyIngressNodes_NoReadyIngressNodes
    Status:                False
    Type:                  Available
...

I guess the PR has some problem in the numberOfCustomIngressTargets function, which wrongly counted 0 custom target nodes.
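As a rough manual cross-check of what the custom-target count should be (a sketch, assuming the ingresscontroller nodeSelector shown above; it only filters out cordoned nodes and ignores the other readiness details the operator considers):

$ oc get nodes -l node-role.kubernetes.io/infra -o json | jq '[.items[] | select(.spec.unschedulable != true)] | length'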
Per Dev's request, I have done pre-merge verification by launching a cluster using the open PR cluster-authentication-operator/pull/373; the issue is not reproduced now.
Verified in 4.7.0-0.nightly-2020-11-05-010603. Everything is fine; the co/authentication Available=False condition is not reproduced:

NODES=`oc get node | grep worker | grep -o "^[^ ]*"`
echo $NODES
oc label node $NODES node-role.kubernetes.io/infra=; oc label node $NODES node-role.kubernetes.io/worker-
oc -n openshift-ingress-operator patch ingresscontroller default --type json -p '[{"op": "add", "path": "/spec/nodePlacement", "value": {"nodeSelector": {"matchLabels": {"node-role.kubernetes.io/infra": ""}}}}]'

$ oc get no
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-155-98.ap-southeast-2.compute.internal    Ready    infra    60m   v1.19.2+6bd0f34
ip-10-0-158-63.ap-southeast-2.compute.internal    Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-166-26.ap-southeast-2.compute.internal    Ready    infra    63m   v1.19.2+6bd0f34
ip-10-0-180-210.ap-southeast-2.compute.internal   Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-206-75.ap-southeast-2.compute.internal    Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-222-131.ap-southeast-2.compute.internal   Ready    infra    60m   v1.19.2+6bd0f34

$ oc -n openshift-ingress get po
NAME                             READY   STATUS    RESTARTS   AGE
router-default-bbb78bc68-6nvw5   1/1     Running   0          2m5s
router-default-bbb78bc68-schff   1/1     Running   0          2m5s

$ oc get co | grep -v "4.7.*T.*F.*F"
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE

$ oc get co authentication ingress
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.7.0-0.nightly-2020-11-05-010603   True        False         False      2m7s
ingress          4.7.0-0.nightly-2020-11-05-010603   True        False         False      59m
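For completeness, the Available condition can be re-checked with the same jq query used in the impact check above (a sketch; no new output was captured here):

$ oc get -o json clusteroperator authentication | jq -r '.status.conditions[] | select(.type == "Available") | .lastTransitionTime + " " + .type + " " + .status + " " + (.reason // "-")'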
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633