Bug 1893386 - false-positive ReadyIngressNodes_NoReadyIngressNodes: Auth operator makes risky "worker" assumption when guessing about ingress availability
Summary: false-positive ReadyIngressNodes_NoReadyIngressNodes: Auth operator makes risky "worker" assumption when guessing about ingress availability
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Michal Fojtik
QA Contact: Xingxing Xia
Depends On:
Blocks: 1893803
Reported: 2020-10-30 23:49 UTC by W. Trevor King
Modified: 2023-09-15 00:50 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The authentication operator only watched config resources named "cluster"; however, the "ingress" config resource is named "default". Consequence: The authentication operator ignored changes to the "ingress" config, which led to the wrong assumption that there were no schedulable "worker" nodes when "ingress" was configured with a custom node selector. Fix: Do not filter config resources by the name "cluster"; instead, watch all config resources regardless of their name. Result: The operator properly observes the ingress config change and reconciles worker node availability.
Clone Of:
Clones: 1893803
Last Closed: 2021-02-24 15:29:19 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift cluster-authentication-operator pull 370 0 None closed bug 1893386: update ingress node available to handle custom placement 2021-02-10 13:45:03 UTC
Github openshift cluster-authentication-operator pull 373 0 None closed Bug 1893386: Fix wrong operator config informer 2021-02-10 13:45:03 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:29:55 UTC

Description W. Trevor King 2020-10-30 23:49:24 UTC
Description of problem:

In 4.6, the auth operator grew logic to check if the router could schedule pods, but that logic assumes the router will be scheduled on "worker"-labeled nodes [1].
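
A quick way to see what that assumption counts (a hedged illustration using a plain label selector, not the operator's actual query) is to list the nodes carrying the vanilla 'worker' role label:

$ oc get nodes -l node-role.kubernetes.io/worker -o name | wc -l

On a cluster that only uses custom pools (app, infra, and so on), this returns 0, which is exactly the situation that trips the check.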

Version-Release number of selected component (if applicable):

4.6 and later.

How reproducible:

100%, when you have no vanilla 'worker' nodes.

Steps to Reproduce:

1. Have no vanilla 'worker' nodes, but have a bunch of custom compute pools [2]
2. Try to survive on 4.6+

Actual results:

Watch the authentication operator complain: Available=False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication require functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).

Expected results:

Auth operator minds its own business, and the ingress operator complains when it is unscheduled (bug 1881155, [3]) ;).

Additional info:

Auth operator going Available=False on this can hang updates, e.g. updates from 4.5 into 4.6 for folks without vanilla compute nodes.

Workaround: scale up at least one node with a 'node-role.kubernetes.io/worker' label and an empty value.
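
For illustration (assuming labeling an existing node is acceptable; <node-name> is a placeholder), the label can be applied with:

$ oc label node <node-name> node-role.kubernetes.io/worker=

The trailing '=' assigns the empty value that the check expects.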

[1]: https://github.com/openshift/cluster-authentication-operator/pull/344/files#diff-74035431d399f5431916d8624ce3080db323d3b4762cb875651311d703168425R66
[2]: https://github.com/openshift/machine-config-operator/blob/0170e082a8b8228373bd841d17555fff2cfb51b7/docs/custom-pools.md#custom-pools
[3]: https://github.com/openshift/cluster-ingress-operator/pull/465

Comment 1 W. Trevor King 2020-10-31 02:14:13 UTC
The check for "am I impacted by this?" looks like a ReadyIngressNodes_NoReadyIngressNodes Available=False authentication operator:

$ oc get -o json clusteroperator authentication | jq -r '.status.conditions[] | select(.type == "Available") | .lastTransitionTime + " " + .type + " " + .status + " " + (.reason // "-") + " " + (.message // "-")'
2020-10-29T12:49:29Z Available False ReadyIngressNodes_NoReadyIngressNodes ReadyIngressNodesAvailable: Authentication require functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).

combined with a lack of 'worker' nodes:

$ oc get -o json nodes | jq -r '.items[] | [.status.conditions[] | select(.type == "Ready")][0] as $ready | $ready.lastTransitionTime + " " + $ready.status + " " + .metadata.name + " " + (.metadata.labels | to_entries[] | select(.key | startswith("node-role.kubernetes.io/")).key| tostring)' | sort
2020-08-28T08:41:54Z True worker-0...local node-role.kubernetes.io/app
2020-08-28T09:45:56Z True worker-2...local node-role.kubernetes.io/app
2020-10-29T13:54:51Z True worker-1...local node-role.kubernetes.io/app
2020-10-30T09:59:20Z True infra-2...local node-role.kubernetes.io/app
2020-10-30T09:59:20Z True infra-2...local node-role.kubernetes.io/infra
2020-10-30T10:01:38Z True master-1...local node-role.kubernetes.io/master
2020-10-30T10:04:24Z True master-0...local node-role.kubernetes.io/master
2020-10-30T10:07:02Z True infra-1...local node-role.kubernetes.io/app
2020-10-30T10:07:02Z True infra-1...local node-role.kubernetes.io/infra
2020-10-30T10:07:27Z True master-2...local node-role.kubernetes.io/master
2020-10-30T10:10:10Z True infra-0...local node-role.kubernetes.io/app
2020-10-30T10:10:10Z True infra-0...local node-role.kubernetes.io/infra

This example cluster has nodes with "worker-..." names, but the roles are all app, infra, or master.

Comment 5 Lalatendu Mohanty 2020-11-02 11:50:46 UTC
Who is impacted?  If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact?  Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90 seconds of API downtime
  example: etcd loses quorum and you have to restore from backup
How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it’s always been like this; we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z or from 4.y.z to 4.y.z+1

Comment 7 Mike Fiedler 2020-11-03 15:04:46 UTC
Is this a reliable reproducer?

Steps to Reproduce:

1. Have no vanilla 'worker' nodes, but have a bunch of custom compute pools [2]
2. Try to survive on 4.6+

@mfojtik mentioned:

...If I understand the scenario right, there must be something tweaked in the ingress config to make this work, right? Something that puts a nodeSelector for router pods to "node-role.kubernetes.io/infra": ""... so to repro, we need to tweak the config and set up the nodes the way it can succeed?
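
For reference, the tweak being described matches the nodePlacement patch later used in the verification steps (Comment 14):

$ oc -n openshift-ingress-operator patch ingresscontroller default --type json -p '[{"op": "add", "path": "/spec/nodePlacement", "value": {"nodeSelector": {"matchLabels": {"node-role.kubernetes.io/infra": ""}}}}]'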

Comment 8 W. Trevor King 2020-11-03 15:20:09 UTC
No need for an update.  Steps to reproduce in [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1881155#c5

Comment 11 Xingxing Xia 2020-11-04 07:50:35 UTC
Launched a fresh 4.7.0-0.nightly-2020-11-04-013819 env; this payload includes the fix PR. But after following the steps in bug 1881155#c5, the issue is still reproduced:
$ oc get node
NAME                                              STATUS   ROLES    AGE     VERSION
ip-10-0-131-120.ap-southeast-2.compute.internal   Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-158-57.ap-southeast-2.compute.internal    Ready    infra    4h34m   v1.19.2+6bd0f34
ip-10-0-163-149.ap-southeast-2.compute.internal   Ready    infra    4h34m   v1.19.2+6bd0f34
ip-10-0-173-64.ap-southeast-2.compute.internal    Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-193-68.ap-southeast-2.compute.internal    Ready    master   4h45m   v1.19.2+6bd0f34
ip-10-0-221-91.ap-southeast-2.compute.internal    Ready    infra    4h34m   v1.19.2+6bd0f34
$ oc get ingresscontroller default -o yaml -n openshift-ingress-operator
        node-role.kubernetes.io/infra: ""
  replicas: 2
$ oc -n openshift-ingress get po -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                                              NOMINATED NODE   READINESS GATES
router-default-564fcd4d9-5xxbc   1/1     Running   0          6m   ip-10-0-163-149.ap-southeast-2.compute.internal   <none>           <none>
router-default-564fcd4d9-nbsb8   1/1     Running   0          6m   ip-10-0-221-91.ap-southeast-2.compute.internal    <none>           <none>
$ oc get co | grep -v "4.7.0-0.nightly-2020-11-04-013819.*T.*F.*F"
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2020-11-04-013819   False       False         False      8m9s
$ oc describe co authentication
Name:         authentication
    Last Transition Time:  2020-11-04T07:37:17Z
    Message:               ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
    Reason:                ReadyIngressNodes_NoReadyIngressNodes
    Status:                False
    Type:                  Available
I guess the PR has some problem in the function numberOfCustomIngressTargets, which wrongly got 0 custom target nodes.
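
As a sanity check (a hedged query against the same selector the ingresscontroller was patched with), the custom target nodes clearly exist:

$ oc get nodes -l node-role.kubernetes.io/infra= -o name

The three infra nodes above match, so the reported count of 0 custom target nodes is wrong.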

Comment 12 Xingxing Xia 2020-11-04 10:49:43 UTC
Per Dev's request, I have done pre-merge verification by launching a cluster using the open PR cluster-authentication-operator/pull/373; the issue is not reproduced now.

Comment 14 Xingxing Xia 2020-11-05 03:34:51 UTC
Verified in 4.7.0-0.nightly-2020-11-05-010603. Everything is fine. The misbehaving co/authentication is not reproduced:
# Move the worker nodes into a custom 'infra' role: add the infra label, remove the worker label
NODES=`oc get node | grep worker | grep -o "^[^ ]*"`
echo $NODES
oc label node $NODES node-role.kubernetes.io/infra=; oc label node $NODES node-role.kubernetes.io/worker-
# Pin the default router to the infra nodes via a custom nodePlacement selector
oc -n openshift-ingress-operator patch ingresscontroller default --type json -p '[{"op": "add", "path": "/spec/nodePlacement", "value": {"nodeSelector": {"matchLabels": {"node-role.kubernetes.io/infra": ""}}}}]'

$ oc get no
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-155-98.ap-southeast-2.compute.internal    Ready    infra    60m   v1.19.2+6bd0f34
ip-10-0-158-63.ap-southeast-2.compute.internal    Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-166-26.ap-southeast-2.compute.internal    Ready    infra    63m   v1.19.2+6bd0f34
ip-10-0-180-210.ap-southeast-2.compute.internal   Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-206-75.ap-southeast-2.compute.internal    Ready    master   73m   v1.19.2+6bd0f34
ip-10-0-222-131.ap-southeast-2.compute.internal   Ready    infra    60m   v1.19.2+6bd0f34
$ oc -n openshift-ingress get po
NAME                             READY   STATUS    RESTARTS   AGE
router-default-bbb78bc68-6nvw5   1/1     Running   0          2m5s
router-default-bbb78bc68-schff   1/1     Running   0          2m5s
$ oc get co | grep -v "4.7.*T.*F.*F"
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
$ oc get co authentication ingress 
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.7.0-0.nightly-2020-11-05-010603   True        False         False      2m7s
ingress          4.7.0-0.nightly-2020-11-05-010603   True        False         False      59m

Comment 18 errata-xmlrpc 2021-02-24 15:29:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 20 Red Hat Bugzilla 2023-09-15 00:50:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
