The way the fix was scoped [1], verification for 4.11 is going to be tricky. Perhaps:

1. Install a recent 4.9.z.
2. Install a broken webhook. Whatever the webhook is supposed to do doesn't work, but the kube-apiserver is otherwise oblivious to the issue.
3. Ask the cluster to update directly to a 4.11 nightly. Direct 4.9 -> 4.11 updates are a terrible idea for anyone who cares about their cluster, but the kube-apiserver is early in the update, and maybe we'll get that far before things blow up.
4. Cluster updates towards 4.11.
5. As the 4.11 kube-apiserver operator comes in, the new controller [1] takes a look around, sees the broken webhooks, and (before this fix) sets Degraded=True or (with this fix) stays Degraded=False.

If the CVO gets past the kube-apiserver ClusterOperator and starts asking later components to update, we can confirm that the 4.11 fix is working as expected.

[1]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1309#discussion_r802636750
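The step-5 check can be scripted. A minimal sketch, assuming the usual `oc get co kube-apiserver` column layout; a captured sample line stands in for live cluster output here:

```shell
# Hypothetical check for step 5: the fifth column of `oc get co kube-apiserver`
# is DEGRADED; the fix is behaving if it reads False while the CVO moves on.
co_line='kube-apiserver   4.11.0-0.nightly-2022-02-10-031822   True   False   False   112m'
degraded=$(echo "$co_line" | awk '{print $5}')
if [ "$degraded" = "False" ]; then
  echo "kube-apiserver Degraded=False: fix behaving as expected"
else
  echo "kube-apiserver Degraded=$degraded"
fi
```

On a live cluster the sample line would come from `oc get co kube-apiserver --no-headers` instead.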
Referring to Comment 1, verification is as below.

1. Install a recent 4.9.z.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.21    True        False         12m     Cluster version is 4.9.21

2. Install a broken webhook.

$ cat webhook-deploy.yaml
# The targetPort is deliberately inconsistent with containerPort, which breaks the webhook.
apiVersion: v1
kind: Namespace
metadata:
  name: validationwebhook
---
apiVersion: v1
kind: Service
metadata:
  name: validationwebhook
  namespace: validationwebhook
spec:
  selector:
    app: validationwebhook
  ports:
    - protocol: TCP
      port: 443
      targetPort: 8444
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: validationwebhook
  name: validationwebhook
  namespace: validationwebhook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: validationwebhook
  template:
    metadata:
      labels:
        app: validationwebhook
    spec:
      containers:
        - name: test1
          image: quay.io/wangke19/test1:v1
          imagePullPolicy: Always
          ports:
            - containerPort: 8443

$ cat webhook-registration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validationwebhook.validationwebhook.svc
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
webhooks:
  - name: validationwebhook.validationwebhook.svc
    failurePolicy: Fail
    rules:
      - apiGroups: ["*"]
        apiVersions: ["v1"]
        operations: ["UPDATE"]
        resources: ["nodes"]
    clientConfig:
      service:
        namespace: validationwebhook
        name: validationwebhook
        path: "/"
    admissionReviewVersions: ["v1"]
    sideEffects: None

$ oc apply -f webhook-deploy.yaml
namespace/validationwebhook created
service/validationwebhook created
deployment.apps/validationwebhook created

$ oc apply -f webhook-registration.yaml
validatingwebhookconfiguration.admissionregistration.k8s.io/validationwebhook.validationwebhook.svc created

The webhook pod runs into an error:

$ oc get pod -n validationwebhook
NAME                                READY   STATUS             RESTARTS        AGE
validationwebhook-7478c99bd-78g75   0/1     CrashLoopBackOff   6 (2m11s ago)   8m29s

After a while, check the kube-apiserver; it doesn't care about the webhook status:

$ oc get co/kube-apiserver
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.21    True        False         False      32m

3. Ask the cluster to update directly to a 4.11 nightly, 4.11.0-0.nightly-2022-02-10-031822 (the PR fix has already landed there).

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-02-10-031822 --allow-explicit-upgrade=true --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-02-10-031822

$ oc get co/kube-apiserver; oc get clusterversion
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.21    True        True          False      35m     NodeInstallerProgressing: 3 nodes are at revision 10; 0 nodes have achieved new revision 12
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.21    True        True          2m18s   Working towards 4.11.0-0.nightly-2022-02-10-031822: 94 of 770 done (12% complete)

$ oc get co/kube-apiserver; oc get clusterversion
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.9.21    True        True          False      41m     NodeInstallerProgressing: 1 nodes are at revision 10; 2 nodes are at revision 12
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.21    True        True          8m19s   Working towards 4.11.0-0.nightly-2022-02-10-031822: 95 of 770 done (12% complete)
...

$ oc get co/kube-apiserver; oc get clusterversion
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.11.0-0.nightly-2022-02-10-031822   True        False         False      112m
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.21    True        True          79m     Unable to apply 4.11.0-0.nightly-2022-02-10-031822: wait has exceeded 40 minutes for these operators: machine-config

$ oc get co/machine-config
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.9.21    True        True          True       114m    Unable to apply 4.11.0-0.nightly-2022-02-10-031822: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-b41efcb481e1920fbd445f1b6daa2729 expected 9629f56063b846202dea4ac24a722b963296a46f has ce39533a3346da509f49379c537133eea9bb6c06: all 3 nodes are at latest configuration rendered-master-b41efcb481e1920fbd445f1b6daa2729, retrying

Finally, the upgrade got stuck in the machine-config update; that does not matter for this bug's verification. I tested the same case with the upgrade paths 4.10 -> 4.11 nightly and 4.11 nightly A -> 4.11 nightly B; both got stuck in the kube-apiserver update as follows:

$ oc get co/kube-apiserver -w
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.10.0-rc.1   True        False         True       120m    ValidatingAdmissionWebhookConfigurationDegraded: validationwebhook.validationwebhook.svc: dial tcp 172.30.84.193:443: connect: no route to host
kube-apiserver   4.10.0-rc.1   True        False         True       120m    ValidatingAdmissionWebhookConfigurationDegraded: validationwebhook.validationwebhook.svc: dial tcp 172.30.84.193:443: i/o timeout

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          3m50s   Working towards 4.11.0-0.nightly-2022-02-09-185722: 94 of 770 done (12% complete)

Based on the above: as the 4.11 kube-apiserver operator comes in, the new controller takes a look around, sees the broken webhooks, and (before this fix) sets Degraded=True or (with this fix) stays Degraded=False. The behavior is as expected, so moving the bug to VERIFIED.
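For clarity on why the webhook in step 2 fails: the Service forwards port 443 to targetPort 8444, while the container only listens on 8443, so every admission call fails to connect. A minimal sketch of that mismatch, with the two values copied from webhook-deploy.yaml (no cluster access needed):

```shell
# Values from webhook-deploy.yaml: the Service's targetPort and the
# container's containerPort disagree, so the webhook endpoint never answers.
service_target_port=8444
container_port=8443
if [ "$service_target_port" -ne "$container_port" ]; then
  echo "broken by design: Service targetPort $service_target_port != containerPort $container_port"
fi
```

Any other reliably broken webhook (wrong service name, crashing backend, etc.) would exercise the same code path; the port mismatch is just a simple, deterministic way to get there.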
Moving it back to ASSIGNED, since we have decided to rename the condition (removing the Degraded suffix) so as not to block upgrades.
Retested the upgrade from 4.10.0-rc.1 to the latest 4.11 nightly including the PR fix; steps below.

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        False         12m     Cluster version is 4.10.0-rc.1

$ oc get co/kube-apiserver   # rc.1 lacks the PR fix, so kube-apiserver stays DEGRADED
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.10.0-rc.1   True        False         True       18m     ValidatingAdmissionWebhookConfigurationDegraded: validationwebhook.validationwebhook.svc: dial tcp 172.30.201.34:443: i/o timeout

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-02-11-014337 --allow-explicit-upgrade=true --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-02-11-014337

$ oc get co/kube-apiserver; echo; oc get clusterversion
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.10.0-rc.1   True        False         True       19m     ValidatingAdmissionWebhookConfigurationDegraded: validationwebhook.validationwebhook.svc: dial tcp 172.30.201.34:443: i/o timeout

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          19s     Working towards 4.11.0-0.nightly-2022-02-11-014337: 6 of 770 done (0% complete)

# As the 4.11 kube-apiserver operator comes in, the update proceeds smoothly without the ValidatingAdmissionWebhookConfigurationDegraded condition above
$ oc get co/kube-apiserver; echo; oc get clusterversion
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.10.0-rc.1   True        True          False      26m     NodeInstallerProgressing: 2 nodes are at revision 9; 1 nodes are at revision 10

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          7m12s   Working towards 4.11.0-0.nightly-2022-02-11-014337: 95 of 770 done (12% complete)
...
$ oc get co/kube-apiserver; echo; oc get clusterversion
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.11.0-0.nightly-2022-02-11-014337   True        False         False      42m

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          23m     Working towards 4.11.0-0.nightly-2022-02-11-014337: 383 of 770 done (49% complete)

$ oc get co/kube-apiserver; echo; oc get clusterversion
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.11.0-0.nightly-2022-02-11-014337   True        False         False      107m

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          88m     Working towards 4.11.0-0.nightly-2022-02-11-014337: 648 of 770 done (84% complete)

$ oc get co/kube-apiserver; echo; oc get clusterversion
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-apiserver   4.11.0-0.nightly-2022-02-11-014337   True        False         False      124m

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.1   True        True          105m    Unable to apply 4.11.0-0.nightly-2022-02-11-014337: wait has exceeded 40 minutes for these operators: machine-config

Finally, the upgrade again got stuck in the machine-config update, which seems to hit bug 2000937; I've seen that before. In any case, the broken webhooks are no longer a problem for the upgrade, so moving the bug to VERIFIED.
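The final check above can also be expressed as a simple grep on the operator status. A sketch, using a captured post-fix status line in place of a live `oc` call:

```shell
# With the fix, the kube-apiserver ClusterOperator MESSAGE column no longer
# carries the ValidatingAdmissionWebhookConfigurationDegraded condition.
# A captured post-fix line stands in for live output here.
co_line='kube-apiserver   4.11.0-0.nightly-2022-02-11-014337   True   False   False   124m'
if echo "$co_line" | grep -q 'ValidatingAdmissionWebhookConfigurationDegraded'; then
  echo "webhook condition still reported"
else
  echo "no webhook condition reported: broken webhooks no longer block the update"
fi
```

On a live cluster the same grep would run against `oc get co/kube-apiserver --no-headers` output.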
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069