This bug's PR is dev-approved but not yet merged, so per issue DPTP-660 I am doing the pre-merge verification for the QE pre-merge verification goal of issue OCPQE-815, using the bot to launch a cluster with the open PR.

Verification steps: this PR has to be verified on a single-node cluster; in 4.8 there is no fallback yet, and the backoff applies only to failing installers, not to failing operands.

1. Set the installer error probability to 1.0:

$ oc edit kubeapiserver cluster
$ oc get kubeapiserver cluster -oyaml | grep -A2 unsupportedConfigOverrides
    unsupportedConfigOverrides:
      installerErrorInjection:
        failPropability: 1.0

2. Trigger a revision and wait until the retry backoff goes up to its maximum (10 min, after roughly 10 retries):

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'

Wed 01 Sep 2021 12:23:28 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          False      36m

Wed 01 Sep 2021 12:23:29 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS      RESTARTS   AGE   LABELS
...
installer-7-ip-xx-x-xxx-xxx.us-west-1.compute.internal         0/1     Completed   0          33m   app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal      5/5     Running     0          31m   apiserver=true,app=openshift-kube-apiserver,revision=7
revision-pruner-4-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Completed   0          36m   app=pruner
revision-pruner-7-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Completed   0          29m   app=pruner

Wed 01 Sep 2021 12:23:30 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 8
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 8
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
...
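The unsupportedConfigOverrides knob set in step 1 makes each installer pod fail with the given probability. Conceptually it behaves like the sketch below; this is a hedged illustration, not the operator's actual implementation, and the function name is made up for this example:

```python
import random

def should_inject_installer_error(fail_probability: float) -> bool:
    """Illustrative sketch of installer error injection: each installer
    pod run fails with probability fail_probability. With the
    failPropability: 1.0 override used in step 1, every attempt fails,
    which is what forces the retry/backoff path under test."""
    # random.random() is in [0.0, 1.0), so 1.0 always fails, 0.0 never does
    return random.random() < fail_probability
```

With the override set to 1.0 the failure is deterministic, so every installer pod for the new revision ends in Error, as the transcript below shows.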
Wed 01 Sep 2021 12:23:36 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          False      36m

Wed 01 Sep 2021 12:23:38 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
installer-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal         0/1     Error     0          8s    app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal      5/5     Running   0          31m   apiserver=true,app=openshift-kube-apiserver,revision=7
...

Wed 01 Sep 2021 12:23:39 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 8
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedCount: 1
      lastFailedRevision: 8
      lastFailedRevisionErrors:
      - no detailed termination message, see `oc get -oyaml -n "openshift-kube-apiserver" pods "installer-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal"`
      lastFailedTime: "2021-09-01T04:23:35Z"
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 8
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
...

Wed 01 Sep 2021 01:04:22 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          True       77m

Wed 01 Sep 2021 01:04:23 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
installer-8-retry-1-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          40m   app=installer
installer-8-retry-2-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          39m   app=installer
installer-8-retry-3-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          38m   app=installer
installer-8-retry-4-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          37m   app=installer
installer-8-retry-5-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          35m   app=installer
installer-8-retry-6-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          32m   app=installer
installer-8-retry-7-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          20m   app=installer
installer-8-retry-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          14m   app=installer
installer-8-retry-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error     0          8s    app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal        5/5     Running   0          72m   apiserver=true,app=openshift-kube-apiserver,revision=7
...

Wed 01 Sep 2021 01:04:24 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 8
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedCount: 10
      lastFailedRevision: 8
      lastFailedRevisionErrors:
      - no detailed termination message, see `oc get -oyaml -n "openshift-kube-apiserver" pods "installer-8-retry-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal"`
      lastFailedTime: "2021-09-01T05:04:22Z"
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 8
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3. After the first 10 retries, remove the failPropability: 1.0 override and trigger a new revision at the same time:

Wed Sep 01 13:04:32 [kewang@kewang-fedora]$ oc edit kubeapiserver cluster
kubeapiserver.operator.openshift.io/cluster edited

Wed Sep 01 13:04:46 [kewang@kewang-fedora]$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'
kubeapiserver.operator.openshift.io/cluster patched
...

4. Watch that the new installer for the new revision is created well within the 10 min backoff, in fact within seconds:

Wed 01 Sep 2021 01:05:19 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          True       78m

----> New installer 9 is created:
Wed 01 Sep 2021 01:05:20 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                             READY   STATUS              RESTARTS   AGE   LABELS
...
installer-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal           0/1     Error               0          41m   app=installer
...
installer-8-retry-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error               0          65s   app=installer
installer-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal           0/1     ContainerCreating   0          2s    app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal        5/5     Running             0          73m   apiserver=true,app=openshift-kube-apiserver,revision=7
...
Wed 01 Sep 2021 01:05:21 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 9
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedRevision: 8
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 9
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Wed 01 Sep 2021 01:05:27 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          True       78m

Wed 01 Sep 2021 01:05:28 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
installer-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal         0/1     Error     0          41m   app=installer
...
installer-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal         1/1     Running   0          10s   app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal      5/5     Running   0          73m   apiserver=true,app=openshift-kube-apiserver,revision=7
...

Wed 01 Sep 2021 01:05:29 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 9
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedRevision: 8
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 9
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Wed 01 Sep 2021 01:05:35 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          True       78m

Wed 01 Sep 2021 01:05:36 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
installer-8-retry-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error       0          81s   app=installer
installer-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal           0/1     Completed   0          18s   app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal        5/5     Running     0          73m   apiserver=true,app=openshift-kube-apiserver,revision=7
...

Wed 01 Sep 2021 01:09:05 PM CST  oc get co | grep -v '.True.*False.*False'
Wed 01 Sep 2021 01:09:06 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
Wed 01 Sep 2021 01:09:06 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
...

Wed 01 Sep 2021 01:10:04 PM CST  oc get co | grep -v '.True.*False.*False'
Wed 01 Sep 2021 01:10:05 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
Wed 01 Sep 2021 01:10:05 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 9
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedRevision: 8
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 9
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Wed 01 Sep 2021 01:10:31 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.8.0-0.ci.test-2021-09-01-032024-ci-ln-w76hfgk-latest   True        True          True       83m

Wed 01 Sep 2021 01:10:32 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
installer-8-retry-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal   0/1     Error       0          6m17s   app=installer
installer-9-ip-xx-x-xxx-xxx.us-west-1.compute.internal           0/1     Completed   0          5m14s   app=installer
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal        5/5     Running     0          21s     apiserver=true,app=openshift-kube-apiserver,revision=9
revision-pruner-4-ip-xx-x-xxx-xxx.us-west-1.compute.internal     0/1     Completed   0          83m     app=pruner
revision-pruner-7-ip-xx-x-xxx-xxx.us-west-1.compute.internal     0/1     Completed   0          76m     app=pruner
revision-pruner-8-ip-xx-x-xxx-xxx.us-west-1.compute.internal     0/1     Completed   0          46m     app=pruner

Wed 01 Sep 2021 01:10:33 PM CST  oc get kubeapiserver -oyaml | grep -A15 'latestAvailableRevision'
    latestAvailableRevision: 9
    latestAvailableRevisionReason: ""
    nodeStatuses:
    - currentRevision: 7
      lastFailedRevision: 8
      nodeName: ip-xx-x-xxx-xxx.us-west-1.compute.internal
      targetRevision: 9
    readyReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
...

Wed 01 Sep 2021 01:11:36 PM CST  oc get co | grep -v '.True.*False.*False'
NAME             VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE

Wed 01 Sep 2021 01:11:37 PM CST  oc get pod -n openshift-kube-apiserver --show-labels
NAME                                                           READY   STATUS    RESTARTS   AGE   LABELS
...
kube-apiserver-ip-xx-x-xxx-xxx.us-west-1.compute.internal      5/5     Running   0          86s   apiserver=true,app=openshift-kube-apiserver,revision=9   <---- the new revision 9 rolled out from the new installer

The above results are as expected, so the bug is pre-merge verified. After the PR gets merged, the bot will move the bug to VERIFIED automatically.
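The retry spacing verified in step 2 (installer retries stretching out until attempts are roughly 10 minutes apart after about 10 failures) is consistent with an exponential backoff with a cap. The sketch below illustrates that shape; the base delay and doubling factor are assumptions for illustration, not the operator's exact constants:

```python
from datetime import timedelta

def installer_retry_delay(failed_count: int,
                          base: timedelta = timedelta(seconds=30),
                          cap: timedelta = timedelta(minutes=10)) -> timedelta:
    """Illustrative exponential backoff for failed installer pods:
    the delay doubles with each failure and is capped at 10 minutes.
    base (30s) and the doubling factor are assumed values."""
    if failed_count <= 0:
        return timedelta(0)  # first attempt runs immediately
    delay = base * (2 ** (failed_count - 1))
    return min(delay, cap)
```

Under these assumed constants the delay reaches the 10-minute cap by roughly the tenth failure, matching the ~10 min gap between installer-8-retry-8 and installer-8-retry-9 in the transcript, and clearing the backoff state on a new revision (step 3) is what lets installer-9 start within seconds.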
Not waiting for the bot to move the bug to VERIFIED: the person in charge of the relevant errata is urging it along, so I am moving it to VERIFIED manually.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.12 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3511