Verified this bug using payload 4.3.0-0.nightly-2020-02-14-234906; downgrading to 4.2.19 still failed.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-02-14-234906   True        False         9m11s   Cluster version is 4.3.0-0.nightly-2020-02-14-234906

# oc adm upgrade --to-image='quay.io/openshift-release-dev/ocp-release@sha256:b51a0c316bb0c11686e6b038ec7c9f7ff96763f47a53c3443ac82e8c054bc035' --allow-explicit-upgrade
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:b51a0c316bb0c11686e6b038ec7c9f7ff96763f47a53c3443ac82e8c054bc035

# oc get pod -n openshift-cluster-version
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-version-operator-6d78ff4f8f-ng6fl   0/1     Error       4          15m
version--qgqbz-29vmc                        0/1     Completed   0          15m

# oc logs cluster-version-operator-6d78ff4f8f-ng6fl -n openshift-cluster-version
...
...
I0215 06:08:49.510678       1 request.go:530] Throttling request took 793.363675ms, request: GET:https://127.0.0.1:6443/apis/apps/v1/namespaces/openshift-cluster-machine-approver/deployments/machine-approver
E0215 06:08:49.515290       1 runtime.go:69] Observed a panic: "index out of range" (runtime error: index out of range)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:44
/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69
/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28
/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:23
/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/apps.go:27
/go/src/github.com/openshift/cluster-version-operator/lib/resourceapply/apps.go:29
/go/src/github.com/openshift/cluster-version-operator/lib/resourcebuilder/apps.go:70
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:593
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task.go:71
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:588
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:591
/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 196 [running]:
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x13e1c20, 0x2540480)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/openshift/cluster-version-operator/lib/resourcemerge.ensureContainers(0xc001b7f65f, 0xc0019e38b0, 0xc00095fe00, 0x1, 0x1)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:69 +0x799
github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodSpec(0xc001b7f65f, 0xc0019e3880, 0xc001223c20, 0x1, 0x1, 0x0, 0x0, 0x0, 0xc00095fe00, 0x1, ...)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:28 +0xc6
github.com/openshift/cluster-version-operator/lib/resourcemerge.ensurePodTemplateSpec(0xc001b7f65f, 0xc0019e3798, 0xc000d75560, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/core.go:23 +0xd0
github.com/openshift/cluster-version-operator/lib/resourcemerge.EnsureDeployment(0xc001b7f65f, 0xc0019e3680, 0x12b0c24, 0xa, 0xc000d75d30, 0x7, 0xc000d75180, 0x10, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourcemerge/apps.go:27 +0x172
github.com/openshift/cluster-version-operator/lib/resourceapply.ApplyDeployment(0x17a4d60, 0xc0002c9ff0, 0xc0019e3200, 0x20, 0x2573d20, 0xa, 0xc00069d698)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourceapply/apps.go:29 +0x1b0
github.com/openshift/cluster-version-operator/lib/resourcebuilder.(*deploymentBuilder).Do(0xc00147b480, 0x17de660, 0xc0012f52c0, 0xc00147b480, 0xc00147b480)
	/go/src/github.com/openshift/cluster-version-operator/lib/resourcebuilder/apps.go:70 +0xeb
github.com/openshift/cluster-version-operator/pkg/cvo.(*resourceBuilder).Apply(0xc001027f80, 0x17de660, 0xc0012f52c0, 0xc00093a6e0, 0x0, 0x30, 0x200)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:593 +0xb6
github.com/openshift/cluster-version-operator/pkg/payload.(*Task).Run(0xc000ad5b80, 0x17de660, 0xc0012f52c0, 0xc000ceee00, 0x6, 0x17a3840, 0xc001027f80, 0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task.go:71 +0xb0
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).apply.func1(0x17de660, 0xc0012f52c0, 0xc0012305b8, 0x7, 0x149, 0x2, 0x2)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:588 +0x37d
github.com/openshift/cluster-version-operator/pkg/payload.RunGraph.func2(0xc000dad790, 0x17de660, 0xc0012f52c0, 0xc0011e8d20, 0xc001026b10, 0xc0010489a0, 0xc0011e8d80, 0xa)
	/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:591 +0x289
created by github.com/openshift/cluster-version-operator/pkg/payload.RunGraph
	/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:577 +0x23b
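For context on the panic itself: the trace bottoms out in resourcemerge.ensureContainers (core.go:69), which reconciles the deployment's existing container list against the list required by the payload manifest. The following self-contained Go program is only a rough sketch, not the CVO's actual code, of the slice-handling pattern that produces exactly this kind of "index out of range": deleting an entry in place while a forward loop keeps indexing by positions computed against the slice's original length.

package main

import "fmt"

// container stands in for corev1.Container; only the name matters here.
type container struct {
	name string
}

// removeUnwanted mimics the problematic pattern: it deletes entries of
// *existing that are absent from required, while the loop keeps using
// indices taken from the slice's ORIGINAL length.
func removeUnwanted(existing *[]container, required []container) {
	for i := range *existing { // the range was sized before any removal
		c := (*existing)[i] // panics once a removal has shrunk the slice
		keep := false
		for _, r := range required {
			if r.name == c.name {
				keep = true
				break
			}
		}
		if !keep {
			// In-place removal shrinks the slice out from under the loop.
			*existing = append((*existing)[:i], (*existing)[i+1:]...)
		}
	}
}

func main() {
	// Same shape as the deployment named in the log above: the FIRST of
	// two containers disappears from the required manifest. If only the
	// final entry were removed, the loop would end before reading a
	// stale index, which is why a non-final removal is the trigger.
	existing := []container{{"kube-rbac-proxy"}, {"machine-approver-controller"}}
	required := []container{{"machine-approver-controller"}}

	defer func() {
		fmt.Println("recovered:", recover()) // runtime error: index out of range
	}()
	removeUnwanted(&existing, required)
}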
> Verified this bug using payload 4.3.0-0.nightly-2020-02-14-234906; downgrading to 4.2.19 still failed.

On upgrades and downgrades, the CVO that matters is the one from the target release. So for 4.3.0-0.nightly-2020-02-14-234906 -> 4.2.19, you will still hit this failure mode, because 4.2.19 does not contain the patch. If you want to independently verify the fix for this 4.3 Bugzilla, you will need a whatever -> 4.3-nightly upgrade in which a manifest change removes either a container or a service port that was not the final entry in its list (e.g. see the unit test removing the test-A container [1]). I'm not sure an appropriate source release image exists off the shelf; you could create one by adding additional ports to a service in your target 4.3 nightly. Or you could just verify this bug by saying "we don't see any regressions" and then test the 4.3 -> 4.2 downgrade as part of verifying the 4.2.z bug 1800346.

[1]: https://github.com/openshift/cluster-version-operator/pull/282/files#diff-415c13f11ffc32696c5d69b900b3fe58R251-R268
Digging into the manifest change that triggered the initial issue: we don't have 4.3.0-0.nightly-2019-12-12-155629 around anymore, but we do have the temporally close 4.3.0-0.nightly-2019-12-13-072740. Comparing that 4.3 nightly with 4.2.10:

$ oc adm release extract --to 4.2.10 quay.io/openshift-release-dev/ocp-release:4.2.10
$ oc adm release extract --to 4.3.0-0.nightly-2019-12-13-072740 quay.io/openshift-release-dev/ocp-release-nightly:4.3.0-0.nightly-2019-12-13-072740
$ diff -U3 4.2.10/0000_50_cluster-machine-approver_02-deployment.yaml 4.3.0-0.nightly-2019-12-13-072740/0000_50_cluster-machine-approver_04-deployment.yaml
--- 4.2.10/0000_50_cluster-machine-approver_02-deployment.yaml	2019-12-02 22:52:11.000000000 -0800
+++ 4.3.0-0.nightly-2019-12-13-072740/0000_50_cluster-machine-approver_04-deployment.yaml	2019-12-06 16:35:48.000000000 -0800
@@ -21,8 +23,31 @@
       hostNetwork: true
       serviceAccountName: machine-approver-sa
       containers:
+      - args:
+        ...
+        name: kube-rbac-proxy
+        ...
+          name: machine-approver-tls
       - name: machine-approver-controller
...

So the issue is that the kube-rbac-proxy container spec (the first entry in that array) is removed on the downgrade, and the subsequent iteration into the machine-approver-controller container spec hits the panic. Unless 4.4 -> 4.3 downgrades were already hitting a similar panic, you'd need to synthesize another change like this (or add a Service port) in order to verify this 4.3.z bug in a whatever -> 4.3-nightly upgrade/downgrade.
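For anyone curious what an index-safe version of that loop can look like: the actual fix is in cluster-version-operator PR #282 (linked in [1] above). The sketch below is only a hypothetical illustration of the general technique of filtering survivors into a fresh slice rather than deleting in place, and it covers only the removal side (the real ensureContainers must also add and update containers).

package main

import "fmt"

type container struct {
	name string
}

// mergeContainers sketches one index-safe shape: build the list of
// survivors instead of deleting mid-iteration, so no index goes stale.
func mergeContainers(modified *bool, existing *[]container, required []container) {
	keep := (*existing)[:0] // reuse the backing array, preserve order
	for _, c := range *existing {
		found := false
		for _, r := range required {
			if r.name == c.name {
				found = true
				break
			}
		}
		if found {
			keep = append(keep, c)
		} else {
			*modified = true
		}
	}
	*existing = keep
}

func main() {
	// The downgrade case from the diff above: 4.3's deployment carries
	// kube-rbac-proxy first, and the 4.2 manifest no longer lists it.
	existing := []container{{"kube-rbac-proxy"}, {"machine-approver-controller"}}
	required := []container{{"machine-approver-controller"}}

	var modified bool
	mergeContainers(&modified, &existing, required)
	fmt.Println(modified, existing) // true [{machine-approver-controller}]
}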
Thanks very much for the detailed explanation; I must have confused which CVO version was in play during the downgrade. Actually, I had already run the 4.4 -> 4.3 downgrade test in https://bugzilla.redhat.com/show_bug.cgi?id=1783221#c7. Since the 4.3 CVO shows no issue there, I'll move this bug to VERIFIED and test the initial problem under BZ#1800346. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0528