Bug 1792842 - downgrade can not start due to cvo pod panic
Summary: downgrade can not start due to cvo pod panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Clayton Coleman
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-20 07:28 UTC by liujia
Modified: 2020-05-04 11:25 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:25:32 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 303 0 None closed Bug 1793051: cvo: pass limiter when creating a store 2020-07-13 16:58:24 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:25:55 UTC

Description liujia 2020-01-20 07:28:32 UTC
Description of problem:
When downgrading from release B back to release A (after a successful upgrade from A to B), the downgrade cannot start because the CVO pod panics and enters CrashLoopBackOff.

# ./oc get po -n openshift-cluster-version
NAME                                        READY   STATUS             RESTARTS   AGE
cluster-version-operator-6db487dfc7-89dzl   0/1     CrashLoopBackOff   6          28m

I0120 07:18:46.181187       1 cvo.go:468] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a", Force:true}
I0120 07:18:46.181845       1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a (force=true) on generation 3 in state Updating at attempt 0
I0120 07:18:46.181928       1 sync_worker.go:477] Loading payload
E0120 07:18:46.182087       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)


Version-Release number of the following components:
4.4.0-0.nightly-2020-01-20-002356

How reproducible:
always

Steps to Reproduce:
1. install ocp 4.3.0-rc.3
2. trigger upgrade from 4.3.0-rc.3 to 4.4.0-0.nightly-2020-01-20-002356
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-20-002356   True        False         50s     Cluster version is 4.4.0-0.nightly-2020-01-20-002356

3. trigger downgrade from 4.4.0-0.nightly-2020-01-20-002356 to 4.3.0-rc.3
# ./oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a --allow-explicit-upgrade --force
Updating to release image registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a

Actual results:
The downgrade cannot start.

Expected results:
The downgrade succeeds.

Additional info:
[root@preserve-jliu-honey tmp]# ./oc get clusterversion -o json|jq .items[].status.history
[
  {
    "completionTime": "2020-01-20T07:00:49Z",
    "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-01-20-002356",
    "startedTime": "2020-01-20T06:23:32Z",
    "state": "Completed",
    "verified": false,
    "version": "4.4.0-0.nightly-2020-01-20-002356"
  },
  {
    "completionTime": "2020-01-20T06:22:02Z",
    "image": "registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a",
    "startedTime": "2020-01-20T05:56:37Z",
    "state": "Completed",
    "verified": false,
    "version": "4.3.0-rc.3"
  }
]
[root@preserve-jliu-honey tmp]# ./oc get clusterversion -o json|jq .items[].spec.desiredUpdate
{
  "force": true,
  "image": "registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a",
  "version": ""
}
[root@preserve-jliu-honey tmp]# ./oc adm upgrade
Cluster version is 4.4.0-0.nightly-2020-01-20-002356

warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently installed version 4.4.0-0.nightly-2020-01-20-002356 not found in the "stable-4.3" channel

Comment 1 liujia 2020-01-20 07:29:25 UTC
I0120 07:17:05.732070       1 start.go:19] ClusterVersionOperator v4.4.0-202001172346-dirty
I0120 07:17:05.732511       1 merged_client_builder.go:122] Using in-cluster configuration
I0120 07:17:05.739427       1 payload.go:210] Loading updatepayload from "/"
I0120 07:17:06.037461       1 cvo.go:264] Verifying release authenticity: All release image digests must have GPG signatures from verifier-public-key-redhat (567E347AD0044ADE55BA8A5F199E2F91FD431D51: Red Hat, Inc. (release key 2) <security>, B08B659EE86AF623BC90E8DB938A80CAF21541EB: Red Hat, Inc. (beta key 2) <security>) - will check for signatures in containers/image format at https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release and from config maps in openshift-config-managed with label "release.openshift.io/verification-signatures"
I0120 07:17:06.037882       1 leaderelection.go:241] attempting to acquire leader lease  openshift-cluster-version/version...
I0120 07:17:06.050011       1 leaderelection.go:350] lock is held by jliu-4-p7f28-m-0.c.openshift-qe.internal_15561870-9d43-4f43-b7f5-c63ca86dd9b1 and has not yet expired
I0120 07:17:06.050039       1 leaderelection.go:246] failed to acquire lease openshift-cluster-version/version
I0120 07:17:51.341033       1 leaderelection.go:350] lock is held by jliu-4-p7f28-m-0.c.openshift-qe.internal_15561870-9d43-4f43-b7f5-c63ca86dd9b1 and has not yet expired
I0120 07:17:51.341056       1 leaderelection.go:246] failed to acquire lease openshift-cluster-version/version
I0120 07:18:46.078405       1 leaderelection.go:251] successfully acquired lease openshift-cluster-version/version
I0120 07:18:46.079582       1 reflector.go:120] Starting reflector *v1.ClusterOperator (3m19.747206386s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.079620       1 reflector.go:158] Listing and watching *v1.ClusterOperator from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.080171       1 reflector.go:120] Starting reflector *v1.ClusterVersion (3m12.559234557s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.080196       1 reflector.go:158] Listing and watching *v1.ClusterVersion from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.080390       1 event.go:255] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-cluster-version", Name:"version", UID:"7ed9bdfa-cf36-4d3a-803c-f471dc40203f", APIVersion:"v1", ResourceVersion:"46452", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' jliu-4-p7f28-m-0.c.openshift-qe.internal_6874545d-522e-4de5-8e76-58641c12494c became leader
I0120 07:18:46.080437       1 cvo.go:332] Starting ClusterVersionOperator with minimum reconcile period 2m52.525702462s
I0120 07:18:46.080694       1 reflector.go:120] Starting reflector *v1.Proxy (3m19.747206386s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.080769       1 reflector.go:158] Listing and watching *v1.Proxy from github.com/openshift/client-go/config/informers/externalversions/factory.go:101
I0120 07:18:46.082926       1 reflector.go:120] Starting reflector *v1.ConfigMap (3m52.861090565s) from k8s.io/client-go/informers/factory.go:134
I0120 07:18:46.083008       1 reflector.go:158] Listing and watching *v1.ConfigMap from k8s.io/client-go/informers/factory.go:134
I0120 07:18:46.180627       1 shared_informer.go:227] caches populated
I0120 07:18:46.180832       1 cvo.go:526] Started syncing upgradeable "openshift-cluster-version/version" (2020-01-20 07:18:46.180822966 +0000 UTC m=+100.466486359)
I0120 07:18:46.180930       1 cvo.go:528] Finished syncing upgradeable "openshift-cluster-version/version" (103.979µs)
I0120 07:18:46.180958       1 cvo.go:503] Started syncing available updates "openshift-cluster-version/version" (2020-01-20 07:18:46.180955724 +0000 UTC m=+100.466619070)
I0120 07:18:46.181121       1 cvo.go:439] Started syncing cluster version "openshift-cluster-version/version" (2020-01-20 07:18:46.181114063 +0000 UTC m=+100.466777390)
I0120 07:18:46.181187       1 cvo.go:468] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a", Force:true}
I0120 07:18:46.181845       1 sync_worker.go:471] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:7467ac3dd703da09a1d5f63e9c61c2c3e04ff25c96bf58c66a715aafaa73697a (force=true) on generation 3 in state Updating at attempt 0
I0120 07:18:46.181928       1 sync_worker.go:477] Loading payload
E0120 07:18:46.182087       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 179 [running]:
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x15e54e0, 0x27a5500)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x15e54e0, 0x27a5500)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679 +0x1b2
sync.(*Mutex).Lock(...)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/sync/mutex.go:74
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).reserveN(0x0, 0xbf8172f58ad8ebca, 0x176456b42b, 0x27c1fc0, 0x1, 0x7fffffffffffffff, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:305 +0x68
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).ReserveN(...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:212
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).Reserve(0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:194 +0xb5
github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap.(*Store).DigestSignatures(0xc000d97d60, 0x1a53c00, 0xc000a65da0, 0xc00138ea2a, 0x47, 0xc0012bb3b0, 0x47, 0x2, 0x3, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap/store.go:74 +0x72
github.com/openshift/cluster-version-operator/pkg/verify.(*ReleaseVerifier).Verify(0xc0002a4960, 0x1a53c00, 0xc000a65da0, 0xc00138ea2a, 0x47, 0xc00109d5f8, 0x146867e)
	/go/src/github.com/openshift/cluster-version-operator/pkg/verify/verify.go:247 +0x5c7
github.com/openshift/cluster-version-operator/pkg/cvo.(*payloadRetriever).RetrievePayload(0xc0001d2a00, 0x1a53c00, 0xc000a65da0, 0x0, 0x0, 0xc00138ea00, 0x71, 0x1, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/updatepayload.go:91 +0x150
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).syncOnce(0xc0000c71e0, 0x1a53c00, 0xc000a65da0, 0xc000980e40, 0x10, 0x1a0bba0, 0xc0008de650, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:485 +0x4ea
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1.1(0xc000980e40, 0xc0000c71e0, 0x1a53b80, 0xc000e5a680, 0x10, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:314 +0x217
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1()
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:315 +0x2ed
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000058f80)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00109df80, 0x989680, 0x0, 0x1, 0xc001344de0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc0000c71e0, 0x1a53b80, 0xc000e5a680, 0x10)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:250 +0xf7
created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:345 +0x4af
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1200768]

goroutine 179 [running]:
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x15e54e0, 0x27a5500)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/panic.go:679 +0x1b2
sync.(*Mutex).Lock(...)
	/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/sync/mutex.go:74
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).reserveN(0x0, 0xbf8172f58ad8ebca, 0x176456b42b, 0x27c1fc0, 0x1, 0x7fffffffffffffff, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:305 +0x68
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).ReserveN(...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:212
github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate.(*Limiter).Reserve(0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/golang.org/x/time/rate/rate.go:194 +0xb5
github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap.(*Store).DigestSignatures(0xc000d97d60, 0x1a53c00, 0xc000a65da0, 0xc00138ea2a, 0x47, 0xc0012bb3b0, 0x47, 0x2, 0x3, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap/store.go:74 +0x72
github.com/openshift/cluster-version-operator/pkg/verify.(*ReleaseVerifier).Verify(0xc0002a4960, 0x1a53c00, 0xc000a65da0, 0xc00138ea2a, 0x47, 0xc00109d5f8, 0x146867e)
	/go/src/github.com/openshift/cluster-version-operator/pkg/verify/verify.go:247 +0x5c7
github.com/openshift/cluster-version-operator/pkg/cvo.(*payloadRetriever).RetrievePayload(0xc0001d2a00, 0x1a53c00, 0xc000a65da0, 0x0, 0x0, 0xc00138ea00, 0x71, 0x1, 0x0, 0x0, ...)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/updatepayload.go:91 +0x150
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).syncOnce(0xc0000c71e0, 0x1a53c00, 0xc000a65da0, 0xc000980e40, 0x10, 0x1a0bba0, 0xc0008de650, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:485 +0x4ea
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1.1(0xc000980e40, 0xc0000c71e0, 0x1a53b80, 0xc000e5a680, 0x10, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:314 +0x217
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1()
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:315 +0x2ed
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000058f80)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0012ddf80, 0x989680, 0x0, 0x1, 0xc001344de0)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc0000c71e0, 0x1a53b80, 0xc000e5a680, 0x10)
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:250 +0xf7
created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run
	/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:345 +0x4af

Comment 2 W. Trevor King 2020-01-21 23:52:48 UTC
> E0120 07:18:46.182087       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
> ...
> github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap.(*Store).DigestSignatures(0xc000d97d60, 0x1a53c00, 0xc000a65da0, 0xc00138ea2a, 0x47, 0xc0012bb3b0, 0x47, 0x2, 0x3, 0x0)
> 	/go/src/github.com/openshift/cluster-version-operator/pkg/verify/verifyconfigmap/store.go:74 +0x72

This panic was actually fixed by #303 (which I'm linking).  Before that fix, line 74 attempted to access the nil limiter [1].

[1]: https://github.com/vrutkovs/cluster-version-operator/commit/2ba18d868a865a8ad3b633e79e32bde8a90fe90b#diff-4f12874ad5cd49216fb65c33c0fa4519R74
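
For illustration only (this is not the CVO source), here is a minimal Go sketch under the assumption that the signature store holds a *rate.Limiter field, as the stack trace suggests: calling Reserve() on a store whose limiter was never set reproduces the nil-pointer panic seen above, while passing the limiter at construction time (the approach of the linked PR) avoids it. Names like signatureStore and digestSignatures are hypothetical.

package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// signatureStore mirrors the shape of verifyconfigmap.Store: it holds a
// limiter used to throttle signature lookups.
type signatureStore struct {
	limiter *rate.Limiter
}

func (s *signatureStore) digestSignatures() {
	// Equivalent of store.go:74 before the fix: if s.limiter was never
	// set, Reserve() dereferences a nil *rate.Limiter, which panics with
	// "invalid memory address or nil pointer dereference".
	r := s.limiter.Reserve()
	fmt.Println("reservation ok:", r.OK())
}

func main() {
	// Before the fix (conceptually): the store is created without a
	// limiter, so the first signature lookup panics.
	broken := &signatureStore{} // limiter == nil
	_ = broken                  // broken.digestSignatures() would panic here

	// After the fix (conceptually): the caller passes the limiter when
	// creating the store, so Reserve() succeeds.
	fixed := &signatureStore{limiter: rate.NewLimiter(rate.Limit(1), 3)}
	fixed.digestSignatures()
}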

Comment 4 liujia 2020-01-22 08:44:14 UTC
1. install ocp 4.3.0
2. trigger upgrade from 4.3.0 to 4.4.0-0.nightly-2020-01-22-045318
# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-22-045318   True        False         4m29s   Cluster version is 4.4.0-0.nightly-2020-01-22-045318

3. trigger downgrade from 4.4.0-0.nightly-2020-01-22-045318 to 4.3.0 and check that the downgrade succeeds.
# ./oc get clusterversion -o json|jq -r '.items[0].status.history[]|.startedTime + "|" + .completionTime + "|" + .state + "|" + .version'
2020-01-22T07:59:51Z|2020-01-22T08:30:03Z|Completed|4.3.0
2020-01-22T07:07:44Z|2020-01-22T07:53:36Z|Completed|4.4.0-0.nightly-2020-01-22-045318
2020-01-22T06:18:53Z|2020-01-22T06:40:59Z|Completed|4.3.0

Comment 6 errata-xmlrpc 2020-05-04 11:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

