panic in cluster-storage-operator while updating status - https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1163/pull-ci-openshift-kubernetes-master-k8s-e2e-conformance-aws/1489076275103404032
The panic:

panic({0x1ff9140, 0x27bc000})
        runtime/panic.go:1047 +0x266
reflect.Value.Index({0x1f77aa0, 0xc00241cb78, 0xc00068d080}, 0x199)
        reflect/value.go:1297 +0x178
k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.deepValueEqual(0x22e3b60, {0x1f77aa0, 0xc00292d3d0, 0x16}, {0x1f77aa0, 0xc00241cb78, 0xc001d070b8}, 0x43a191, 0x2)
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:172 +0x1185
k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.deepValueEqual(0x2160740, {0x22e3b60, 0xc00292d398, 0xc0014c8480}, {0x22e3b60, 0xc00241cb40, 0x2}, 0xc000bf4360, 0x1)
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:186 +0x1325
k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.deepValueEqual(0x2160740, {0x2160740, 0xc00292d398, 0x578}, {0x2160740, 0xc00241cb40, 0x580}, 0xc000748800, 0x0)
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:183 +0xe11
k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.DeepEqual(0xc00292d398, {0x2160740, 0xc00292d398}, {0x2160740, 0xc00241cb40})
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:243 +0x285
github.com/openshift/library-go/pkg/operator/v1helpers.UpdateStatus.func1()
        github.com/openshift/library-go.0-20220114151217-4362aa519714/pkg/operator/v1helpers/helpers.go:175 +0x169
k8s.io/client-go/util/retry.OnError.func1()
        k8s.io/client-go.0+incompatible/util/retry/util.go:51 +0x33
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x4131f4, 0xc001d07798})
        k8s.io/apimachinery.1/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x2819048, 0xc0001ae008}, 0x203000)
        k8s.io/apimachinery.1/pkg/util/wait/wait.go:233 +0x7c
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x2262d80)
        k8s.io/apimachinery.1/pkg/util/wait/wait.go:226 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0}, 0x413547)
        k8s.io/apimachinery.1/pkg/util/wait/wait.go:421 +0x5f
k8s.io/client-go/util/retry.OnError({0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x4, 0x0}, 0x25dcc28, 0xc00241caf0)
        k8s.io/client-go.0+incompatible/util/retry/util.go:50 +0xf1
k8s.io/client-go/util/retry.RetryOnConflict(...)
        k8s.io/client-go.0+incompatible/util/retry/util.go:104
github.com/openshift/library-go/pkg/operator/v1helpers.UpdateStatus({0x2819010, 0xc0001cfcc0}, {0x283cd38, 0xc0005776c0}, {0xc0029c6840, 0x3, 0x3})
        github.com/openshift/library-go.0-20220114151217-4362aa519714/pkg/operator/v1helpers/helpers.go:162 +0x1c5
github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass.(*Controller).sync(0xc000ce8230, {0x2819010, 0xc0001cfcc0}, {0x30, 0xc000077800})
        github.com/openshift/cluster-storage-operator/pkg/operator/defaultstorageclass/controller.go:134 +0x8c5

The storage class controller updates its own conditions, and that panics somewhere in reflect.Equalities.DeepEqual.
The panic is happening because of an out-of-range index access when the status is updated:

> E0203 09:27:18.496815 1 runtime.go:76] Observed a panic: reflect: slice index out of range

This seems to be caused here (see https://github.com/openshift/cluster-storage-operator/blob/2eb1fd83439da37e4f54231d627c3120735b20ee/vendor/k8s.io/apimachinery/third_party/forked/golang/reflect/deep_equal.go#L152-L153):

> for i := 0; i < v1.Len(); i++ {
>     if !e.deepValueEqual(v1.Index(i), v2.Index(i), visited, depth+1) {

In the code above, the index range is determined by the length of v1 (which is the "old" status in our case). However, the "new" status contains fewer elements (conditions), so an out-of-range error occurs.

For completeness, the "new" status contains fewer conditions because we explicitly remove a "Disabled" condition here:
https://github.com/openshift/cluster-storage-operator/blob/2eb1fd83439da37e4f54231d627c3120735b20ee/pkg/operator/defaultstorageclass/controller.go#L137
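To make the failure mode concrete, here is a minimal standalone sketch of the mechanism (illustrative slice names, not the operator's actual data): reflect.Value.Index panics with "reflect: slice index out of range" as soon as the loop index exceeds the length of the shorter slice.

// Minimal sketch of the panic mechanism (illustrative data, not the operator's).
package main

import (
	"fmt"
	"reflect"
)

func main() {
	oldConditions := []string{"Available", "Progressing", "Disabled"} // "old" status: 3 conditions
	newConditions := []string{"Available", "Progressing"}             // "new" status: "Disabled" removed

	v1 := reflect.ValueOf(oldConditions)
	v2 := reflect.ValueOf(newConditions)

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // recovered: reflect: slice index out of range
		}
	}()

	// Iterate using v1's length but index into the shorter v2,
	// mirroring the loop quoted above.
	for i := 0; i < v1.Len(); i++ {
		_ = v1.Index(i)
		_ = v2.Index(i) // panics when i == 2
	}
}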
From the stack trace, we're hitting this line:

k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.deepValueEqual(0x22e3b60, {0x1f77aa0, 0xc00292d3d0, 0x16}, {0x1f77aa0, 0xc00241cb78, 0xc001d070b8}, 0x43a191, 0x2)
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:172 +0x1185

https://github.com/openshift/cluster-storage-operator/blob/2eb1fd83439da37e4f54231d627c3120735b20ee/vendor/k8s.io/apimachinery/third_party/forked/golang/reflect/deep_equal.go#L172

So it's comparing slices instead of arrays, if the line numbers are correct. And there is a check for slices to see if the lengths are the same before that line, so why doesn't that prevent the panic?

if v1.Len() != v2.Len() {
    return false
}
In this call:

k8s.io/apimachinery/third_party/forked/golang/reflect.Equalities.deepValueEqual(0x22e3b60, {0x1f77aa0, 0xc00292d3d0, 0x16}, {0x1f77aa0, 0xc00241cb78, 0xc001d070b8}, 0x43a191, 0x2)
        k8s.io/apimachinery.1/third_party/forked/golang/reflect/deep_equal.go:172 +0x1185

deepValueEqual compares these 2 slices:

v1 = {0x1f77aa0, 0xc00292d3d0, 0x16}
v2 = {0x1f77aa0, 0xc00241cb78, 0xc001d070b8}

and then calls v2.Index(i):

reflect.Value.Index({0x1f77aa0, 0xc00241cb78, 0xc00068d080}, 0x199)
        reflect/value.go:1297 +0x178

Somehow, the third element of v2 appears to be a different value, and i = 0x199. It would help to see the objects being compared when this panic triggers. I think Fabio is right that it's due to a difference in length between the two slices, but I'm puzzled why the length check didn't prevent the panic.
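One way a passing length check can still be followed by an index panic, anticipating the race-condition question in the next comment: when the reflect.Value aliases a struct field, Len() and Index() re-read the live slice header, so a write that lands between the check and the indexing defeats the check. A minimal sketch with illustrative types (not the operator's); the mutation is done inline here for determinism, but a concurrent writer would have the same effect.

package main

import (
	"fmt"
	"reflect"
)

// Status is a stand-in for an operator status with a Conditions slice.
type Status struct {
	Conditions []string
}

func main() {
	s := &Status{Conditions: []string{"Available", "Progressing", "Disabled"}}

	// The reflect.Value obtained through the pointer aliases the live field.
	v := reflect.ValueOf(s).Elem().FieldByName("Conditions")

	n := v.Len() // length check passes: n == 3

	// A write landing between the check and the indexing; in the operator
	// this would be another goroutine mutating the shared object.
	s.Conditions = s.Conditions[:1]

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // recovered: reflect: slice index out of range
		}
	}()

	for i := 0; i < n; i++ {
		_ = v.Index(i) // panics at i == 1, despite the earlier check
	}
}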
Can we reproduce the issue easily?

> And there is a check for slices to see if the lengths are the same before that line, so why doesn't that prevent the panic?

I did not check the code, but this smells like a race condition. Can any other goroutine modify any of the objects during the checks? We may be missing a DeepCopy() in our operator / library-go.
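If that hypothesis holds, the usual remedy is to deep-copy the shared object before comparing or mutating it, so the comparison never aliases memory another goroutine can write to. A minimal sketch of that pattern with hypothetical types (Status and copyStatus are illustrative, not the operator's or library-go's API; in the real code this would correspond to calling the generated DeepCopy() on the cached status before handing it to the compare/update path).

package main

import "fmt"

// Status is a hypothetical stand-in for an operator status.
type Status struct {
	Conditions []string
}

// copyStatus returns a deep copy so later comparisons and updates
// do not alias the slice owned by the shared/cached object.
func copyStatus(in *Status) *Status {
	out := &Status{Conditions: make([]string, len(in.Conditions))}
	copy(out.Conditions, in.Conditions)
	return out
}

func main() {
	shared := &Status{Conditions: []string{"Available", "Progressing", "Disabled"}}

	// Work on private copies; mutations of `shared` by other goroutines
	// can no longer change what we compare against.
	old := copyStatus(shared)
	updated := copyStatus(shared)
	updated.Conditions = updated.Conditions[:2] // e.g. drop the "Disabled" condition

	fmt.Println("old:", old.Conditions, "updated:", updated.Conditions)
}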
It appears that the CR controller is mistakenly overwriting the status while trying to set the generation. I've filed a PR to fix that.
I have been monitoring for two days and have not found a 4.11 failure; I will keep monitoring for a few more days.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069