Bug 1814280 - CSI Snapshot Controller panics in checkandRemoveSnapshotFinalizersAndCheckandDeleteContent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Christian Huffman
QA Contact: Qin Ping
URL:
Whiteboard:
Duplicates: 1814458
Depends On:
Blocks: 1815563
Reported: 2020-03-17 14:41 UTC by Christian Huffman
Modified: 2020-08-04 18:05 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The controller assumed that a VolumeSnapshotContent had always been created, and dereferenced it even when it was nil. Consequence: The CSI Snapshot Controller could panic and crash. Fix: The controller now checks whether the VolumeSnapshotContent is nil before using it. Result: The CSI Snapshot Controller no longer panics due to a nil VolumeSnapshotContent.
Clone Of:
Clones: 1815563
Environment:
Last Closed: 2020-08-04 18:05:47 UTC
Target Upstream Version:
Embargoed:

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift csi-external-snapshotter pull 16 | 0 | None | closed | Bug 1814280: UPSTREAM 278: Include a nil check for the snapshotContent | 2020-10-02 10:54:54 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-08-04 18:05:51 UTC

Description Christian Huffman 2020-03-17 14:41:43 UTC
The CSI Snapshot Controller can enter into a crashloop in checkandRemoveSnapshotFinalizersAndCheckandDeleteContent. This was seen in the following test:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_kube-state-metrics/27/pull-ci-openshift-kube-state-metrics-master-e2e-aws/41

This seems to occur because, in certain cases, we don't check whether the snapshotContent is nil before using it. The stack trace is below:

I0316 19:01:12.229634       1 snapshot_controller.go:832] checkandRemovePVCFinalizer[snapshot-rt6ft]: Remove Finalizer for PVC pvc-f42nr as it is not used by snapshots in creation
...
E0316 19:01:14.124545       1 snapshot_controller_base.go:371] could not sync volume "e2e-snapshotting-5615/snapshot-rt6ft": failed to delete VolumeSnapshotContent snapcontent-2d11399d-a8c6-430a-b3da-747c853c1b55 from API server: "volumesnapshotcontents.snapshot.storage.k8s.io \"snapcontent-2d11399d-a8c6-430a-b3da-747c853c1b55\" not found"
E0316 19:01:14.126116       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 136 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x12df160, 0x1fd2770)
	/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x12df160, 0x1fd2770)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).checkandRemoveSnapshotFinalizersAndCheckandDeleteContent(0xc00003c400, 0xc000392780, 0x0, 0x0, 0x2, 0xc0001dc780)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller.go:265 +0x39f
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).processSnapshotWithDeletionTimestamp(0xc00003c400, 0xc000392780, 0x0, 0x0)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller.go:229 +0x35b
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).syncSnapshot(0xc00003c400, 0xc000392780, 0x1438b00, 0xc000392780)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller.go:170 +0x361
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).updateSnapshot(0xc00003c400, 0xc000392780)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:364 +0x250
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).snapshotWorker.func1(0x6738796f484a6300)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:220 +0x944
github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).snapshotWorker(0xc00003c400)
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:253 +0x4b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0002f4180)
	/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0002f4180, 0x0, 0x0, 0x1, 0xc0003d43c0)
	/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0002f4180, 0x0, 0xc0003d43c0)
	/go/src/github.com/kubernetes-csi/external-snapshotter/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/kubernetes-csi/external-snapshotter/pkg/common-controller.(*csiSnapshotCommonController).Run
	/go/src/github.com/kubernetes-csi/external-snapshotter/pkg/common-controller/snapshot_controller_base.go:154 +0x2d9
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1137c7f]
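
The failure mode above can be illustrated with a minimal Go sketch. This is not the controller's code or the actual patch: the type `volumeSnapshotContent` and the functions `unguardedPolicy`, `guardedPolicy`, and `panics` are hypothetical, simplified stand-ins chosen only to show how dereferencing a nil pointer panics and how a nil check avoids it.

```go
package main

import "fmt"

// volumeSnapshotContent is a hypothetical, simplified stand-in for the real
// API object; it is NOT the type used by the external-snapshotter.
type volumeSnapshotContent struct {
	Name           string
	DeletionPolicy string
}

// unguardedPolicy mirrors the failure mode: it reads a field from content
// without first checking whether content is nil.
func unguardedPolicy(content *volumeSnapshotContent) string {
	return content.DeletionPolicy
}

// guardedPolicy checks for nil before using content. The boolean result
// reports whether a content object was available at all.
func guardedPolicy(content *volumeSnapshotContent) (string, bool) {
	if content == nil {
		return "", false
	}
	return content.DeletionPolicy, true
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// A snapshot whose content was never created (or is already gone).
	var missing *volumeSnapshotContent

	fmt.Println("unguarded panics:", panics(func() { _ = unguardedPolicy(missing) }))

	policy, ok := guardedPolicy(missing)
	fmt.Printf("guarded: policy=%q ok=%v\n", policy, ok)
}
```

With the guard in place, the caller can treat a missing content object as "nothing to delete" and carry on removing finalizers, rather than crashing the worker goroutine.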

Comment 1 Jan Safranek 2020-03-18 12:25:44 UTC
*** Bug 1814458 has been marked as a duplicate of this bug. ***

Comment 2 Christian Huffman 2020-03-18 15:21:39 UTC
I submitted https://github.com/kubernetes-csi/external-snapshotter/pull/278 to include this fix upstream. I haven't been able to reproduce the issue once this commit is applied.

Comment 4 Christian Huffman 2020-03-18 18:31:30 UTC
Cherrypick PR to OpenShift - https://github.com/openshift/csi-external-snapshotter/pull/16

Comment 7 Qin Ping 2020-03-25 08:15:50 UTC
Checked the upstream CI jobs from the last 4 days (about 100 jobs) and did not find this error.

Comment 9 errata-xmlrpc 2020-08-04 18:05:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
