Bug 2049509 - ocs operator stuck on CrashLoopBackOff while installing with KMS
Summary: ocs operator stuck on CrashLoopBackOff while installing with KMS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Jiffin
QA Contact: aberner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-02 11:20 UTC by aberner
Modified: 2023-08-09 17:00 UTC
CC: 11 users

Fixed In Version: 4.10.0-141
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:52:41 UTC
Embargoed:




Links:
GitHub red-hat-storage/ocs-operator pull 1480 (open): Bug 2049509: [release-4.10] rgw-kms: fix invalid memory issue (last updated 2022-02-03 07:19:15 UTC)
GitHub red-hat-storage/ocs-operator pull 1483 (open): Bug 2049509: [release-4.10] rgw-kms: fix invalid memory issue (last updated 2022-02-03 09:17:17 UTC)
Red Hat Product Errata RHSA-2022:1372 (last updated 2022-04-13 18:52:59 UTC)

Description aberner 2022-02-02 11:20:46 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
While installing an ODF StorageSystem with Vault as the KMS, the ocs-operator gets stuck in CrashLoopBackOff, and the operator logs report:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x170d20a]
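
For context, this is Go's nil-pointer runtime error. The following minimal, standalone program triggers the same class of panic; it is purely illustrative and is not ocs-operator code:

package main

// Illustrative only: dereferencing a nil pointer in Go panics with
// "invalid memory address or nil pointer dereference" (SIGSEGV).
type vaultConfig struct { // hypothetical type, for illustration
	Address string
}

func main() {
	var cfg *vaultConfig // nil: never initialized
	_ = cfg.Address      // panics at runtime with the error shown above
}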


Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-01-31-012936
ODF: 4.10.0-133


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, the ocs-operator is unavailable.


Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Unknown.

Can this issue be reproduced from the UI?
The failure occurred during an installation performed through the UI; whether it reproduces there is unknown.

If this is a regression, please provide more details to justify this:
Deploying with KMS worked in previous versions, so yes, this is a regression.

Steps to Reproduce:
1. Install an OCP cluster
2. Install the ODF operator
3. Create a StorageSystem with a full deployment and connect it to a KMS (Vault)


Actual results:
The cluster comes up unhealthy with the ocs-operator stuck in CrashLoopBackOff.

Expected results:
The cluster comes up healthy and operational.

Additional info:

Comment 2 aberner 2022-02-02 14:31:50 UTC
Was able to reproduce in ODF 4.10.0-137.

Comment 3 aberner 2022-02-02 14:39:03 UTC
The platform in both failures is vSphere, while a deployment on AWS with KMS enabled succeeded, so we suspect the issue is platform related.

Comment 7 Mudit Agarwal 2022-02-03 01:07:41 UTC
Thanks Amit!!

{"level":"info","ts":1643849128.0077307,"logger":"cmd","msg":"Go Version: go1.16.6"}
{"level":"info","ts":1643849128.0080402,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
I0203 00:45:29.057166       1 request.go:668] Waited for 1.0379668s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/discovery.k8s.io/v1?timeout=32s
{"level":"info","ts":1643849130.7604914,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1643849130.7768652,"logger":"cmd","msg":"OCSInitialization resource already exists"}
{"level":"info","ts":1643849133.5396976,"logger":"cmd","msg":"starting manager"}
I0203 00:45:33.539974       1 leaderelection.go:243] attempting to acquire leader lease openshift-storage/ab76f4c9.openshift.io...
{"level":"info","ts":1643849133.5400252,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0203 00:45:51.972963       1 leaderelection.go:253] successfully acquired lease openshift-storage/ab76f4c9.openshift.io
{"level":"info","ts":1643849151.9731855,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9731698,"logger":"controller-runtime.manager.controller.storageconsumer","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageConsumer","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9732287,"logger":"controller-runtime.manager.controller.persistentvolume","msg":"Starting EventSource","reconciler group":"","reconciler kind":"PersistentVolume","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.973263,"logger":"controller-runtime.manager.controller.persistentvolume","msg":"Starting Controller","reconciler group":"","reconciler kind":"PersistentVolume"}
{"level":"info","ts":1643849151.9732392,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9739225,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9739504,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9739673,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9739735,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9739833,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting Controller","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster"}
{"level":"info","ts":1643849151.974249,"logger":"controller-runtime.manager.controller.storageconsumer","msg":"Starting Controller","reconciler group":"ocs.openshift.io","reconciler kind":"StorageConsumer"}
{"level":"info","ts":1643849151.9743242,"logger":"controller-runtime.manager.controller.ocsinitialization","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"OCSInitialization","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9743755,"logger":"controller-runtime.manager.controller.ocsinitialization","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"OCSInitialization","source":"kind source: /, Kind="}
{"level":"info","ts":1643849151.9743888,"logger":"controller-runtime.manager.controller.ocsinitialization","msg":"Starting Controller","reconciler group":"ocs.openshift.io","reconciler kind":"OCSInitialization"}
{"level":"info","ts":1643849152.0760622,"logger":"controller-runtime.manager.controller.storageconsumer","msg":"Starting workers","reconciler group":"ocs.openshift.io","reconciler kind":"StorageConsumer","worker count":1}
{"level":"info","ts":1643849152.0762343,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Starting workers","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","worker count":1}
{"level":"info","ts":1643849152.0763292,"logger":"controller-runtime.manager.controller.persistentvolume","msg":"Starting workers","reconciler group":"","reconciler kind":"PersistentVolume","worker count":1}
{"level":"info","ts":1643849152.0763702,"logger":"controllers.StorageCluster","msg":"Reconciling StorageCluster.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","StorageCluster":"openshift-storage/ocs-storagecluster"}
{"level":"info","ts":1643849152.0764022,"logger":"controllers.StorageCluster","msg":"Spec.AllowRemoteStorageConsumers is disabled","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":1643849152.0764332,"logger":"controller-runtime.manager.controller.ocsinitialization","msg":"Starting workers","reconciler group":"ocs.openshift.io","reconciler kind":"OCSInitialization","worker count":1}
{"level":"info","ts":1643849152.0765593,"logger":"controllers.OCSInitialization","msg":"Reconciling OCSInitialization.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","OCSInitialization":"openshift-storage/ocsinit"}
{"level":"info","ts":1643849152.082092,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"rook-ceph"}
{"level":"info","ts":1643849152.0877435,"logger":"controllers.StorageCluster","msg":"Resource deletion for provider succeeded","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":1643849152.091996,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"rook-ceph-csi"}
{"level":"info","ts":1643849152.101062,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"noobaa"}
{"level":"info","ts":1643849152.1127658,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"noobaa-endpoint"}
{"level":"info","ts":1643849152.1313975,"logger":"controllers.OCSInitialization","msg":"Reconciling OCSInitialization.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","OCSInitialization":"openshift-storage/ocsinit"}
{"level":"info","ts":1643849152.1351314,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"rook-ceph"}
{"level":"info","ts":1643849152.1491027,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"rook-ceph-csi"}
{"level":"info","ts":1643849152.160247,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"noobaa"}
{"level":"info","ts":1643849152.1706407,"logger":"controllers.OCSInitialization","msg":"Updating SecurityContextConstraint.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit","SecurityContextConstraint":"noobaa-endpoint"}
{"level":"info","ts":1643849152.9305122,"logger":"controllers.StorageCluster","msg":"Restoring original CephBlockPool.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","CephBlockPool":"openshift-storage/ocs-storagecluster-cephblockpool"}
{"level":"info","ts":1643849153.0419562,"logger":"controllers.StorageCluster","msg":"Restoring original CephFilesystem.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","CephFileSystem":"openshift-storage/ocs-storagecluster-cephfilesystem"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x170d20a]

goroutine 884 [running]:
github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).newCephObjectStoreInstances(0xc00048a0c0, 0xc000c32000, 0xc000bc0b40, 0x1e2bf80, 0xc00059cb40, 0xc000bc0b40, 0x0, 0x0)
	/remote-source/app/controllers/storagecluster/cephobjectstores.go:218 +0x96a
github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsCephObjectStores).ensureCreated(0x2a22f90, 0xc00048a0c0, 0xc000c32000, 0x0, 0x0, 0x0, 0x0)
	/remote-source/app/controllers/storagecluster/cephobjectstores.go:59 +0x12c
github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases(0xc00048a0c0, 0xc000c32000, 0xc000951140, 0x11, 0xc000951128, 0x12, 0x0, 0x0, 0xc000c32000, 0x0)
	/remote-source/app/controllers/storagecluster/reconcile.go:394 +0xd08
github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile(0xc00048a0c0, 0x1e174b8, 0xc000e44f90, 0xc000951140, 0x11, 0xc000951128, 0x12, 0xc000e44f00, 0x0, 0x0, ...)
	/remote-source/app/controllers/storagecluster/reconcile.go:161 +0x6c5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ba25a0, 0x1e17410, 0xc0005b94c0, 0x19e0a40, 0xc0007ec080)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ba25a0, 0x1e17410, 0xc0005b94c0, 0xc000c82f00)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc000e07b70, 0xc000ba25a0, 0x1e17410, 0xc0005b94c0)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:210 +0x425
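
For context on the fix: the trace points at newCephObjectStoreInstances in controllers/storagecluster/cephobjectstores.go:218, and the linked PRs are titled "rgw-kms: fix invalid memory issue". The sketch below shows this failure class and the defensive guard such a fix typically adds; the type and function names are hypothetical, not the actual ocs-operator code:

package main

import "fmt"

// KMSConnectionDetails is a hypothetical stand-in for the operator's
// resolved KMS configuration; the real ocs-operator types differ.
type KMSConnectionDetails struct {
	Provider string
	Settings map[string]string
}

// rgwKMSSettings sketches the guard: reading kms.Settings while kms is nil
// (as apparently happened on the failing vSphere runs) panics with the
// SIGSEGV shown in the trace above.
func rgwKMSSettings(kms *KMSConnectionDetails) map[string]string {
	if kms == nil || kms.Settings == nil {
		return map[string]string{} // no KMS resolved: return an empty, safe map
	}
	return kms.Settings
}

func main() {
	fmt.Println(rgwKMSSettings(nil)) // safe: prints map[]
}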

Comment 8 Mudit Agarwal 2022-02-03 03:09:58 UTC
Jiffin/Pranshu, PTAL

Comment 9 Mudit Agarwal 2022-02-03 03:30:15 UTC
Are we supposed to test this on bare metal?
Was this tested before, or is it being tested for the first time in 4.10?

Comment 18 aberner 2022-03-01 09:32:56 UTC
Verified over odf version 4.10.0-143

Comment 20 errata-xmlrpc 2022-04-13 18:52:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

