Bug 1974609

Summary: Rook operator SIGSEGV when RGW has TLS enabled
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Jiffin <jthottan>
Component: rook Assignee: Jiffin <jthottan>
Status: VERIFIED --- QA Contact: Ben Eli <belimele>
Severity: high Docs Contact:
Priority: medium    
Version: 4.8 CC: belimele, jthottan, muagarwa, nberry
Target Milestone: --- Keywords: AutomationBackLog
Target Release: OCS 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.8.0-432.ci Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jiffin 2021-06-22 07:44:49 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
When RGW has a TLS certificate set, the rook-operator pod crashes with the following panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d32433]
goroutine 1221 [running]:
github.com/rook/rook/pkg/operator/ceph/object.(*bucketChecker).checkObjectStoreHealth(0xc002422000, 0xc00158f200, 0xc001a08fb0)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/object/health.go:141 +0x1173
github.com/rook/rook/pkg/operator/ceph/object.(*bucketChecker).checkObjectStore(0xc002422000, 0xc00066b200)
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/object/health.go:82 +0x45
created by github.com/rook/rook/pkg/operator/ceph/object.(*ReconcileCephObjectStore).startMonitoring
	/home/rook/go/src/github.com/rook/rook/pkg/operator/ceph/object/controller.go:542 +0x3e5
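
For readers unfamiliar with this class of failure, the following is a minimal Go sketch of how a nil pointer dereference like the one in the trace above can arise, and how a nil guard avoids the crash. It is not the Rook code and not the actual fix: adminClient, newAdminClient, and checkHealthSafe are hypothetical names invented for illustration, and the assumption that a client constructor returns nil when TLS is enabled is only one plausible trigger.

```go
package main

import (
	"errors"
	"fmt"
)

// adminClient is a hypothetical stand-in for whatever client a health
// check uses to talk to the RGW endpoint. It is not a Rook type.
type adminClient struct {
	endpoint string
}

// newAdminClient is a hypothetical constructor. Assumption for this sketch:
// it returns a nil client and an error when TLS is requested but no CA
// bundle is supplied.
func newAdminClient(endpoint string, tlsEnabled bool, caBundle []byte) (*adminClient, error) {
	if tlsEnabled && len(caBundle) == 0 {
		return nil, errors.New("TLS enabled but no CA bundle provided")
	}
	return &adminClient{endpoint: endpoint}, nil
}

// checkHealthSafe guards against a nil client before touching its fields.
// Without the guard, reading c.endpoint on a nil pointer would panic with
// "invalid memory address or nil pointer dereference".
func checkHealthSafe(c *adminClient) (string, error) {
	if c == nil {
		return "", errors.New("admin client is nil; skipping health check")
	}
	return c.endpoint, nil
}

func main() {
	// Ignoring the constructor error is the pattern that leads to a nil
	// pointer being passed further down the call chain.
	c, _ := newAdminClient("https://rgw.example:443", true, nil)

	if _, err := checkHealthSafe(c); err != nil {
		fmt.Println("health check failed:", err)
	}
}
```

With the guard in place, the caller receives an error it can log and retry on, instead of the whole process terminating with SIGSEGV.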


Version of all relevant components (if applicable):
4.8

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
If TLS is enabled for RGW, the Rook operator pod crashes.

Is there any workaround available to the best of your knowledge?
Nope

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Nope

Steps to Reproduce:
Enable TLS for RGW via the gateway settings of the CephObjectStore CRD: https://rook.io/docs/rook/v1.6/ceph-object-store-crd.html#gateway-settings


Actual results:
Rook Operator pod is crashing

Expected results:
Rook Operator should not crash

Additional info:
Currently, a TLS-enabled RGW endpoint is not tested in OCS; it is planned for 4.9 as a dependency for RGW-KMS [1]. Please note that customers are already trying out this feature independently with OCS 4.7 [2], since support already exists in Rook.

The required change has been merged in upstream Rook: https://github.com/rook/rook/pull/8139

[1] https://issues.redhat.com/browse/KNIP-1555
[2] https://issues.redhat.com/browse/RHSTOR-1668?focusedCommentId=16144780&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16144780