2052438 – [KMS] Storagecluster is in progressing state due to failed RGW deployment when using cluster wide encryption with kubernetes auth method

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	ODF 4.10.0
Assignee:	Jiffin
QA Contact:	shylesh
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-09 09:45 UTC by Rachael
Modified:	2023-08-09 17:00 UTC
CC List:	7 users
Fixed In Version:	4.10.0-160
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-04-13 18:53:05 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage ocs-operator pull 1509	None	open	Bug 2052438: [release-4.10] rgw-kms: skip unsupported configs	2022-02-09 13:35:54 UTC
Github	red-hat-storage ocs-operator pull 1513	None	open	Bug 2052438: [release-4.10] rgw-kms: skip unsupported configs	2022-02-14 08:28:35 UTC
Red Hat Product Errata	RHSA-2022:1372	None	None	None	2022-04-13 18:53:18 UTC

Description Rachael 2022-02-09 09:45:59 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

When cluster-wide encryption is enabled with the Kubernetes authentication method on a bare-metal cluster, the storagecluster is stuck in the Progressing state.
 
$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   19h   Progressing              2022-02-08T13:03:25Z   4.10.0

[...]
    Last Heartbeat Time:   2022-02-09T09:29:17Z
    Last Transition Time:  2022-02-08T13:03:26Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing


$ oc describe noobaa 
[...]
    Last Heartbeat Time:   2022-02-08T13:06:40Z
    Last Transition Time:  2022-02-08T13:06:40Z
    Message:               Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready


$ oc describe cephobjectstore 
Name:         ocs-storagecluster-cephobjectstore
Namespace:    openshift-storage
[...]
Status:
  Phase:  Failure
Events:
  Type     Reason           Age                   From                         Message
  ----     ------           ----                  ----                         -------
  Warning  ReconcileFailed  8m23s (x90 over 20h)  rook-ceph-object-controller  failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon


The following error message was seen in the rook operator logs:

2022-02-09 07:27:10.757391 E | ceph-object-controller: failed to enable KMS. failed to fetch kms token secret "ocs-kms-token": secrets "ocs-kms-token" not found
2022-02-09 07:27:10.768442 E | ceph-object-controller: failed to reconcile CephObjectStore "openshift-storage/ocs-storagecluster-cephobjectstore". failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon

Since the Kubernetes authentication method is used, no secret called "ocs-kms-token" is created in the ODF cluster, so the token lookup fails and the RGW deployment is never created.
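
The failure mode above can be illustrated with a minimal sketch. This is not the Rook or ocs-operator code; the function names are hypothetical, and the "VAULT_AUTH_METHOD" key and its "token" default are assumptions made for illustration. Only the secret name "ocs-kms-token" comes from the operator log. The sketch shows one way a configuration without a token secret could be skipped instead of failing the RGW deployment:

```python
from typing import Optional

# Secret name taken from the rook operator log in this report.
TOKEN_SECRET_NAME = "ocs-kms-token"


def token_secret_required(kms_details: dict) -> bool:
    """Return True only when the KMS settings use token authentication.

    Assumption: the auth method is stored under "VAULT_AUTH_METHOD"
    and is treated as "token" when the key is absent.
    """
    method = kms_details.get("VAULT_AUTH_METHOD", "token").strip().lower()
    return method == "token"


def rgw_kms_settings(kms_details: dict, secrets: dict) -> Optional[dict]:
    """Build hypothetical RGW KMS settings, or return None to skip them.

    `secrets` stands in for the namespace's secrets (name -> token value).
    """
    if not token_secret_required(kms_details):
        # Kubernetes auth: no token secret exists, so skip the RGW KMS
        # settings rather than fail the whole object store.
        return None
    if TOKEN_SECRET_NAME not in secrets:
        # Token auth without the secret is a real misconfiguration.
        raise LookupError(f'secret "{TOKEN_SECRET_NAME}" not found')
    return {"tokenSecretName": TOKEN_SECRET_NAME, "details": dict(kms_details)}
```

With the Kubernetes auth method the sketch returns None, so a caller could deploy RGW without KMS settings; with token auth and a missing secret it still raises an error, since that case is a genuine misconfiguration.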


Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-02-07-162517
ODF: odf-operator.v4.10.0  full_version=4.10.0-147

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Deploy an ODF cluster on bare metal with cluster-wide encryption enabled using the Kubernetes authentication method.
2. Check the status of the storagecluster.
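
The status check in step 2 can be scripted. The following is a minimal sketch, assuming the JSON form of the `oc get storagecluster` output exposes the phase under `status.phase` (the value shown in the PHASE column); the helper name is hypothetical:

```python
import json


def storagecluster_phases(oc_json: str) -> dict:
    """Map each storagecluster name to its reported phase.

    Accepts either a list object (with "items") or a single resource.
    Assumption: the phase is found under status.phase.
    """
    doc = json.loads(oc_json)
    items = doc["items"] if "items" in doc else [doc]
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in items
    }
```

The input would typically be the output of `oc get storagecluster -n openshift-storage -o json`. A cluster hitting this bug would be expected to report "Progressing" for ocs-storagecluster.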


Actual results:
---------------
The storagecluster is stuck in the Progressing state.


Expected results:
-----------------
The storagecluster deployment should succeed and leave the Progressing state.

Comment 13 errata-xmlrpc 2022-04-13 18:53:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372
