Bug 2052438

Summary: [KMS] Storagecluster is in progressing state due to failed RGW deployment when using cluster wide encryption with kubernetes auth method
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Rachael <rgeorge>
Component: ocs-operator Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA QA Contact: shylesh <shmohan>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.10 CC: jthottan, madam, muagarwa, ocs-bugs, odf-bz-bot, sostapov, tnielsen
Target Milestone: ---   
Target Release: ODF 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.10.0-160 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-04-13 18:53:05 UTC Type: Bug

Description Rachael 2022-02-09 09:45:59 UTC
Description of problem (please be detailed as possible and provide log
snippets):

When cluster-wide encryption is enabled with the Kubernetes authentication method on a bare-metal cluster, the storagecluster remains stuck in the Progressing state.
 
$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   19h   Progressing              2022-02-08T13:03:25Z   4.10.0

[...]
    Last Heartbeat Time:   2022-02-09T09:29:17Z
    Last Transition Time:  2022-02-08T13:03:26Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing


$ oc describe noobaa 
[...]
    Last Heartbeat Time:   2022-02-08T13:06:40Z
    Last Transition Time:  2022-02-08T13:06:40Z
    Message:               Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready


$ oc describe cephobjectstore 
Name:         ocs-storagecluster-cephobjectstore
Namespace:    openshift-storage
[...]
Status:
  Phase:  Failure
Events:
  Type     Reason           Age                   From                         Message
  ----     ------           ----                  ----                         -------
  Warning  ReconcileFailed  8m23s (x90 over 20h)  rook-ceph-object-controller  failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon


The following error message was seen in the rook operator logs:

2022-02-09 07:27:10.757391 E | ceph-object-controller: failed to enable KMS. failed to fetch kms token secret "ocs-kms-token": secrets "ocs-kms-token" not found
2022-02-09 07:27:10.768442 E | ceph-object-controller: failed to reconcile CephObjectStore "openshift-storage/ocs-storagecluster-cephobjectstore". failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon
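
For triage, the error entries can be pulled out of a longer operator log programmatically. The following is a minimal sketch, assuming every line follows the `timestamp LEVEL | component: message` layout seen in the two lines quoted above; the function names are hypothetical and the pattern is inferred only from those two lines.

```python
import re

# Layout inferred from the two log lines quoted above:
#   <date> <time> <level letter> | <component>: <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]) \| "
    r"(?P<component>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_operator_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def error_entries(lines, component=None):
    """Yield parsed entries with level 'E', optionally for one component."""
    for line in lines:
        entry = parse_operator_line(line)
        if entry is None or entry["level"] != "E":
            continue
        if component is not None and entry["component"] != component:
            continue
        yield entry
```

Calling `error_entries(log_lines, component="ceph-object-controller")` on the operator log would yield only the object-controller errors, such as the two shown above.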

Because the Kubernetes authentication method is used, no secret named "ocs-kms-token" is created in the ODF cluster, so the secret lookup reported in the log cannot succeed.
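
The behaviour one would expect here can be sketched as follows: the token secret is looked up only when token authentication is configured. This is an illustration, not the Rook operator's code. The function name, the `VAULT_AUTH_METHOD` key, and its `token` / `kubernetes` values are assumptions made for the sketch; only the secret name comes from the log above.

```python
# Hypothetical names throughout; this is not the Rook operator's code.
TOKEN_SECRET_NAME = "ocs-kms-token"  # secret name taken from the log above


def required_token_secret(kms_config):
    """Return the secret name to fetch, or None when no token secret is needed.

    kms_config is a plain dict of KMS connection settings. The key
    VAULT_AUTH_METHOD and its values "token" / "kubernetes" are
    assumptions made for this sketch.
    """
    method = kms_config.get("VAULT_AUTH_METHOD", "token")
    if method == "kubernetes":
        # With Kubernetes auth no token secret is created,
        # so none should be looked up.
        return None
    if method == "token":
        return TOKEN_SECRET_NAME
    raise ValueError(f"unsupported auth method: {method!r}")
```

With a check of this shape, a configuration using Kubernetes authentication would skip the `ocs-kms-token` lookup instead of failing on the missing secret.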


Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-02-07-162517
ODF: odf-operator.v4.10.0  full_version=4.10.0-147

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Deploy an ODF cluster on bare metal with cluster-wide encryption enabled, using the Kubernetes authentication method.
2. Check the status of the storagecluster.
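
The status check in step 2 can also be scripted. The sketch below assumes the text comes from `oc get storagecluster -o json` and that this output is a standard Kubernetes list object with an `items` array; the function names are hypothetical, and treating `Ready` as the healthy phase is an assumption of the sketch.

```python
import json


def storagecluster_phases(oc_json_text):
    """Map each StorageCluster name to its status.phase.

    Expects the text printed by `oc get storagecluster -o json`, assumed
    here to be a standard Kubernetes list object with an `items` array.
    """
    doc = json.loads(oc_json_text)
    phases = {}
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        # A missing status or phase is reported as "Unknown".
        phases[name] = item.get("status", {}).get("phase", "Unknown")
    return phases


def stuck_clusters(phases):
    """Return the names whose phase is not 'Ready' (assumed healthy phase)."""
    return sorted(name for name, phase in phases.items() if phase != "Ready")
```

For the cluster in this report, the mapping would contain `ocs-storagecluster` with phase `Progressing`, so it would be listed by `stuck_clusters`.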


Actual results:
---------------
The storagecluster is in the Progressing state.


Expected results:
-----------------
The storagecluster deployment should complete successfully.

Comment 13 errata-xmlrpc 2022-04-13 18:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372