Bug 2052438 - [KMS] Storagecluster is in progressing state due to failed RGW deployment when using cluster wide encryption with kubernetes auth method
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Jiffin
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-09 09:45 UTC by Rachael
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version: 4.10.0-160
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:53:05 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage ocs-operator pull 1509 | 0 | None | open | Bug 2052438: [release-4.10] rgw-kms: skip unsupported configs | 2022-02-09 13:35:54 UTC
Github | red-hat-storage ocs-operator pull 1513 | 0 | None | open | Bug 2052438: [release-4.10] rgw-kms: skip unsupported configs | 2022-02-14 08:28:35 UTC
Red Hat Product Errata | RHSA-2022:1372 | 0 | None | None | None | 2022-04-13 18:53:18 UTC

Description Rachael 2022-02-09 09:45:59 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When cluster-wide encryption is enabled with the Kubernetes authentication method on a bare-metal cluster, the storagecluster is stuck in the Progressing state.
 
$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   19h   Progressing              2022-02-08T13:03:25Z   4.10.0

[...]
    Last Heartbeat Time:   2022-02-09T09:29:17Z
    Last Transition Time:  2022-02-08T13:03:26Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing


$ oc describe noobaa 
[...]
    Last Heartbeat Time:   2022-02-08T13:06:40Z
    Last Transition Time:  2022-02-08T13:06:40Z
    Message:               Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready


$ oc describe cephobjectstore 
Name:         ocs-storagecluster-cephobjectstore
Namespace:    openshift-storage
[...]
Status:
  Phase:  Failure
Events:
  Type     Reason           Age                   From                         Message
  ----     ------           ----                  ----                         -------
  Warning  ReconcileFailed  8m23s (x90 over 20h)  rook-ceph-object-controller  failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon


The following error message was seen in the rook operator logs:

2022-02-09 07:27:10.757391 E | ceph-object-controller: failed to enable KMS. failed to fetch kms token secret "ocs-kms-token": secrets "ocs-kms-token" not found
2022-02-09 07:27:10.768442 E | ceph-object-controller: failed to reconcile CephObjectStore "openshift-storage/ocs-storagecluster-cephobjectstore". failed to create object store deployments: failed to create object store "ocs-storagecluster-cephobjectstore": failed to start rgw pods: failed to create rgw deployment: got empty container for RGW daemon
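
When triaging a longer operator log, it can help to split lines like the two above into fields and keep only the errors from one controller. The following is a minimal sketch, assuming every line follows the "timestamp LEVEL | component: message" layout seen in those two lines. The names `parse_line` and `errors_for` are hypothetical helpers, not part of Rook or ODF.

```python
import re
from typing import Iterable, List, Optional

# Layout assumed from the two log lines quoted in this report:
#   <date> <time> <LEVEL> | <component>: <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]) \| (?P<component>[\w-]+): (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_for(lines: Iterable[str], component: str) -> List[str]:
    """Return the messages of level-E lines logged by the given component."""
    messages = []
    for line in lines:
        fields = parse_line(line)
        if fields and fields["level"] == "E" and fields["component"] == component:
            messages.append(fields["message"])
    return messages
```

Passing the saved operator log lines and "ceph-object-controller" to `errors_for` would return only the two failure messages shown above, which makes the missing-secret error easier to spot ahead of the generic "got empty container for RGW daemon" error.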

Because the Kubernetes authentication method is used, no secret named "ocs-kms-token" is created in the ODF cluster, so the token lookup shown in the first error above cannot succeed.
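
The mismatch can be illustrated with a minimal sketch. This is not the ocs-operator or Rook code. It is one reading of the linked pull request title ("rgw-kms: skip unsupported configs"), and the function name `rgw_kms_settings`, the `"token"` label, and the returned dictionary shape are all hypothetical. The idea is that a token secret is only looked up when token-based authentication is configured, and RGW KMS is skipped otherwise rather than failing the deployment.

```python
from typing import Optional, Set

TOKEN_AUTH = "token"              # hypothetical label for token-based auth
TOKEN_SECRET = "ocs-kms-token"    # secret name taken from the error message


def rgw_kms_settings(auth_method: str, existing_secrets: Set[str]) -> Optional[dict]:
    """Return KMS settings for RGW, or None when RGW KMS should be skipped."""
    if auth_method != TOKEN_AUTH:
        # With other auth methods (for example kubernetes) no token secret
        # exists, so skip RGW KMS instead of attempting the lookup.
        return None
    if TOKEN_SECRET not in existing_secrets:
        # Mirrors the reported failure: the secret is expected but missing.
        raise LookupError(f'secrets "{TOKEN_SECRET}" not found')
    return {"token_secret": TOKEN_SECRET}
```

With this shape, `rgw_kms_settings("kubernetes", set())` returns `None` and the caller can continue without KMS for RGW, while a token-based configuration with the secret present returns the secret name to use.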


Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-02-07-162517
ODF: odf-operator.v4.10.0  full_version=4.10.0-147

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Deploy an ODF cluster on bare metal with cluster-wide encryption enabled, using the Kubernetes authentication method.
2. Check the status of the storagecluster.


Actual results:
---------------
The storagecluster is stuck in the Progressing state.


Expected results:
-----------------
The storagecluster deployment should succeed.

Comment 13 errata-xmlrpc 2022-04-13 18:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

