Bug 2209254

Summary: [Backport-4.11.z][KMS][VAULT] Storage cluster remains in 'Progressing' state during deployment with storage class encryption, despite all pods being up and running.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: arun kumar mohan <amohan>
Component: ocs-operator
Assignee: arun kumar mohan <amohan>
Status: CLOSED ERRATA
QA Contact: Parag Kamble <pakamble>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.13
CC: amohan, ebenahar, kramdoss, muagarwa, ocs-bugs, odf-bz-bot, pakamble, sheggodu, skatiyar
Target Milestone: ---
Keywords: Regression
Target Release: ODF 4.11.9
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.11.9-2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2192596
Environment:
Last Closed: 2023-07-20 16:12:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:
Bug Depends On: 2189984, 2192596    
Bug Blocks:    

Description arun kumar mohan 2023-05-23 09:10:04 UTC
+++ This bug was initially created as a clone of Bug #2192596 +++

+++ This bug was initially created as a clone of Bug #2189984 +++

Created attachment 1960169 [details]
must gather logs

Description of problem (please be as detailed as possible and provide log snippets):


Version of all relevant components (if applicable): 4.13


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
I can continue to work without any issue.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible? YES


Can this issue be reproduced from the UI? YES


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install the ODF operator.
2. Configure the Kubernetes auth method as described in the documentation: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.11/html/deploying_openshift_data_foundation_using_bare_metal_infrastructure/deploy-using-local-storage-devices-bm#enabling-cluster-wide-encryprtion-with-the-kubernetes-authentication-using-kms_local-bare-metal
3. Create the storage system.
4. Select 'Enable data encryption for block and file'.
5. Select StorageClass Encryption (refer to the attached screenshot).
6. Click Next and complete the storage system creation (a sketch of the resulting encryption settings follows these steps).
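
Resulting encryption settings (hypothetical sketch)
====================================================
For reference, a minimal Go sketch of the StorageCluster encryption settings that the selections above are expected to produce. The types here are simplified stand-ins using the field names referenced later in this bug, not the real ocs-operator API types.

package main

import "fmt"

// Simplified stand-ins for the ocs-operator encryption spec fields
// referenced in this bug (sc.Spec.Encryption.*).
type KeyManagementServiceSpec struct {
	Enable bool
}

type EncryptionSpec struct {
	Enable               bool // legacy flag, implies cluster-wide encryption
	ClusterWide          bool
	StorageClass         bool
	KeyManagementService KeyManagementServiceSpec
}

func main() {
	// Storage-class encryption only, with KMS (Vault) enabled and
	// cluster-wide encryption left disabled.
	enc := EncryptionSpec{
		StorageClass:         true,
		KeyManagementService: KeyManagementServiceSpec{Enable: true},
	}
	fmt.Printf("%+v\n", enc)
}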


Actual results:
The StorageCluster does not move out of the 'Progressing' phase.

Expected results:
The storage cluster should be in the 'Ready' state.

Additional info:

The storage cluster has been enabled with storage class encryption and the 'ocs-storagecluster-ceph-rbd-encrypted' storage class has been created. However, the storage cluster remains in a 'Progressing' state even though all pods are up and running.

That said, I am able to use all the functionality without any issue.

StorageCluster Details
==============================================
❯ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   23m   Progressing              2023-04-26T16:36:26Z   4.13.0

Storageclass Output
===============================================
❯ oc get storageclass
NAME                                    PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2-csi                                 ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   3h54m
gp3-csi (default)                       ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   3h54m
ocs-storagecluster-ceph-rbd             openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   19m
ocs-storagecluster-ceph-rbd-encrypted   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              false                  19m
ocs-storagecluster-cephfs               openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   19m

--- Additional comment from RHEL Program Management on 2023-04-26 16:54:20 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from Parag Kamble on 2023-04-26 16:56:11 UTC ---

While configuring encryption, these UI options were selected.

--- Additional comment from Parag Kamble on 2023-04-26 16:57:13 UTC ---

Advanced settings of the KMS configuration.

--- Additional comment from Sanjal Katiyar on 2023-04-27 06:08:56 UTC ---

The issue is with the NooBaa reconciler (ocs-operator)...
The ocs-operator is passing the KMS info down to NooBaa despite the fact that it is storage-class-wide encryption only (see the sketch after the field list below)
https://github.com/red-hat-storage/ocs-operator/pull/1719


sc.Spec.Encryption.KeyManagementService.Enable >> tells us that KMS is enabled.
sc.Spec.Encryption.ClusterWide or sc.Spec.Encryption.Enable >> tells us that it is clusterWide
sc.Spec.Encryption.storageClass >> tells us that it is storageClassWide
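
A minimal Go sketch (not the actual change in the linked PRs) of how the reconciler could use these flags to pass KMS details to NooBaa only when encryption is cluster-wide; the type names are simplified stand-ins for the real ocs-operator API.

package main

import "fmt"

// Simplified stand-ins for the fields listed above.
type KeyManagementServiceSpec struct {
	Enable bool
}

type EncryptionSpec struct {
	Enable               bool // legacy flag, implies cluster-wide encryption
	ClusterWide          bool
	StorageClass         bool
	KeyManagementService KeyManagementServiceSpec
}

// shouldPassKMSToNooBaa returns true only when KMS is enabled AND encryption
// is cluster-wide; storage-class-only encryption must not trigger NooBaa-side
// KMS validation (which otherwise fails with
// "authentication returned nil auth info").
func shouldPassKMSToNooBaa(enc EncryptionSpec) bool {
	kmsEnabled := enc.KeyManagementService.Enable
	clusterWide := enc.ClusterWide || enc.Enable
	return kmsEnabled && clusterWide
}

func main() {
	// The configuration from this bug: storage-class encryption with KMS enabled.
	enc := EncryptionSpec{
		StorageClass:         true,
		KeyManagementService: KeyManagementServiceSpec{Enable: true},
	}
	fmt.Println(shouldPassKMSToNooBaa(enc)) // false: do not pass KMS info to NooBaa
}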

--- Additional comment from Sanjal Katiyar on 2023-04-27 06:27:45 UTC ---

Also, since NooBaa now expects KMS validation (even though it was storage-class-wide encryption, not cluster-wide), we are getting the following: "message: 'failed to get the authentication token: authentication returned nil auth info'"

--- Additional comment from RHEL Program Management on 2023-04-27 08:10:21 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

--- Additional comment from arun kumar mohan on 2023-04-27 10:43:23 UTC ---

PR up for review: https://github.com/red-hat-storage/ocs-operator/pull/2040

--- Additional comment from Sanjal Katiyar on 2023-04-27 10:46:00 UTC ---

This will move to MODIFIED when the PR is merged into 4.13... we need acks for 4.13 for this BZ as well...

--- Additional comment from arun kumar mohan on 2023-05-02 11:03:57 UTC ---

@ebenahar, can you please provide us with the QA_ACK+ flag?

--- Additional comment from RHEL Program Management on 2023-05-02 11:22:41 UTC ---

This BZ is being approved for ODF 4.13.0 release, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf‑4.13.0'

--- Additional comment from RHEL Program Management on 2023-05-02 11:22:41 UTC ---

Since this bug has been approved for ODF 4.13.0 release, through release flag 'odf-4.13.0+', the Target Release is being set to 'ODF 4.13.0'

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product.

The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

--- Additional comment from arun kumar mohan on 2023-05-03 14:03:25 UTC ---

Updating the internal whiteboard to include next 4.12.z release

PR up for 4.12 branch: https://github.com/red-hat-storage/ocs-operator/pull/2045

--- Additional comment from RHEL Program Management on 2023-05-16 14:02:33 UTC ---

This BZ is being approved for an ODF 4.12.z z-stream update, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf‑4.12.z', and having been marked for an approved z-stream update

--- Additional comment from RHEL Program Management on 2023-05-16 14:02:33 UTC ---

Since this bug has been approved for ODF 4.12.4 release, through release flag 'odf-4.12.z+', and appropriate update number entry at the 'Internal Whiteboard', the Target Release is being set to 'ODF 4.12.4'

--- Additional comment from Sunil Kumar Acharya on 2023-05-22 09:21:38 UTC ---

Please backport the fix to 4.12 and update the RDT appropriately.

--- Additional comment from Sanjal Katiyar on 2023-05-22 09:24:49 UTC ---

Hi Arun,
please create a BZ for 4.11.z as well, once the 4.12.z backport is merged... also, please update the RDT.

Comment 7 arun kumar mohan 2023-07-07 05:21:41 UTC
PR merged now (thanks Mudit)...

Comment 15 errata-xmlrpc 2023-07-20 16:12:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.11.9 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4238