Bug 2209254 - [Backport-4.11.z][KMS][VAULT] Storage cluster remains in 'Progressing' state during deployment with storage class encryption, despite all pods being up and running.
Summary: [Backport-4.11.z][KMS][VAULT] Storage cluster remains in 'Progressing' state ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.11.9
Assignee: arun kumar mohan
QA Contact: Parag Kamble
URL:
Whiteboard:
Depends On: 2189984 2192596
Blocks:
 
Reported: 2023-05-23 09:10 UTC by arun kumar mohan
Modified: 2023-08-09 17:00 UTC (History)
CC: 9 users

Fixed In Version: 4.11.9-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2192596
Environment:
Last Closed: 2023-07-20 16:12:43 UTC
Embargoed:




Links
Github red-hat-storage ocs-operator pull 2064: Bug 2209254: [release-4.11] Fix encryption enablement in Noobaa (open) - 2023-05-30 07:27:24 UTC
Red Hat Product Errata RHSA-2023:4238 - 2023-07-20 16:12:51 UTC

Description arun kumar mohan 2023-05-23 09:10:04 UTC
+++ This bug was initially created as a clone of Bug #2192596 +++

+++ This bug was initially created as a clone of Bug #2189984 +++

Created attachment 1960169 [details]
must gather logs

Description of problem (please be as detailed as possible and provide log snippets):


Version of all relevant components (if applicable): 4.13


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
I can continue working without any issue.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible? YES


Can this issue be reproduced from the UI? YES


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install the ODF operator.
2. Configure the Kubernetes auth method as described in the doc: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.11/html/deploying_openshift_data_foundation_using_bare_metal_infrastructure/deploy-using-local-storage-devices-bm#enabling-cluster-wide-encryprtion-with-the-kubernetes-authentication-using-kms_local-bare-metal
3. Create a storage system.
4. Select 'Enable data encryption for block and file'.
5. Select StorageClass encryption (refer to the attached screenshot; see the spec sketch after these steps).
6. Click Next and complete the storage system creation.
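
For clarity, below is a minimal Go sketch of roughly what these wizard selections translate to on the StorageCluster encryption spec: a KMS is enabled and StorageClass encryption is on, while cluster-wide encryption is off. The types are local stand-ins for illustration only, not the real ocs-operator API package; only the field names mirror those referenced in the root-cause comment further down.

Encryption spec sketch (illustrative only)
==============================================
package main

import "fmt"

// Local stand-in types for illustration; the real definitions live in the
// ocs-operator API package.
type KeyManagementServiceSpec struct {
	Enable bool
}

type EncryptionSpec struct {
	Enable               bool // legacy flag, treated as cluster-wide in this bug's discussion
	ClusterWide          bool
	StorageClass         bool
	KeyManagementService KeyManagementServiceSpec
}

func main() {
	// Approximate result of the wizard selections above: only the encrypted
	// StorageClass was requested, but a KMS is configured.
	enc := EncryptionSpec{
		ClusterWide:          false,
		StorageClass:         true,
		KeyManagementService: KeyManagementServiceSpec{Enable: true},
	}
	fmt.Printf("clusterWide=%v storageClassOnly=%v kmsEnabled=%v\n",
		enc.ClusterWide || enc.Enable,
		enc.StorageClass && !(enc.ClusterWide || enc.Enable),
		enc.KeyManagementService.Enable)
}
==============================================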


Actual results:
The StorageCluster does not move out of the 'Progressing' phase.

Expected results:
The storage cluster should be in the 'Ready' state.

Additional info:

The storage cluster has been enabled with storage class encryption and the 'ocs-storagecluster-ceph-rbd-encrypted' storage class has been created. However, the storage cluster remains in a 'Progressing' state even though all pods are up and running.

That said, I am able to use all the functionality without any issue.

StorageCluster Details
==============================================
❯ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   23m   Progressing              2023-04-26T16:36:26Z   4.13.0

Storageclass Output
===============================================
❯ oc get storageclass
NAME                                    PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2-csi                                 ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   3h54m
gp3-csi (default)                       ebs.csi.aws.com                         Delete          WaitForFirstConsumer   true                   3h54m
ocs-storagecluster-ceph-rbd             openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   19m
ocs-storagecluster-ceph-rbd-encrypted   openshift-storage.rbd.csi.ceph.com      Delete          Immediate              false                  19m
ocs-storagecluster-cephfs               openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   19m

--- Additional comment from RHEL Program Management on 2023-04-26 16:54:20 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from Parag Kamble on 2023-04-26 16:56:11 UTC ---

While configuring the encryption, these UI options are selected.

--- Additional comment from Parag Kamble on 2023-04-26 16:57:13 UTC ---

Advanced settings of the KMS configuration.

--- Additional comment from Sanjal Katiyar on 2023-04-27 06:08:56 UTC ---

The issue is with the noobaa reconciler (ocs operator): the ocs operator is passing the KMS info down to noobaa even though it is storageclass-wide encryption only (a sketch of the intended gating follows the field list below).
https://github.com/red-hat-storage/ocs-operator/pull/1719


sc.Spec.Encryption.KeyManagementService.Enable >> tells us that KMS is enabled.
sc.Spec.Encryption.ClusterWide or sc.Spec.Encryption.Enable >> tells us that it is clusterWide.
sc.Spec.Encryption.StorageClass >> tells us that it is storageClassWide.
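
To make the intended fix concrete, here is a minimal, self-contained Go sketch of the gating described above: the KMS details should reach NooBaa only when encryption is cluster-wide, not when only the encrypted StorageClass was requested. This illustrates the logic in the comment under those assumptions; it is not the literal code from the PR, and the function name is made up for the sketch.

Gating sketch (illustrative only)
==============================================
package main

import "fmt"

// shouldEnableKMSOnNooBaa sketches the gating described in the comment above:
// wire KMS details into the NooBaa reconcile only when KMS is enabled AND
// encryption is cluster-wide (ClusterWide or the legacy Enable flag).
// StorageClass-only encryption must not trigger NooBaa KMS handling.
func shouldEnableKMSOnNooBaa(kmsEnabled, clusterWide, legacyEnable bool) bool {
	return kmsEnabled && (clusterWide || legacyEnable)
}

func main() {
	// Buggy scenario from this report: StorageClass encryption with a KMS but
	// no cluster-wide encryption. NooBaa must not receive the KMS config,
	// otherwise its KMS validation fails ("authentication returned nil auth
	// info") and the StorageCluster stays in 'Progressing'.
	fmt.Println(shouldEnableKMSOnNooBaa(true, false, false)) // false
	// Cluster-wide encryption with a KMS: NooBaa should get the KMS details.
	fmt.Println(shouldEnableKMSOnNooBaa(true, true, false)) // true
}
==============================================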

--- Additional comment from Sanjal Katiyar on 2023-04-27 06:27:45 UTC ---

Also, since noobaa now expects KMS validation (even though it was storageclass-wide encryption, not clusterwide), we are getting the following: "message: 'failed to get the authentication token: authentication returned nil auth info'"

--- Additional comment from RHEL Program Management on 2023-04-27 08:10:21 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

--- Additional comment from arun kumar mohan on 2023-04-27 10:43:23 UTC ---

PR up for review: https://github.com/red-hat-storage/ocs-operator/pull/2040

--- Additional comment from Sanjal Katiyar on 2023-04-27 10:46:00 UTC ---

This will move to MODIFIED when the PR is merged into 4.13... we need acks for 4.13 for this BZ as well...

--- Additional comment from arun kumar mohan on 2023-05-02 11:03:57 UTC ---

@ebenahar, can you please provide us with the QA_ACK+ flag?

--- Additional comment from RHEL Program Management on 2023-05-02 11:22:41 UTC ---

This BZ is being approved for ODF 4.13.0 release, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf-4.13.0'.

--- Additional comment from RHEL Program Management on 2023-05-02 11:22:41 UTC ---

Since this bug has been approved for ODF 4.13.0 release, through release flag 'odf-4.13.0+', the Target Release is being set to 'ODF 4.13.0'.

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product.

The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from RHEL Program Management on 2023-05-02 12:46:17 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

--- Additional comment from arun kumar mohan on 2023-05-03 14:03:25 UTC ---

Updating the internal whiteboard to include the next 4.12.z release.

PR up for 4.12 branch: https://github.com/red-hat-storage/ocs-operator/pull/2045

--- Additional comment from RHEL Program Management on 2023-05-16 14:02:33 UTC ---

This BZ is being approved for an ODF 4.12.z z-stream update, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf‑4.12.z', and having been marked for an approved z-stream update

--- Additional comment from RHEL Program Management on 2023-05-16 14:02:33 UTC ---

Since this bug has been approved for ODF 4.12.4 release, through release flag 'odf-4.12.z+', and appropriate update number entry at the 'Internal Whiteboard', the Target Release is being set to 'ODF 4.12.4'

--- Additional comment from Sunil Kumar Acharya on 2023-05-22 09:21:38 UTC ---

Please backport the fix to 4.12 and update the RDT appropriately.

--- Additional comment from Sanjal Katiyar on 2023-05-22 09:24:49 UTC ---

Hi Arun,
plz create a BZ for 4.11.z as well, once 4.12.z backport is merged... also plz update the RDT.

Comment 7 arun kumar mohan 2023-07-07 05:21:41 UTC
PR merged now (thanks Mudit)...

Comment 15 errata-xmlrpc 2023-07-20 16:12:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.11.9 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4238

