Bug 2190508

Summary: [KMS][VAULT][UI] Enabling cluster-wide encryption with Kubernetes authentication results in an error state for the storage cluster.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Parag Kamble <pakamble>
Component: ocs-operator Assignee: Sanjal Katiyar <skatiyar>
Status: CLOSED NOTABUG QA Contact: Parag Kamble <pakamble>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.13 CC: ocs-bugs, odf-bz-bot, sapillai, skatiyar
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-06-09 17:46:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Parag Kamble 2023-04-28 15:22:05 UTC
Created attachment 1960867
Must Gather logs

Description of problem (please be as detailed as possible and provide log
snippets):
Enabling both cluster-wide encryption and storage class encryption simultaneously can result in an error state for the storage cluster. Not all pods come up, and the Ceph cluster status shows errors. However, if we enable one service at a time, it works.

Version of all relevant components (if applicable): 4.13


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)? Yes. The storage cluster does not form, so this is a blocker for all use cases.


Is there any workaround available to the best of your knowledge? No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
YES


Can this issue be reproduced from the UI?
YES

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install the ODF operator.
2. Create a storage system.
3. Select both the Cluster-wide encryption and StorageClass encryption checkboxes.
4. Continue to create the storage cluster.
5. Wait for the cluster to reach the 'Ready' state.


Actual results:
The StorageCluster does not form completely, and the logs show errors.


Expected results:
The storage cluster should be in a healthy state, and both cluster-wide encryption and storage class encryption should work together.

Additional info:

This issue occurs when both cluster-wide encryption and storage class encryption are enabled together.

Storagecluster remains in 'Progressing' state
=============================================
❯ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   54m   Progressing              2023-04-28T14:18:03Z   4.13.0

CephCluster shows the following error
=====================================
❯ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE                                                                                                                                                                                                                                                                                                       HEALTH   EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          60m   Progressing   failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token for kubernetes authentication (missing Service Account?): authentication returned nil auth info
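
The MESSAGE column above is a single wrapped error string, which is easier to read one layer at a time. The following is only a minimal sketch for reading it: it assumes every ": " in the message separates one wrapping layer from the next, and the helper name `unwrap` is hypothetical, not part of Rook or `oc`.

```python
# Status message copied verbatim from the CephCluster MESSAGE column above.
message = (
    "failed to perform validation before cluster creation: "
    "failed to validate kms connection details: "
    "failed to get backend version: "
    "failed to initialize vault client: "
    "failed to get vault authentication token for kubernetes authentication "
    "(missing Service Account?): "
    "authentication returned nil auth info"
)


def unwrap(msg: str) -> list[str]:
    """Split a wrapped error string into layers, outermost first.

    Assumption: each ': ' marks one wrapping layer.
    """
    return [part.strip() for part in msg.split(": ")]


layers = unwrap(message)

# Print the chain with increasing indentation; the last line is the innermost cause.
for depth, layer in enumerate(layers):
    print("  " * depth + layer)
```

Read this way, the innermost layer is the Vault Kubernetes login returning no auth info, and the outer layers only record that Rook stopped at KMS validation before creating the cluster.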

Comment 8 Santosh Pillai 2023-05-03 13:45:39 UTC
Possibly a missing configuration. But proposing it as a blocker until we have more info.

Comment 9 Santosh Pillai 2023-05-03 13:47:48 UTC
We might be missing some connection configuration for vault:

Below is the connection configuration from a 4.12 BZ:
$ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
  VAULT_AUTH_METHOD: kubernetes
  VAULT_BACKEND_PATH: rook
  VAULT_CACERT: ocs-kms-ca-secret-afq7gj
  VAULT_CLIENT_CERT: ocs-kms-client-cert-e4plrg
  VAULT_CLIENT_KEY: ocs-kms-client-key-y5whe6
  VAULT_NAMESPACE: odf
  VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com
kind: ConfigMap


This is the current connection configuration when using the UI:

❯ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: encrypt-connetion
  VAULT_ADDR: https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200
  VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
  VAULT_AUTH_METHOD: kubernetes
  VAULT_AUTH_MOUNT_PATH: /v1/vault/kubernetes/login
  VAULT_BACKEND_PATH: odf/
  VAULT_NAMESPACE: admin
  VAULT_TLS_SERVER_NAME: ""
kind: ConfigMap
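
To make the comparison between the two ConfigMaps explicit, here is a small sketch that diffs their `data` sections. The values are copied from the two outputs above; the function name `diff_keys` and the variable names are hypothetical. A difference in keys does not by itself identify the root cause; it only narrows down what to check.

```python
# data section of the 4.12 reference ConfigMap shown above.
reference_4_12 = {
    "KMS_PROVIDER": "vault",
    "KMS_SERVICE_NAME": "vault",
    "VAULT_ADDR": "https://vault.qe.rh-ocs.com:8200",
    "VAULT_AUTH_KUBERNETES_ROLE": "odf-rook-ceph-op",
    "VAULT_AUTH_METHOD": "kubernetes",
    "VAULT_BACKEND_PATH": "rook",
    "VAULT_CACERT": "ocs-kms-ca-secret-afq7gj",
    "VAULT_CLIENT_CERT": "ocs-kms-client-cert-e4plrg",
    "VAULT_CLIENT_KEY": "ocs-kms-client-key-y5whe6",
    "VAULT_NAMESPACE": "odf",
    "VAULT_TLS_SERVER_NAME": "vault.qe.rh-ocs.com",
}

# data section of the ConfigMap created through the UI in this BZ.
ui_4_13 = {
    "KMS_PROVIDER": "vault",
    "KMS_SERVICE_NAME": "encrypt-connetion",
    "VAULT_ADDR": "https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200",
    "VAULT_AUTH_KUBERNETES_ROLE": "odf-rook-ceph-op",
    "VAULT_AUTH_METHOD": "kubernetes",
    "VAULT_AUTH_MOUNT_PATH": "/v1/vault/kubernetes/login",
    "VAULT_BACKEND_PATH": "odf/",
    "VAULT_NAMESPACE": "admin",
    "VAULT_TLS_SERVER_NAME": "",
}


def diff_keys(reference: dict, current: dict):
    """Return (missing, extra, changed) key lists, each sorted by name."""
    missing = sorted(reference.keys() - current.keys())
    extra = sorted(current.keys() - reference.keys())
    changed = sorted(
        key for key in reference.keys() & current.keys()
        if reference[key] != current[key]
    )
    return missing, extra, changed


missing, extra, changed = diff_keys(reference_4_12, ui_4_13)
print("only in 4.12 reference:", missing)
print("only in UI config:     ", extra)
print("present in both, different value:", changed)
```

The three certificate-related keys appear only in the 4.12 reference, and `VAULT_AUTH_MOUNT_PATH` appears only in the UI-created configuration; the remaining differences are environment-specific values such as the Vault address and namespace.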

Comment 10 Santosh Pillai 2023-05-03 14:47:46 UTC
A couple more observations:
- Selecting only `StorageClass` encryption with KMS in the UI does not pass the `kms connection` details to the Ceph cluster. That's a different issue.
- For this BZ, I have suggested trying with certs as well.

Comment 11 Santosh Pillai 2023-05-03 15:39:44 UTC
(In reply to Santosh Pillai from comment #10)
> couple of more observations:
> - Selecting only `StorageClass` encryption with KMS in the UI, does not pass
> the `kms connection` details to ceph cluster. That's a different issue.

This is not an issue; it is expected behavior.

> - For this BZ, I have suggested to try with certs as well.

Comment 12 Santosh Pillai 2023-05-04 09:58:44 UTC
Moving it back to Sanjal. I don't see any changes in Rook with respect to validating the KMS connection in the last release, so I don't think this is a Rook issue. I suspect we are not passing all the required KMS connection credentials via the UI. I suggest that the QE and UI teams confirm whether any KMS connection details are missing when using the UI.

Comment 13 Sanjal Katiyar 2023-05-04 10:06:56 UTC
Thanks, Santosh and Parag, for looking into it.
As discussed offline and pointed out in https://bugzilla.redhat.com/show_bug.cgi?id=2190508#c4, we need to follow all the required configuration steps to set up KMS with cluster-wide encryption. The UI only adds whatever the user provides to the ConfigMap (which, to the best of my knowledge, the UI is doing correctly). OCS passes that information down to Rook (which, to the best of my knowledge, OCS is doing correctly as well).
All that's left is to follow the correct steps. If any necessary step is missing from the documentation, we should document it properly.

Comment 14 Sanjal Katiyar 2023-05-04 10:10:19 UTC
I am moving the BZ to ON_QA because, as of https://bugzilla.redhat.com/show_bug.cgi?id=2190508#c12, nothing appears to need fixing in any component.

Comment 15 Sanjal Katiyar 2023-05-04 10:12:02 UTC
Please feel free to mark this BZ as "FailedQA" if we are sure that nothing is wrong on the configuration side and that UI/OCS/Rook is missing something.

Comment 17 Santosh Pillai 2023-05-22 03:54:42 UTC
Based on my latest discussion with Parag, I believe this was fixed. It was related to some missing configuration when using the UI.

Comment 18 Parag Kamble 2023-06-09 17:46:30 UTC
The earlier issue I faced was due to a missed configuration that was specific to the cluster-wide encryption. After following the documentation, I was able to configure cluster-wide encryption with the external KMS service.

Documentation Link: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/deploying_openshift_data_foundation_using_amazon_web_services/index#enabling-cluster-wide-encryprtion-with-the-kubernetes-authentication-using-kms_cloud-storage

Since this was not a product issue, I am closing the BZ.