Bug 2190508
| Summary: | [KMS][VAULT][UI] Enabling the Cluster-wide encryption with Kubernetes authentication results in an error state for the storage cluster. | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Parag Kamble <pakamble> |
| Component: | ocs-operator | Assignee: | Sanjal Katiyar <skatiyar> |
| Status: | CLOSED NOTABUG | QA Contact: | Parag Kamble <pakamble> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13 | CC: | ocs-bugs, odf-bz-bot, sapillai, skatiyar |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-06-09 17:46:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Possibly a missing configuration. But proposing it as a blocker until we have more info.

We might be missing some connection configuration for vault:

Below are connection configuration from a 4.12 BZ

    $ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
    apiVersion: v1
    data:
      KMS_PROVIDER: vault
      KMS_SERVICE_NAME: vault
      VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
      VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
      VAULT_AUTH_METHOD: kubernetes
      VAULT_BACKEND_PATH: rook
      VAULT_CACERT: ocs-kms-ca-secret-afq7gj
      VAULT_CLIENT_CERT: ocs-kms-client-cert-e4plrg
      VAULT_CLIENT_KEY: ocs-kms-client-key-y5whe6
      VAULT_NAMESPACE: odf
      VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com
    kind: ConfigMap

This is the current connection configuration when using UI.

    ❯ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
    apiVersion: v1
    data:
      KMS_PROVIDER: vault
      KMS_SERVICE_NAME: encrypt-connetion
      VAULT_ADDR: https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200
      VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
      VAULT_AUTH_METHOD: kubernetes
      VAULT_AUTH_MOUNT_PATH: /v1/vault/kubernetes/login
      VAULT_BACKEND_PATH: odf/
      VAULT_NAMESPACE: admin
      VAULT_TLS_SERVER_NAME: ""
    kind: ConfigMap

couple of more observations:
- Selecting only `StorageClass` encryption with KMS in the UI, does not pass the `kms connection` details to ceph cluster. That's a different issue.
- For this BZ, I have suggested to try with certs as well.

(In reply to Santosh Pillai from comment #10)
> couple of more observations:
> - Selecting only `StorageClass` encryption with KMS in the UI, does not pass
> the `kms connection` details to ceph cluster. That's a different issue.

This is not an issue. Expected behavior.

> - For this BZ, I have suggested to try with certs as well.

Moving it back to Sanjal. I don't see any changes in Rook with respect to validating the kms connection in last release. So I don't think this is rook issue.
I feel we are not passing all the kms connections credentials that we need via the UI. Suggesting QE and UI team to confirm if we are missing any KMS connection details while using the UI.

thanks Santosh and Parag for looking into it... as discussed offline and pointed out in: https://bugzilla.redhat.com/show_bug.cgi?id=2190508#c4 we need to follow all the required configurations steps to set up KMS with clusterwide encryption...
UI only adds whatever is provided by user to the ConfigMap (which to best of my knowledge UI is doing correctly)... OCS passes that info down to rook (which to best of my knowledge OCS is doing correctly as well)... all that's left is to follow correct steps (if any necessary step is missing from the documentation we should be documenting it properly)...
I am moving the BZ to ON_QA as so far: https://bugzilla.redhat.com/show_bug.cgi?id=2190508#c12 nothing seems like which needs to be fixed from any component... please feel free to "failedQA" this BZ if we are sure nothing is wrong on configuration side and UI/OCS/Rook is missing anything...

Based on my latest discussion with Parag, I believe this was fixed. It was related to some missing configuration when using the UI.

The earlier issue I faced was due to a missed configuration that was specific to the cluster-wide encryption. After following the documentation, I was able to configure cluster-wide encryption with the external KMS service.

Documentation Link: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/deploying_openshift_data_foundation_using_amazon_web_services/index#enabling-cluster-wide-encryprtion-with-the-kubernetes-authentication-using-kms_cloud-storage

Since this was not a product issue, I am closing the BZ.
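The two ConfigMaps quoted in the comments above can be compared key by key. The following is a minimal Python sketch, not part of the original thread: the dictionaries are copied from the quoted `oc get cm` outputs, and `compare_configmaps` is a hypothetical helper name. The differences it reports are only candidates for review; the thread does not state which setting was actually missing.

```python
# Values copied from the two ConfigMaps quoted in the comments.
CM_4_12 = {
    "KMS_PROVIDER": "vault",
    "KMS_SERVICE_NAME": "vault",
    "VAULT_ADDR": "https://vault.qe.rh-ocs.com:8200",
    "VAULT_AUTH_KUBERNETES_ROLE": "odf-rook-ceph-op",
    "VAULT_AUTH_METHOD": "kubernetes",
    "VAULT_BACKEND_PATH": "rook",
    "VAULT_CACERT": "ocs-kms-ca-secret-afq7gj",
    "VAULT_CLIENT_CERT": "ocs-kms-client-cert-e4plrg",
    "VAULT_CLIENT_KEY": "ocs-kms-client-key-y5whe6",
    "VAULT_NAMESPACE": "odf",
    "VAULT_TLS_SERVER_NAME": "vault.qe.rh-ocs.com",
}

CM_UI = {
    "KMS_PROVIDER": "vault",
    "KMS_SERVICE_NAME": "encrypt-connetion",
    "VAULT_ADDR": "https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200",
    "VAULT_AUTH_KUBERNETES_ROLE": "odf-rook-ceph-op",
    "VAULT_AUTH_METHOD": "kubernetes",
    "VAULT_AUTH_MOUNT_PATH": "/v1/vault/kubernetes/login",
    "VAULT_BACKEND_PATH": "odf/",
    "VAULT_NAMESPACE": "admin",
    "VAULT_TLS_SERVER_NAME": "",
}


def compare_configmaps(reference, candidate):
    """Hypothetical helper: report key-level differences between two data maps.

    Returns three sorted lists: keys present only in `reference`,
    keys present only in `candidate`, and keys in `candidate` whose
    value is an empty string.
    """
    missing = sorted(set(reference) - set(candidate))
    added = sorted(set(candidate) - set(reference))
    empty = sorted(key for key, value in candidate.items() if value == "")
    return missing, added, empty


missing, added, empty = compare_configmaps(CM_4_12, CM_UI)
print("only in the 4.12 config:", missing)
print("only in the UI config:  ", added)
print("empty in the UI config: ", empty)
```

On the quoted data this lists the three TLS-related keys as absent from the UI-created ConfigMap, which matches the suggestion in the thread to retry with certs.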
Created attachment 1960867 [details]
Must Gather logs

Description of problem (please be as detailed as possible and provide log snippets):
Enabling both cluster-wide encryption and storage class encryption simultaneously can result in an error state for the storage cluster. Not all pods come up, and the Ceph cluster status shows errors. However, if we enable one service at a time, it works.

Version of all relevant components (if applicable):
4.13

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes. The storage cluster does not form, so this is a blocker for all use cases.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
YES

Can this issue be reproduced from the UI?
YES

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Start installing the ODF operator.
2. Create a storage system.
3. Select both the Cluster-wide encryption and StorageClass encryption checkboxes.
4. Continue to create the storage cluster.
5. Wait until the cluster reaches the 'Ready' state.

Actual results:
The StorageCluster does not form completely and shows errors in the logs.

Expected results:
The storage cluster should be in a healthy state, and both cluster-wide encryption and storage class encryption should work together.

Additional info:
This issue occurs when both cluster-wide encryption and storage class encryption are enabled together.
Storagecluster remains in 'Progressing' state
=============================================
❯ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   54m   Progressing              2023-04-28T14:18:03Z   4.13.0

cephcluster showing following errors
=====================================
❯ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE   HEALTH   EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          60m   Progressing   failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token for kubernetes authentication (missing Service Account?): authentication returned nil auth info
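The cephcluster MESSAGE above is a chain of wrapped errors, outermost first. The following is a minimal Python sketch for reading it layer by layer; it assumes the layers are joined by ": " and that this separator does not occur inside a single layer, which holds for the message quoted here. `split_error_chain` is a hypothetical helper name.

```python
# Message copied from the cephcluster MESSAGE column above.
MESSAGE = (
    "failed to perform validation before cluster creation: "
    "failed to validate kms connection details: "
    "failed to get backend version: "
    "failed to initialize vault client: "
    "failed to get vault authentication token for kubernetes authentication "
    "(missing Service Account?): "
    "authentication returned nil auth info"
)


def split_error_chain(message, sep=": "):
    """Hypothetical helper: split a wrapped error into layers, outermost first."""
    return [layer.strip() for layer in message.split(sep)]


layers = split_error_chain(MESSAGE)

# Print each layer indented by its depth so the innermost cause stands out.
for depth, layer in enumerate(layers):
    print("  " * depth + layer)

root_cause = layers[-1]
```

Read this way, the failure occurs while the Vault client is being initialized with Kubernetes authentication, before any encryption key is requested, which is consistent with the thread's conclusion that a configuration step was missing.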