Description of problem (please be as detailed as possible and provide log snippets):

When OCS is deployed with cluster-wide encryption using KMS enabled and the backend path on the Vault server uses the kv-v2 secrets engine, the deployment fails. The OSD pods are stuck in Init:CrashLoopBackOff:

$ oc get pods | grep osd
rook-ceph-osd-0-b846d49dd-gjvv7    0/2   Init:CrashLoopBackOff   1    31s
rook-ceph-osd-1-577948c47b-tdhdz   0/2   Init:CrashLoopBackOff   20   81m
rook-ceph-osd-2-55f8458498-dc6xn   0/2   Init:CrashLoopBackOff   18

$ oc logs rook-ceph-osd-0-b846d49dd-gjvv7 -c encryption-kms-get-kek
["Invalid path for a versioned K/V secrets engine. See the API docs for the appropriate API endpoints to use. If using the Vault CLI, use 'vault kv get' for this operation."]

The encryption keys, however, were created on the Vault server:

$ vault kv list -namespace=ocs test-kv2
Keys
----
NOOBAA_ROOT_SECRET_PATH/
rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7
rook-ceph-osd-encryption-key-ocs-deviceset-thin-1-data-0476gq
rook-ceph-osd-encryption-key-ocs-deviceset-thin-2-data-02hkl6

Version of all relevant components (if applicable):
OCP: 4.7.0-0.nightly-2021-03-06-183610
OCS: ocs-operator.v4.7.0-284.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
Yes, OCS deployment fails when kv-v2 is used.

Is there any workaround available to the best of your knowledge?
Not that I am aware of.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex):
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Create a backend path in Vault with the kv-v2 secrets engine:

$ vault secrets enable -path=test-kv2 kv-v2
Success! Enabled the kv-v2 secrets engine at: test-kv2/

2. Enter the path created above when deploying OCS with cluster-wide encryption using KMS enabled in the UI.
3. As soon as the storagecluster creation starts, edit the ocs-kms-connection-details configmap and set VAULT_BACKEND: v2:

$ oc get cm ocs-kms-connection-details -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_BACKEND: v2
  VAULT_BACKEND_PATH: test-kv2
  VAULT_CACERT: ocs-kms-ca-secret-znu27r
  VAULT_CLIENT_CERT: ocs-kms-client-cert-7od4d
  VAULT_CLIENT_KEY: ocs-kms-client-key-8obbs
  VAULT_NAMESPACE: ocs
  VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com

4. Check the status of the OSD pods.

Actual results:
The OSD pods are stuck in the Init:CrashLoopBackOff state.

Expected results:
The deployment should succeed and the OSDs should be up and running.
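For reference, the quoted error is what Vault returns when a KV v1-style read hits a KV v2 mount: KV v2 serves secrets under <mount>/data/<key>, so a client must either add the data/ segment to the API path or use the kv helper, which resolves the mount version itself. A minimal sketch against the mount from this report (key name taken from the vault kv list output above; an illustration of the path difference, not a fix):

$ # KV v1-style path, rejected by a v2 mount with the error quoted above
$ vault read -namespace=ocs test-kv2/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7

$ # KV v2 API path, with the data/ segment
$ vault read -namespace=ocs test-kv2/data/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7

$ # Or let the kv helper detect the mount version, as the error message suggests
$ vault kv get -namespace=ocs test-kv2/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7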
Our documentation has recommended using KV version 1. Since we are in the blocker-only phase, we should probably move this to 4.8. Raz, thoughts?
Had an offline discussion with Elad and Rachel, moving it to 4.8
This is not yet in 4.8
OSDs are up and the disk is encrypted:

NAME                                              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop1                                               7:1    0   512G  0 loop
nvme0n1                                           259:0    0   120G  0 disk
|-nvme0n1p1                                       259:1    0     1M  0 part
|-nvme0n1p2                                       259:2    0   127M  0 part
|-nvme0n1p3                                       259:3    0   384M  0 part  /boot
`-nvme0n1p4                                       259:4    0 119.5G  0 part  /sysroot
nvme1n1                                           259:5    0    50G  0 disk  /var/lib/kubelet/pods/424c72a4-c643-403a-bcd5-cf975ba903c1/volumes/kubernetes.io~aws-ebs/pvc-21138176-061c-495f-a705-3f49d94d3ae4
nvme2n1                                           259:6    0   512G  0 disk
`-ocs-deviceset-gp2-1-data-0vgcrg-block-dmcrypt   253:0    0   512G  0 crypt

Checked on OCS version 4.8.0-417.ci.
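Beyond lsblk, one way to confirm that the mapping is a LUKS device is to run cryptsetup on the node; a sketch, assuming the dm-crypt mapper name from the output above and a placeholder node name:

$ oc debug node/<node> -- chroot /host cryptsetup status ocs-deviceset-gp2-1-data-0vgcrg-block-dmcrypt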
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003