Bug 1936858

Summary: OCS deployment with KMS fails when kv-v2 is used for backend path
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Rachael <rgeorge>
Component: rook Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Shay Rozen <srozen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.7 CC: ebenahar, jthottan, madam, muagarwa, ocs-bugs, ratamir, shan
Target Milestone: ---   
Target Release: OCS 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-08-03 18:15:14 UTC Type: Bug

Description Rachael 2021-03-09 11:18:40 UTC
Description of problem (please be detailed as possible and provide log
snippets):

When OCS is deployed with cluster-wide encryption using KMS enabled, and the backend path on the Vault server uses the kv-v2 secrets engine, the deployment fails. The OSD pods are stuck in Init:CrashLoopBackOff.

$ oc get pods |grep osd
rook-ceph-osd-0-b846d49dd-gjvv7                                   0/2     Init:CrashLoopBackOff   1          31s
rook-ceph-osd-1-577948c47b-tdhdz                                  0/2     Init:CrashLoopBackOff   20         81m
rook-ceph-osd-2-55f8458498-dc6xn                                  0/2     Init:CrashLoopBackOff   18  

$ oc logs rook-ceph-osd-0-b846d49dd-gjvv7 -c encryption-kms-get-kek
["Invalid path for a versioned K/V secrets engine. See the API docs for the appropriate API endpoints to use. If using the Vault CLI, use 'vault kv get' for this operation."]

The encryption keys, however, were created on the Vault server.

$ vault kv list -namespace=ocs test-kv2
Keys
----
NOOBAA_ROOT_SECRET_PATH/
rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7
rook-ceph-osd-encryption-key-ocs-deviceset-thin-1-data-0476gq
rook-ceph-osd-encryption-key-ocs-deviceset-thin-2-data-02hkl6
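
The error message above is consistent with how Vault lays out the HTTP API for its K/V engines: a KV v1 mount reads a secret at <mount>/<key>, while a versioned KV v2 mount expects <mount>/data/<key>. The following is a minimal sketch of that path difference only, not the rook implementation; the helper name kv_read_path is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rook or the Vault CLI): print the HTTP API
# path used to read a secret from a KV mount, depending on the engine version.
# Arguments: $1 = v1|v2, $2 = mount path, $3 = secret key name.
kv_read_path() {
  case "$1" in
    v1) printf '/v1/%s/%s\n' "$2" "$3" ;;
    v2) printf '/v1/%s/data/%s\n' "$2" "$3" ;;
    *)  echo "unknown KV version: $1" >&2; return 1 ;;
  esac
}

# Example using the mount and one key name from this report:
kv_read_path v2 test-kv2 rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7
```

A client that sends the v1-style path to a v2 mount would be expected to receive the "Invalid path for a versioned K/V secrets engine" response shown in the log.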

Version of all relevant components (if applicable):
OCP: 4.7.0-0.nightly-2021-03-06-183610
OCS: ocs-operator.v4.7.0-284.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, OCS deployment fails when kv-v2 is used.

Is there any workaround available to the best of your knowledge?
Not that I am aware of

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Create a backend path in Vault with kv-v2
   $ vault secrets enable -path=test-kv2 kv-v2
   Success! Enabled the kv-v2 secrets engine at: test-kv2/

2. Enter the path created above when deploying OCS from the UI with cluster-wide encryption using KMS enabled

3. As soon as the storagecluster creation starts, edit the ocs-kms-connection-details configmap and set VAULT_BACKEND: v2

   $ oc get cm ocs-kms-connection-details -o yaml
   apiVersion: v1
   data:
     KMS_PROVIDER: vault
     KMS_SERVICE_NAME: vault
     VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
     VAULT_BACKEND: v2
     VAULT_BACKEND_PATH: test-kv2
     VAULT_CACERT: ocs-kms-ca-secret-znu27r
     VAULT_CLIENT_CERT: ocs-kms-client-cert-7od4d
     VAULT_CLIENT_KEY: ocs-kms-client-key-8obbs
     VAULT_NAMESPACE: ocs
     VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com

4. Check the status of the OSD pods
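
The configmap edit in step 3 could also be applied non-interactively. This is a minimal sketch, assuming the configmap lives in the openshift-storage namespace (the namespace is not stated in this report); the helper name backend_patch is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON merge patch that sets VAULT_BACKEND
# in the configmap's data section. Argument: $1 = backend version (e.g. v2).
backend_patch() {
  printf '{"data":{"VAULT_BACKEND":"%s"}}\n' "$1"
}

# Usage (requires a logged-in oc client; the namespace is an assumption):
#   oc -n openshift-storage patch configmap ocs-kms-connection-details \
#     --type merge -p "$(backend_patch v2)"
backend_patch v2
```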

Actual results:

The OSD pods are stuck in Init:CrashLoopBackOff state.
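
To pick out the affected pods quickly, the pod listing can be filtered on its STATUS column. A minimal sketch; the function name crashing_osd_pods is hypothetical, and it assumes the default column layout of `oc get pods` (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `oc get pods` output on stdin and print the names
# of OSD pods whose STATUS column (third field) is Init:CrashLoopBackOff.
crashing_osd_pods() {
  awk '$1 ~ /^rook-ceph-osd-/ && $3 == "Init:CrashLoopBackOff" { print $1 }'
}

# Usage: oc get pods | crashing_osd_pods
```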

Expected results:

The deployment should be successful and the OSDs should be up and running.

Comment 3 Sébastien Han 2021-03-09 15:37:44 UTC
Our documentation recommends using KV version 1. Since we are only in a blocker phase, we should probably move this to 4.8.
Raz, thoughts?

Comment 4 Mudit Agarwal 2021-03-11 15:55:46 UTC
Had an offline discussion with Elad and Rachael; moving it to 4.8.

Comment 5 Mudit Agarwal 2021-03-12 08:07:06 UTC
This is not yet in 4.8

Comment 9 Shay Rozen 2021-06-15 17:36:06 UTC
OSDs are up and the disk is encrypted:

NAME                                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop1                                             7:1    0   512G  0 loop  
nvme0n1                                         259:0    0   120G  0 disk  
|-nvme0n1p1                                     259:1    0     1M  0 part  
|-nvme0n1p2                                     259:2    0   127M  0 part  
|-nvme0n1p3                                     259:3    0   384M  0 part  /boot
`-nvme0n1p4                                     259:4    0 119.5G  0 part  /sysroot
nvme1n1                                         259:5    0    50G  0 disk  /var/lib/kubelet/pods/424c72a4-c643-403a-bcd5-cf975ba903c1/volumes/kubernetes.io~aws-ebs/pvc-21138176-061c-495f-a705-3f49d94d3ae4
nvme2n1                                         259:6    0   512G  0 disk  
`-ocs-deviceset-gp2-1-data-0vgcrg-block-dmcrypt 253:0    0   512G  0 crypt 

Checked on OCS version 4.8.0-417.ci
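
The same verification can be scripted by looking for devices of type crypt in the block device listing. A minimal sketch; the function name crypt_devices is hypothetical, and it expects the two-column raw output of `lsblk -rno NAME,TYPE` rather than the tree view shown above.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `lsblk -rno NAME,TYPE` output on stdin and print
# the names of devices whose TYPE is "crypt" (dm-crypt mappings).
crypt_devices() {
  awk '$2 == "crypt" { print $1 }'
}

# Usage on the node hosting the OSD: lsblk -rno NAME,TYPE | crypt_devices
```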

Comment 11 errata-xmlrpc 2021-08-03 18:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003