Bug 2016973

Summary: [KMS] VAULT_SECRET_ENGINE is set to "transit" by default for cluster wide encryption
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Rachael <rgeorge>
Component: ocs-operator
Assignee: Jiffin <jthottan>
Status: CLOSED CURRENTRELEASE
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.9
CC: ebenahar, fbalak, madam, muagarwa, ocs-bugs, odf-bz-bot, rperiyas, shan, sostapov
Target Milestone: ---
Target Release: ODF 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.9.0-210.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-07 17:46:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Rachael 2021-10-25 10:23:17 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When deploying an ODF cluster with cluster wide encryption enabled using an external KMS, the VAULT_SECRET_ENGINE variable is set to "transit" by default, whereas the KV secret engine is what is supported for cluster wide encryption using KMS. This causes the deployment to fail when KV-v2 is used as the backend path (https://bugzilla.redhat.com/show_bug.cgi?id=1975272).
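
For context, the two Vault secrets engines behave very differently here; a minimal illustration (the mount paths below are examples only, not taken from this cluster):

$ vault secrets enable -path=example-kv kv-v2         # key/value store: where cluster wide encryption keeps the OSD key encryption keys
$ vault secrets enable -path=example-transit transit  # encryption-as-a-service: keys stay inside Vault, so a plain key/value lookup against it cannot work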

$ oc describe pod rook-ceph-osd-0-748746ff7f-6p6n5
[...]
    Environment:
      KMS_PROVIDER:           vault
      KMS_SERVICE_NAME:       vault
      VAULT_ADDR:             https://vault.qe.rh-ocs.com:8200
      VAULT_BACKEND_PATH:     kv-v2
      VAULT_CACERT:           /etc/vault/vault.ca
      VAULT_CLIENT_CERT:      /etc/vault/vault.crt
      VAULT_CLIENT_KEY:       /etc/vault/vault.key
      VAULT_NAMESPACE:        
      VAULT_SECRET_ENGINE:    transit
      VAULT_TLS_SERVER_NAME:  
      VAULT_TOKEN:            <set to the key 'token' in secret 'ocs-kms-token'>  Optional: false
[...]
Events:
  Type     Reason                 Age                  From               Message
  ----     ------                 ----                 ----               -------
  Normal   Scheduled              2m24s                default-scheduler  Successfully assigned openshift-storage/rook-ceph-osd-0-748746ff7f-6p6n5 to ip-10-0-134-138.us-east-2.compute.internal
  Normal   SuccessfulMountVolume  2m23s                kubelet            MapVolume.MapPodDevice succeeded for volume "pvc-6d169f2a-fd9d-4e59-b752-4b4659c55ff0" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/volumeDevices/aws:/us-east-2a/vol-0ccf041eb0edab901"
  Normal   SuccessfulMountVolume  2m23s                kubelet            MapVolume.MapPodDevice succeeded for volume "pvc-6d169f2a-fd9d-4e59-b752-4b4659c55ff0" volumeMapPath "/var/lib/kubelet/pods/d3347c91-3ea3-4fa8-8e69-cfeecfb43939/volumeDevices/kubernetes.io~aws-ebs"
  Normal   AddedInterface         2m21s                multus             Add eth0 [10.128.2.25/23] from openshift-sdn
  Normal   Pulled                 2m21s                kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb" already present on machine
  Normal   Created                2m21s                kubelet            Created container blkdevmapper
  Normal   Started                2m21s                kubelet            Started container blkdevmapper
  Normal   Pulled                 98s (x4 over 2m20s)  kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb" already present on machine
  Normal   Created                98s (x4 over 2m20s)  kubelet            Created container encryption-kms-get-kek
  Normal   Started                97s (x4 over 2m20s)  kubelet            Started container encryption-kms-get-kek
  Warning  BackOff                59s (x8 over 2m18s)  kubelet            Back-off restarting failed container


$ oc logs rook-ceph-osd-0-748746ff7f-6p6n5 -c encryption-kms-get-kek
no encryption key rook-ceph-osd-encryption-key-ocs-deviceset-gp2-0-data-0jcn24 present in vault
["Invalid path for a versioned K/V secrets engine. See the API docs for the appropriate API endpoints to use. If using the Vault CLI, use 'vault kv get' for this operation."]


Version of all relevant components (if applicable):
===================================================
ODF : odf-operator.v4.9.0      OpenShift Data Foundation     4.9.0                Succeeded   full_version=4.9.0-195.ci
OCP: 4.9.0-0.nightly-2021-10-22-102153



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes. This prevents the auto-detection feature from detecting the KV version used in the backend path, and hence the deployment fails when cluster wide encryption with an external KMS is configured with a KV-v2 backend path.
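
Until this is fixed, the KV version of the configured backend path can be checked manually on the Vault side; a sketch (assuming the vault CLI is logged in to the same server):

$ vault secrets list -detailed
# the Options column shows "map[version:2]" for a kv-v2 mount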


Is there any workaround available to the best of your knowledge?
Yes. Manually edit the ocs-kms-connection-details ConfigMap in the openshift-storage namespace and set VAULT_SECRET_ENGINE to kv.

$ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_BACKEND_PATH: kv-v2
  VAULT_CACERT: ocs-kms-ca-secret-dn17hs
  VAULT_CLIENT_CERT: ocs-kms-client-cert-2bivn9
  VAULT_CLIENT_KEY: ocs-kms-client-key-cu8pd1
  VAULT_NAMESPACE: ""
  VAULT_SECRET_ENGINE: kv
  VAULT_TLS_SERVER_NAME: ""
kind: ConfigMap


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
===================

1. Create a backend path in Vault with kv-v2
   $ vault secrets enable -path=test-kv2 kv-v2
   Success! Enabled the kv-v2 secrets engine at: test-kv2/

2. Enter the path created above as the backend path when deploying OCS from the UI with cluster wide encryption using KMS enabled

3. Check the status of the OSD pods
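   e.g. (assuming the default app=rook-ceph-osd label in the openshift-storage namespace):
   $ oc get pods -n openshift-storage -l app=rook-ceph-osd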


Actual results:
===============
The OSD pods are in CrashLoopBackOff state.

rook-ceph-osd-1-787d4988c9-gdvk4                                  0/2     Init:CrashLoopBackOff   6 (13s ago)     5m52s
rook-ceph-osd-0-748746ff7f-6p6n5                                  0/2     Init:CrashLoopBackOff   6 (15s ago)     6m2s
rook-ceph-osd-2-56899b44c9-4bv52                                  0/2     Init:CrashLoopBackOff   6 (15s ago) 

Expected results:
=================
The deployment should be successful