Bug 1936858 - OCS deployment with KMS fails when kv-v2 is used for backend path
Summary: OCS deployment with KMS fails when kv-v2 is used for backend path
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Sébastien Han
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-09 11:18 UTC by Rachael
Modified: 2021-08-03 18:15 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 18:15:14 UTC
Embargoed:




Links
Github rook/rook pull 7374 (open): ceph: fix vault when used with kv version 2 (last updated 2021-03-09 15:29:52 UTC)
Red Hat Product Errata RHBA-2021:3003 (last updated 2021-08-03 18:15:46 UTC)

Description Rachael 2021-03-09 11:18:40 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When OCS is deployed with cluster-wide encryption and KMS enabled, and the backend path on the Vault server uses the kv-v2 secrets engine, the deployment fails. The OSD pods are stuck in Init:CrashLoopBackOff.

$ oc get pods |grep osd
rook-ceph-osd-0-b846d49dd-gjvv7                                   0/2     Init:CrashLoopBackOff   1          31s
rook-ceph-osd-1-577948c47b-tdhdz                                  0/2     Init:CrashLoopBackOff   20         81m
rook-ceph-osd-2-55f8458498-dc6xn                                  0/2     Init:CrashLoopBackOff   18  

$ oc logs rook-ceph-osd-0-b846d49dd-gjvv7 -c encryption-kms-get-kek
["Invalid path for a versioned K/V secrets engine. See the API docs for the appropriate API endpoints to use. If using the Vault CLI, use 'vault kv get' for this operation."]

The encryption keys, however, were created on the vault server.

$ vault kv list -namespace=ocs test-kv2
Keys
----
NOOBAA_ROOT_SECRET_PATH/
rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7
rook-ceph-osd-encryption-key-ocs-deviceset-thin-1-data-0476gq
rook-ceph-osd-encryption-key-ocs-deviceset-thin-2-data-02hkl6
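
For context (not part of the original report): with the kv-v1 secrets engine a secret is read directly at <mount>/<key>, whereas kv-v2 inserts a /data/ segment into the HTTP API path and nests the payload, so a client that builds v1-style paths against a v2 mount gets exactly the "Invalid path for a versioned K/V secrets engine" error shown above. A rough illustration with the Vault CLI, reusing the mount and one key name from this report:

# v1-style read against the v2 mount (this appears to be what the init container attempts) fails with the error above:
$ vault read -namespace=ocs test-kv2/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7

# kv-v2 reads go through <mount>/data/<key>; 'vault kv get' adds that segment automatically:
$ vault kv get -namespace=ocs test-kv2/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7
$ vault read -namespace=ocs test-kv2/data/rook-ceph-osd-encryption-key-ocs-deviceset-thin-0-data-0lzmj7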

Version of all relevant components (if applicable):
OCP: 4.7.0-0.nightly-2021-03-06-183610
OCS: ocs-operator.v4.7.0-284.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes, OCS deployment fails when kv-v2 is used.

Is there any workaround available to the best of your knowledge?
Not that I am aware of

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Create a backend path in Vault with kv-v2
   $ vault secrets enable -path=test-kv2 kv-v2
   Success! Enabled the kv-v2 secrets engine at: test-kv2/

2. When deploying OCS from the UI with cluster-wide encryption and KMS enabled, enter the path created above as the backend path

3. As soon as the storagecluster creation starts, edit the ocs-kms-connection-details configmap and set VAULT_BACKEND: v2 (a non-interactive equivalent is sketched after these steps)

   $ oc get cm ocs-kms-connection-details -o yaml
   apiVersion: v1
   data:
     KMS_PROVIDER: vault
     KMS_SERVICE_NAME: vault
     VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
     VAULT_BACKEND: v2
     VAULT_BACKEND_PATH: test-kv2
     VAULT_CACERT: ocs-kms-ca-secret-znu27r
     VAULT_CLIENT_CERT: ocs-kms-client-cert-7od4d
     VAULT_CLIENT_KEY: ocs-kms-client-key-8obbs
     VAULT_NAMESPACE: ocs
     VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com

4. Check the status of the OSD pods
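
For step 3, a minimal non-interactive sketch of the same change (assuming OCS is installed in the default openshift-storage namespace; adjust if not):

   $ oc -n openshift-storage patch configmap ocs-kms-connection-details \
       --type merge -p '{"data":{"VAULT_BACKEND":"v2"}}'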

Actual results:

The OSD pods are stuck in Init:CrashLoopBackOff state.

Expected results:

The deployment should be successful and the OSDs should be up and running.

Comment 3 Sébastien Han 2021-03-09 15:37:44 UTC
Our documentation has recommended using KV version 1. Since we are in a blocker-only phase, we should probably move this to 4.8.
Raz, thoughts?

Comment 4 Mudit Agarwal 2021-03-11 15:55:46 UTC
Had an offline discussion with Elad and Rachel, moving it to 4.8

Comment 5 Mudit Agarwal 2021-03-12 08:07:06 UTC
This is not yet in 4.8

Comment 9 Shay Rozen 2021-06-15 17:36:06 UTC
OSDs are up and the disk is encrypted:

NAME                                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop1                                             7:1    0   512G  0 loop  
nvme0n1                                         259:0    0   120G  0 disk  
|-nvme0n1p1                                     259:1    0     1M  0 part  
|-nvme0n1p2                                     259:2    0   127M  0 part  
|-nvme0n1p3                                     259:3    0   384M  0 part  /boot
`-nvme0n1p4                                     259:4    0 119.5G  0 part  /sysroot
nvme1n1                                         259:5    0    50G  0 disk  /var/lib/kubelet/pods/424c72a4-c643-403a-bcd5-cf975ba903c1/volumes/kubernetes.io~aws-ebs/pvc-21138176-061c-495f-a705-3f49d94d3ae4
nvme2n1                                         259:6    0   512G  0 disk  
`-ocs-deviceset-gp2-1-data-0vgcrg-block-dmcrypt 253:0    0   512G  0 crypt 

Checked on OCS version 4.8.0-417.ci.
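
For completeness, one way to double-check the dm-crypt layer on the node hosting the OSD (a sketch, not part of the original verification; the mapper name is taken from the lsblk output above):

# from a debug shell on the node (e.g. oc debug node/<node-name>, then chroot /host):
$ cryptsetup status ocs-deviceset-gp2-1-data-0vgcrg-block-dmcrypt

For an active mapping, cryptsetup status reports it as active and shows the cipher in use.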

Comment 11 errata-xmlrpc 2021-08-03 18:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003

