Bug 1941836 - Keys in vault are never deleted when PVC/PV gets deleted
Summary: Keys in vault are never deleted when PVC/PV gets deleted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: documentation
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Olive Lakra
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: 1938134
 
Reported: 2021-03-22 21:37 UTC by Jean-Charles Lopez
Modified: 2022-03-02 07:11 UTC (History)
12 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Keys in Hashicorp Vault are not deleted when PVCs are deleted
The Hashicorp Vault Token in the Namespace of the Tenant is deleted before the PVCs in the Namespace are removed. This means that not all metadata of the PVCs can be removed, notably the encryption key for the PVC that is stored in Hashicorp Vault. To work around this issue, remove PVCs from a Tenant's Namespace before removing the Secret that contains the Hashicorp Vault Token. With the Hashicorp Vault Token available in the Secret in the Tenant's Namespace, all metadata of the PVCs, including the encryption key, will be removed.
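The deletion order in the workaround can be sketched as follows; this is a minimal sketch, and the namespace and resource names are taken from the reproducer later in this bug (adjust them for your environment):

```shell
# Workaround sketch: delete PVCs while the Vault token Secret still exists.
NS=my-rbd-storage

# 1. Delete the encrypted PVCs first, so that ceph-csi can still reach
#    Vault with the tenant token and remove the per-PVC encryption keys.
oc -n "$NS" delete pvc pvc-cephrbd1 pvc-cephrbd2

# 2. Confirm the PVCs (and their PVs) are actually gone.
oc -n "$NS" get pvc

# 3. Only then delete the Secret holding the Vault token, and the Namespace.
oc -n "$NS" delete secret ceph-csi-kms-token
oc delete namespace "$NS"
```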
Clone Of:
Environment:
Last Closed: 2022-03-02 07:11:51 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-csi pull 1932 0 None open e2e: verify (non)existence of keys for VaultTokensKMS 2021-03-24 13:05:08 UTC

Description Jean-Charles Lopez 2021-03-22 21:37:05 UTC
Description of problem (please be detailed as possible and provide log
snippets):
When using an external Hashicorp Vault configuration for PV at-rest encryption, deleting the application PV leaves the keys untouched and still alive in Vault.

Version of all relevant components (if applicable):
OCP 4.7.2
OCS 4.7.0 latest stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
1. Deploy an OCS cluster
2. Configure an encrypted storage class
3. Create an app using the storage class
4. Verify the keys are created in Vault
5. Delete the app
6. Verify PVC and PV are gone
7. Verify the PV keys are still alive in Vault
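Steps 4 and 7 can be checked from the Vault CLI; a sketch, assuming the keys live under the `secret/` KV mount and are named after the PV's CSI volume handle (the handle below is a placeholder):

```shell
# List all keys under the mount used by ceph-csi (the mount path may
# differ; e.g. "ocs/" is used in the reproducer later in this bug).
vault kv list secret

# Inspect a single key; the key name is the PV's CSI volume handle,
# which can be looked up with:
#   oc get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'
vault kv get secret/0001-0011-openshift-storage-0000000000000001-<uuid>
```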



Actual results:
Keys are still present in Vault

Expected results:
Keys removed from Vault

Additional info:

Comment 2 Niels de Vos 2021-03-23 07:45:32 UTC
Thanks for reporting!

We do have a test that validates the functioning: https://github.com/openshift/ceph-csi/blob/release-4.7/e2e/rbd.go#L370-L418

It also verifies that the keys are deleted: https://github.com/openshift/ceph-csi/blob/release-4.7/e2e/rbd_helper.go#L277-L283

But, it seems the deletion is only verified for type "vault", and not the newer "vaulttokens".

I'll have a look at updating the test, and figure out what else is missing.

Comment 3 Niels de Vos 2021-03-24 11:13:28 UTC
I am able to reproduce the behaviour. However, after checking the contents of the key that should have been removed, I noticed the `deletion_time`:

---
/ $ vault kv get secret/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
====== Metadata ======
Key              Value
---              -----
created_time     2021-03-24T10:44:32.654895768Z
deletion_time    2021-03-24T10:48:23.006111934Z
destroyed        false
version          1
---

There also is no `Data` section, indicating that the contents have been removed.

Depending on the requirements and expectations of customers, they may want to keep this around for some time. It is possible to `undelete` the keys through the Vault tools.

To completely delete/destroy the contents, use a command like this:

---
/ $ vault kv metadata delete secret/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
Success! Data deleted (if it existed) at: secret/metadata/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
---

Subsequent "vault kv list secret" calls do not output the old key anymore.

I am not sure if Hashicorp Vault can be configured to automatically clean up the metadata of deleted keys.
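For reference, a soft-deleted KV v2 key can be recovered or permanently destroyed from the Vault CLI; a sketch using the key from above (the version number is illustrative):

```shell
KEY=secret/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244

# Recover the deleted data of version 1 (clears deletion_time).
vault kv undelete -versions=1 "$KEY"

# Permanently destroy the data of version 1 (sets destroyed=true; the
# metadata remains and the key still shows up in "vault kv list").
vault kv destroy -versions=1 "$KEY"

# Remove all versions and the metadata; after this the key no longer
# appears in "vault kv list secret".
vault kv metadata delete "$KEY"
```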


@JC, what is the behaviour you expect? Can you also please confirm that the key in your Vault environment only contains the metadata, and not the data section?

Comment 4 Niels de Vos 2021-03-24 13:05:09 UTC
The e2e tests have been amended to validate the (non)existence of the keys when the VaultTokensKMS provider is used. This only checks that the `Data` portion is deleted, not the metadata. Listing keys will still include the deleted keys (but those do not have their key/passphrase anymore).

If deleting the metadata of deleted keys is a requirement, that would need to be discussed. The standard github.com/libopenstorage/secrets does not support the functionality directly. This might be an extension that Hashicorp Vault requires, and that other KMS providers do not.

Comment 5 Humble Chirammal 2021-03-24 13:52:06 UTC
One thing which I would like to confirm here is: how was the key verification done in the first place which caused this bug report?
The description says "Verify the PV keys are still alive in Vault". Would it be possible to list the exact commands used here?

Comment 6 Humble Chirammal 2021-03-24 13:55:34 UTC
(In reply to Humble Chirammal from comment #5)
> One thing which I would like to confirm here is : how is the key
> verification done in first place which caused this bug report. 
> The description says "Verify the PV keys are still alive in Vault". Would it
> be possible to list the exact commands used here ?

Is it examined using a `vault kv get secret/*` command?

Comment 7 Jean-Charles Lopez 2021-03-25 16:54:32 UTC
Hi @Niels

I deleted my test cluster and cleaned up the vault side, but I'll bring one up again so I can check, hopefully in the next couple of days.

Best regards
JC

Comment 8 Mudit Agarwal 2021-03-25 16:59:16 UTC
Not a blocker for 4.7, moving it to 4.8
We can take the fix in 4.7.z if required.

Comment 9 Jean-Charles Lopez 2021-03-25 22:24:43 UTC
Here is the last test I did

$ oc create -f vault-config.yaml
namespace/my-rbd-storage created
secret/ceph-csi-kms-token created

$ oc create -f cephrbd-encrypted-loop.yaml
persistentvolumeclaim/pvc-cephrbd1 created
persistentvolumeclaim/pvc-cephrbd2 created
job.batch/batch2 created

$ oc get pod,pvc
NAME               READY   STATUS    RESTARTS   AGE
pod/batch2-8qqdn   1/1     Running   0          38s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pvc-cephrbd1   Bound    pvc-9498b3b3-0d59-4576-a1c7-94a0a56e529c   500Gi      RWO            rbdenc         38s
persistentvolumeclaim/pvc-cephrbd2   Bound    pvc-00c30123-4ad9-4299-8c87-3bcef9a34682   500Mi      RWO            rbdenc         38s

Pod is running
$ oc logs pod/batch2-8qqdn
Creating temporary file
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00679669 s, 154 MB/s
Copying temporary file
Going to sleep
Removing temporary file
Creating temporary file
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00666265 s, 157 MB/s
Copying temporary file
Going to sleep

Here is the view from Vault while it is running
$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
==== Data ====
Key     Value
---     -----
data    map[passphrase:O3wMtL3kX78HeKNpnXzp9dH-Vw0=]
$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
==== Data ====
Key     Value
---     -----
data    map[passphrase:BmAJPKvGePamVhawInWY-0mC7Lc=]
$ vault kv metadata get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
Metadata not supported on KV Version 1
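(Aside: the "Metadata not supported on KV Version 1" error above indicates this mount is KV version 1, which has no soft-delete/metadata layer, unlike the KV v2 mount seen in comment 3. A sketch of how to check which KV version a mount uses:)

```shell
# Show all secrets engines with their mount options; for a KV mount the
# Options column includes the version, e.g. map[version:2].
vault secrets list -detailed
```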

$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.7.0-rc1
Storage Type    file
Cluster Name    localvault
Cluster ID      092bb0ef-6886-23e3-cbcd-3eacf39761de
HA Enabled      false

Now delete the application
$ oc delete -f cephrbd-encrypted-loop.yaml
persistentvolumeclaim "pvc-cephrbd1" deleted
persistentvolumeclaim "pvc-cephrbd2" deleted
job.batch "batch2" deleted

$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
No value found at ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
No value found at ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073

Looks like I found the source of the problem in my case.

Originally I had a single yaml file to do everything.

For this run I created two yaml files
File 1 To create the namespace and the ceph-csi-kms-token secret
File 2 To create the PVC and the app

So I think we just need to document that the application must be deleted in a specific way.

Let's make sure we change this to be marked as a documentation effort

Comment 10 Mudit Agarwal 2021-03-26 05:27:44 UTC
Niels, please fill the doc text

Comment 11 Humble Chirammal 2021-03-26 06:00:47 UTC
(In reply to Jean-Charles Lopez from comment #9)
> Here is the last test I did
> 
..
> 
> Now delete the application
> $ oc delete -f cephrbd-encrypted-loop.yaml
> persistentvolumeclaim "pvc-cephrbd1" deleted
> persistentvolumeclaim "pvc-cephrbd2" deleted
> job.batch "batch2" deleted
> 
> $ vault kv get
> ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-
> 0a580a800073
> No value found at
> ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-
> 0a580a800073
> $ vault kv get
> ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-
> 0a580a800073
> No value found at
> ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-
> 0a580a800073
> 
> Looks like I found the source of the problem in my case.
> 
> Original I had a single yaml file to do everything
> 
> For this run I created two yaml files
> File 1 To create the namespace and the ceph-csi-kms-token secret
> File 2 To create the PVC and the app
> 
> So I think we just need to document the the application must be deleted a
> specific way.
> 
> Let's make sure we change this to be marked as a documentation effort

Thanks for the detailed steps. As per the above, the secret is deleted from Vault as expected. Please point out if I missed anything.

Regarding application deletion, I am not sure we have to specially document anything. Because the key-associated entity here is the PVC, if the PVC or PVCs are deleted, the key should get deleted from Vault as seen above.

Comment 12 Niels de Vos 2021-03-26 08:27:56 UTC
Documentation only. Added doc-text and moving to ON_QA for validation.

Comment 13 Mudit Agarwal 2021-03-26 08:50:21 UTC
If it's only documentation and nothing to fix even in coming releases, then I would prefer moving this to the doc team.

