Description of problem (please be as detailed as possible and provide log snippets):
When using an external Hashicorp Vault configuration for PV at-rest encryption, deleting the application PV leaves the keys untouched and still alive in Vault.

Version of all relevant components (if applicable):
OCP 4.7.2
OCS 4.7.0 latest stable

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
1. Deploy an OCS cluster
2. Configure an encrypted storage class
3. Create an app using the storage class
4. Verify the keys are created in Vault
5. Delete the app
6. Verify PVC and PV are gone
7. Verify the PV keys are still alive in Vault

Actual results:
Keys still present in Vault

Expected results:
Keys removed from Vault

Additional info:
Thanks for reporting! We do have a test that validates the functioning: https://github.com/openshift/ceph-csi/blob/release-4.7/e2e/rbd.go#L370-L418 It also verifies that the keys are deleted: https://github.com/openshift/ceph-csi/blob/release-4.7/e2e/rbd_helper.go#L277-L283 But, it seems the deletion is only verified for type "vault", and not the newer "vaulttokens". I'll have a look at updating the test, and figure out what else is missing.
I am able to reproduce the behaviour. However, after checking the contents of the key that should have been removed, I noticed the `deletion_time`:
---
/ $ vault kv get secret/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
====== Metadata ======
Key              Value
---              -----
created_time     2021-03-24T10:44:32.654895768Z
deletion_time    2021-03-24T10:48:23.006111934Z
destroyed        false
version          1
---
There is also no `Data` section, indicating that the contents have been removed. Depending on the requirements and expectations of customers, they may want to keep this around for some time. It is possible to `undelete` the keys through the Vault tools. To completely delete/destroy the contents, use a command like this:
---
/ $ vault kv metadata delete secret/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
Success! Data deleted (if it existed) at: secret/metadata/0001-0011-openshift-storage-0000000000000001-eb00c257-8c8d-11eb-9c37-0a580a800244
---
Subsequent "vault kv list secret" calls no longer output the old key. I am not sure if Hashicorp Vault can be configured to automatically clean up the metadata of deleted keys.

@JC, what is the behaviour you expect? Can you also please confirm that the key in your Vault environment only contains the metadata, and not the data section?
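To make the distinction above easier to check, here is a minimal sketch of a helper that classifies `vault kv get` output from a KV version 2 mount. The function name `kv2_state` is hypothetical, and the sketch assumes that a live key shows `n/a` (or nothing) as its `deletion_time`, while a soft-deleted key shows a timestamp as in the output above.

```shell
# Hypothetical helper: classify a KV v2 secret from `vault kv get` text on stdin.
# Assumption: a live key reports deletion_time as "n/a"; a soft-deleted key
# reports a timestamp and "destroyed false", as in the output in this comment.
kv2_state() {
    awk '
        $1 == "deletion_time" { deleted = ($2 != "" && $2 != "n/a") }
        $1 == "destroyed"     { destroyed = ($2 == "true") }
        END {
            if (destroyed)    print "destroyed"
            else if (deleted) print "soft-deleted"
            else              print "live"
        }
    '
}
```

It could be used by piping the command output into it, for example `vault kv get secret/<key-name> | kv2_state`. It only inspects the metadata fields, so it does not tell whether the metadata itself has been removed with `vault kv metadata delete`.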
The e2e tests have been amended to validate the (non)existence of the keys when the VaultTokensKMS provider is used. This only checks that the `Data` portion has been deleted, not the metadata. Listing keys will still show the deleted keys (but those no longer have their key/passphrase). If deleting the metadata of deleted keys is a requirement, that would need to be discussed. The standard github.com/libopenstorage/secrets does not support the functionality directly. This might be an extension that Hashicorp Vault requires and other KMS providers do not.
One thing I would like to confirm here: how was the key verification that led to this bug report done in the first place? The description says "Verify the PV keys are still alive in Vault". Would it be possible to list the exact commands used here?
(In reply to Humble Chirammal from comment #5)
> One thing which I would like to confirm here is : how is the key
> verification done in first place which caused this bug report.
> The description says "Verify the PV keys are still alive in Vault". Would it
> be possible to list the exact commands used here ?

Was it examined using the `vault kv get secret/*` command?
Hi @Niels, I deleted my test cluster and cleaned up the Vault side, but I'll bring one up again so I can check, hopefully in the next couple of days. Best regards JC
Not a blocker for 4.7, moving it to 4.8. We can take the fix in 4.7.z if required.
Here is the last test I did:

$ oc create -f vault-config.yaml
namespace/my-rbd-storage created
secret/ceph-csi-kms-token created

$ oc create -f cephrbd-encrypted-loop.yaml
persistentvolumeclaim/pvc-cephrbd1 created
persistentvolumeclaim/pvc-cephrbd2 created
job.batch/batch2 created

$ oc get pod,pvc
NAME               READY   STATUS    RESTARTS   AGE
pod/batch2-8qqdn   1/1     Running   0          38s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pvc-cephrbd1   Bound    pvc-9498b3b3-0d59-4576-a1c7-94a0a56e529c   500Gi      RWO            rbdenc         38s
persistentvolumeclaim/pvc-cephrbd2   Bound    pvc-00c30123-4ad9-4299-8c87-3bcef9a34682   500Mi      RWO            rbdenc         38s

Pod is running

$ oc logs pod/batch2-8qqdn
Creating temporary file
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00679669 s, 154 MB/s
Copying temporary file
Going to sleep
Removing temporary file
Creating temporary file
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00666265 s, 157 MB/s
Copying temporary file
Going to sleep

Here is the view from Vault while it is running

$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
==== Data ====
Key     Value
---     -----
data    map[passphrase:O3wMtL3kX78HeKNpnXzp9dH-Vw0=]

$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
==== Data ====
Key     Value
---     -----
data    map[passphrase:BmAJPKvGePamVhawInWY-0mC7Lc=]

$ vault kv metadata get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
Metadata not supported on KV Version 1

$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.7.0-rc1
Storage Type    file
Cluster Name    localvault
Cluster ID      092bb0ef-6886-23e3-cbcd-3eacf39761de
HA Enabled      false

Now delete the application

$ oc delete -f cephrbd-encrypted-loop.yaml
persistentvolumeclaim "pvc-cephrbd1" deleted
persistentvolumeclaim "pvc-cephrbd2" deleted
job.batch "batch2" deleted

$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
No value found at ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073

$ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
No value found at ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073

Looks like I found the source of the problem in my case.

Originally I had a single yaml file to do everything.

For this run I created two yaml files:
File 1 to create the namespace and the ceph-csi-kms-token secret
File 2 to create the PVC and the app

So I think we just need to document that the application must be deleted a specific way.

Let's make sure we change this to be marked as a documentation effort.
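The manual check above can be repeated for several keys with a small loop. This is only a sketch: the function name `report_kv1_keys` is hypothetical, and it relies on the "No value found at" message that `vault kv get` printed in this test against a KV version 1 mount. Anything that is neither found nor reported as missing (for example an authentication failure) is reported as an error rather than guessed at.

```shell
# Hypothetical helper: for each Vault path read from stdin, report whether a
# value is still stored. Relies on the "No value found at" message seen in
# this report; other failures are reported as "error".
report_kv1_keys() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        rc=0
        # stdin is redirected so the command cannot consume the list of paths
        out=$(vault kv get "$path" 2>&1 < /dev/null) || rc=$?
        if printf '%s\n' "$out" | grep -q 'No value found at'; then
            echo "removed $path"
        elif [ "$rc" -eq 0 ]; then
            echo "present $path"
        else
            echo "error $path"
        fi
    done
}
```

The paths would be collected while the PVCs still exist and then fed in after deleting the application, one path per line, for example with `report_kv1_keys < key-paths.txt` (the file name is only an illustration).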
Niels, please fill in the doc text.
(In reply to Jean-Charles Lopez from comment #9)
> Here is the last test I did
> ..
>
> Now delete the application
> $ oc delete -f cephrbd-encrypted-loop.yaml
> persistentvolumeclaim "pvc-cephrbd1" deleted
> persistentvolumeclaim "pvc-cephrbd2" deleted
> job.batch "batch2" deleted
>
> $ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
> No value found at ocs/0001-0011-openshift-storage-0000000000000001-d4233195-8db6-11eb-9a0b-0a580a800073
> $ vault kv get ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
> No value found at ocs/0001-0011-openshift-storage-0000000000000001-d42dc2d7-8db6-11eb-9a0b-0a580a800073
>
> Looks like I found the source of the problem in my case.
>
> Originally I had a single yaml file to do everything
>
> For this run I created two yaml files
> File 1 To create the namespace and the ceph-csi-kms-token secret
> File 2 To create the PVC and the app
>
> So I think we just need to document that the application must be deleted a
> specific way.
>
> Let's make sure we change this to be marked as a documentation effort

Thanks for the detailed steps. As shown above, the secret is deleted from Vault as expected. Please point out if I missed anything.

Regarding application deletion, I am not sure we have to document anything specially. The entity associated with the key here is the PVC, so if the PVC or PVCs are deleted, the key should get deleted from Vault as seen above.
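Since the key follows the PVC, it may help to record the expected Vault path before deleting anything. The sketch below assumes that the Vault key name equals the CSI volume handle of the bound PV, which is what the key names in this report look like but is not stated explicitly here. The function name `pvc_vault_path` and its argument order are hypothetical.

```shell
# Hypothetical helper: print the Vault path expected for a PVC's key.
# Assumption: the key name equals the bound PV's spec.csi.volumeHandle.
# Usage: pvc_vault_path <vault-mount> <namespace> <pvc-name>
pvc_vault_path() (
    # Resolve the PV bound to the PVC, then that PV's CSI volume handle.
    pv=$(oc get pvc "$3" -n "$2" -o jsonpath='{.spec.volumeName}') || exit 1
    handle=$(oc get pv "$pv" -o jsonpath='{.spec.csi.volumeHandle}') || exit 1
    [ -n "$handle" ] || exit 1
    printf '%s/%s\n' "$1" "$handle"
)
```

For the test in comment #9 this would be called as `pvc_vault_path ocs my-rbd-storage pvc-cephrbd1` while the PVC still exists; the printed path can then be checked with `vault kv get` after the PVC is deleted.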
Documentation only. Added doc-text and moving to ON_QA for validation.
If it's only documentation and there is nothing to fix even in coming releases, then I would prefer moving this to the doc team.