Bug 1979604 - Creation of encrypted RBD PVC fails in OCS 4.7.2
Summary: Creation of encrypted RBD PVC fails in OCS 4.7.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.3
Assignee: Niels de Vos
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-06 14:05 UTC by Rachael
Modified: 2023-09-15 01:11 UTC (History)

Fixed In Version: v4.7.3-457.ci
Doc Type: Bug Fix
Doc Text:
.Persistent Volume encryption passphrases can now be stored in and retrieved from Hashicorp Vault
Previously, creation of encrypted RBD PVCs failed because the connection parameters for a Hashicorp Vault KMS could not be parsed. As a result, the Hashicorp Vault KMS could not be initialized as a store for PV encryption passphrases: new volumes that use Hashicorp Vault to store and retrieve their passphrases could not be created, and existing volumes could not be used. This update fixes the parsing of the Vault connection parameters, so the KMS connection initializes and PV encryption passphrases can be stored in and retrieved from Hashicorp Vault.
Clone Of:
Environment:
Last Closed: 2021-08-11 13:59:14 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ceph-csi pull 64 | 0 | None | open | BUG 1979604: util: convert standardVault object to map | 2021-07-07 14:30:15 UTC
Red Hat Product Errata | RHBA-2021:3135 | 0 | None | None | None | 2021-08-11 13:59:26 UTC

Description Rachael 2021-07-06 14:05:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

In OCS 4.7.2, creation of encrypted RBD PVCs fails with the following error:


  Warning  ProvisioningFailed    42s (x8 over 105s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-7b66b9959c-2smsm_c03689da-56d0-41a3-9ed0-48842e94c381  failed to provision volume with StorageClass "test-pv-encryption": rpc error: code = InvalidArgument desc = invalid encryption kms configuration: missing encryption KMS configuration with 1-vault

However, the 1-vault config is present in the csi-kms-connection-details configmap:

$ oc get cm csi-kms-connection-details -o yaml -n openshift-storage
apiVersion: v1
data:
  1-vault: '{"KMS_PROVIDER":"vaulttokens","KMS_SERVICE_NAME":"vault","VAULT_ADDR":"https://vault.qe.rh-ocs.com:8200","VAULT_BACKEND_PATH":"rbd-encryption","VAULT_CACERT":"ocs-kms-ca-secret-iv4cta","VAULT_TLS_SERVER_NAME":"","VAULT_CLIENT_CERT":"ocs-kms-client-cert-u6yuiq","VAULT_CLIENT_KEY":"ocs-kms-client-key-gz0zb","VAULT_NAMESPACE":"ocs/rbd","VAULT_TOKEN_NAME":"ocs-kms-token","VAULT_CACERT_FILE":"fullchain.pem","VAULT_CLIENT_CERT_FILE":"cert.pem","VAULT_CLIENT_KEY_FILE":"privkey.pem"}'
kind: ConfigMap


Version of all relevant components (if applicable):
OCP: 4.8.0-0.nightly-2021-06-25-182927
OCS: ocs-operator.v4.7.2-429.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, not able to create encrypted RBD PVCs


Is there any workaround available to the best of your knowledge?
Not that I am aware of


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes


Can this issue reproduce from the UI?
Yes


If this is a regression, please provide more details to justify this:

Yes, PV encryption was working in OCS 4.7.0 and 4.7.1. It was also tested on the 4.7.2-rc1 build and worked fine. The issue is seen with the live build of OCS 4.7.2.


Steps to Reproduce:
1. Deploy an OCS cluster with live 4.7.2 builds
2. Create an encryption enabled storageclass for RBD
3. Create a PVC using the SC created above
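
Steps 2 and 3 can be sketched with the shell function below, which prints a StorageClass and a PVC so they can be reviewed before being applied. This is a minimal sketch, not the exact objects used in this report: the StorageClass name and the KMS ID come from the error message and the configmap above, while the PVC name `test-encrypted-pvc`, its namespace and size, the pool name, and the secret names are assumptions based on a default OCS deployment.

```shell
# Print a StorageClass and a PVC for an encrypted RBD volume.
# Usage: render_encrypted_rbd_manifests [KMS_ID] [SC_NAME]
# Assumptions (not from the report): pool name, secret names,
# PVC name, PVC namespace and PVC size.
render_encrypted_rbd_manifests() {
    kms_id=${1:-1-vault}
    sc_name=${2:-test-pv-encryption}
    cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${sc_name}
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  encrypted: "true"
  encryptionKMSID: ${kms_id}
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-encrypted-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ${sc_name}
EOF
}

# Example (requires a logged-in oc session):
#   render_encrypted_rbd_manifests | oc apply -f -
#   oc -n default describe pvc/test-encrypted-pvc
```

The `encryptionKMSID` value has to match a key in the `csi-kms-connection-details` configmap, which is the lookup that fails in this bug.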


Actual results:
PVC creation fails with error:
Warning  ProvisioningFailed    42s (x8 over 105s)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-7b66b9959c-2smsm_c03689da-56d0-41a3-9ed0-48842e94c381  failed to provision volume with StorageClass "test-pv-encryption": rpc error: code = InvalidArgument desc = invalid encryption kms configuration: missing encryption KMS configuration with 1-vault


Expected results:
PVC creation should be successful.

Additional info:
This issue is not seen in OCS 4.8 builds

Comment 4 Mudit Agarwal 2021-07-06 14:22:03 UTC
This is only seen in the 4.7.2-RC2 build, which is where the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1974816 went in.
Keeping it for 4.7.z

Comment 8 Mudit Agarwal 2021-07-14 08:22:04 UTC
Niels, please fill in the doc text

Comment 9 Niels de Vos 2021-07-20 15:21:27 UTC
There is a workaround for users who want to run the previously released Ceph-CSI container image, in which PV encryption with Hashicorp Vault Token support was working correctly.
I am not in a position to qualify this as a supported workaround, but for testing the functionality it should be sufficient.


Check the tag (or sha256) from the image registry, for example https://catalog.redhat.com/software/containers/ocs4/cephcsi-rhel8/5ddeeeaabed8bd164a0afa64?tag=4.7-104.60731ec.release_4.7

Install the OCS Operator from OperatorHub through the UI.

Create a StorageCluster once the Operator is installed.

When the StorageCluster becomes available, update the CSV in the `openshift-storage` namespace:

$ oc -n openshift-storage get csv
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.7.2   OpenShift Container Storage   4.7.2                Succeeded

Edit the CSV, and replace the references to the `cephcsi-rhel8` image with the
image from OCS-4.7.1.

$ oc -n openshift-storage edit csv/ocs-operator.v4.7.2

Look for the section `deployments:` and find the environment variables for the
`rook-ceph-operator`:

               - name: ROOK_CSI_CEPH_IMAGE
                 value: registry.redhat.io/ocs4/cephcsi-rhel8@sha256:d516aa76acf0ef657919f3d4d3647de8944efb8ce9684b7058fc22a5a7321f10

replace the image with the version from OCS-4.7.1:

               - name: ROOK_CSI_CEPH_IMAGE
                 value: registry.redhat.io/ocs4/cephcsi-rhel8:4.7-104.60731ec.release_4.7


This will cause the deployments and daemonsets for the Ceph-CSI to be updated, and the pods related to Ceph-CSI will get restarted.

$ oc -n openshift-storage get deployment/csi-rbdplugin-provisioner
$ oc -n openshift-storage get daemonset/csi-rbdplugin
$ oc -n openshift-storage get pods -l app=csi-rbdplugin-provisioner
$ oc -n openshift-storage get pods -l app=csi-rbdplugin

Verify that the previous container image is used:

$ oc -n openshift-storage describe pod/csi-rbdplugin-provisioner-fc6bddf8f-6pb6k | grep -m1 4.7-104.60731ec.release_4.7
    Image:         registry.redhat.io/ocs4/cephcsi-rhel8:4.7-104.60731ec.release_4.7
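
The check above looks at a single pod. A sketch for checking every Ceph-CSI pod at once is shown below; the helper name `pods_not_on_image` is hypothetical, and it only filters a list of "pod name, images" lines, so the `oc` query that feeds it is given as a commented example.

```shell
# Read lines of the form "POD IMAGE [IMAGE...]" on stdin and print the name
# of every pod where no container image contains the wanted tag.
# Exits non-zero when at least one such pod is found.
pods_not_on_image() {
    wanted=$1
    awk -v wanted="$wanted" '
        {
            ok = 0
            for (i = 2; i <= NF; i++)
                if (index($i, wanted) > 0) ok = 1
            if (!ok) { print $1; bad = 1 }
        }
        END { exit bad }
    '
}

# Example (requires a logged-in oc session):
#   oc -n openshift-storage get pods -l app=csi-rbdplugin-provisioner \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].image}{"\n"}{end}' \
#     | pods_not_on_image 4.7-104.60731ec.release_4.7
```

Sidecar containers in the same pod use other images, which is why a pod counts as updated as soon as any one of its containers runs the wanted image. Repeat the query with `-l app=csi-rbdplugin` for the daemonset pods.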

Comment 13 Niels de Vos 2021-07-22 13:12:57 UTC
It seems that the workaround in comment #9 is not always sufficient. Clusters that run a little longer (days instead of deploy-test-discard) 

Enable the Ceph Toolbox:

$ oc -n openshift-storage edit ocsinitializations.ocs.openshift.io/ocsinit

replace `spec: {}` with

   spec:
     enableCephTools: true


check for the running toolbox pod:

$ oc -n openshift-storage get pods -l app=rook-ceph-tools
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-tools-5d76f864fd-6bhkk   1/1     Running   0          2m26s

RSH into the Pod, and change the settings to allow connecting from non-current clients (like the Ceph-CSI container images from 4.7.1):

$ oc -n openshift-storage rsh rook-ceph-tools-5d76f864fd-6bhkk
sh-4.4# ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
sh-4.4# ceph config set mon auth_allow_insecure_global_id_reclaim true


By setting these options, the checks for https://docs.ceph.com/en/latest/security/CVE-2021-20288/ are disabled, and the non-patched (old Ceph-CSI) clients can connect. These checks can be enabled again once the container image with the fix for this bug is deployed.
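
Re-enabling the checks later could look like the sketch below, run inside the toolbox pod. It assumes the pre-workaround state corresponds to `auth_allow_insecure_global_id_reclaim` being `false` and the warning being enabled; the function name and the `CEPH` override variable are hypothetical conveniences that allow previewing the commands.

```shell
# Re-enable the CVE-2021-20288 checks that the workaround switched off.
# Set CEPH=echo to print the commands instead of running them.
restore_global_id_checks() {
    ceph_cmd=${CEPH:-ceph}
    # Stop accepting insecure global_id reclaim from unpatched clients first,
    # then turn the health warning back on.
    $ceph_cmd config set mon auth_allow_insecure_global_id_reclaim false &&
    $ceph_cmd config set mon mon_warn_on_insecure_global_id_reclaim_allowed true
}

# Example, inside the rook-ceph-tools pod:
#   restore_global_id_checks
```

Only run this after all Ceph-CSI pods are on the fixed image, since unpatched clients will no longer be able to reconnect.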

Comment 21 errata-xmlrpc 2021-08-11 13:59:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.7.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3135

Comment 22 Red Hat Bugzilla 2023-09-15 01:11:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

