Description of problem (please be detailed as possible and provide log snippets):

When cluster-wide encryption is enabled on an ODF cluster using the kube auth method, and the auth config and backend path in Vault are defined inside a Vault namespace, the deployment fails with the following error in the rook-ceph-operator logs:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid
2022-02-01 05:41:29.982275 I | clusterdisruption-controller: Ceph "openshift-storage" cluster not ready, cannot check Ceph status yet.

This error was seen even after disabling iss validation:

# vault write -namespace=odf auth/kubernetes/config token_reviewer_jwt="$(cat vault-sa-token)" kubernetes_host="${K8S_HOST}" kubernetes_ca_cert= disable_iss_validation=true

$ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
  VAULT_AUTH_METHOD: kubernetes
  VAULT_BACKEND_PATH: rook
  VAULT_CACERT: ocs-kms-ca-secret-afq7gj
  VAULT_CLIENT_CERT: ocs-kms-client-cert-e4plrg
  VAULT_CLIENT_KEY: ocs-kms-client-key-y5whe6
  VAULT_NAMESPACE: odf
  VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-01T05:34:44Z"
  name: ocs-kms-connection-details
  namespace: openshift-storage
  resourceVersion: "46732"
  uid: ef2e9d0d-4137-45ea-becf-bb9adfb1f480

Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.10.0-0.nightly-2022-01-31-012936
ODF: odf-operator.v4.10.0 full_version=4.10.0-122

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, deployment using Vault namespaces fails.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Install the ODF operator

2. In the openshift-storage namespace, create a service account called odf-vault-auth
# oc -n openshift-storage create serviceaccount odf-vault-auth

3. Create clusterrolebinding as shown below
# oc -n openshift-storage create clusterrolebinding vault-tokenreview-binding --clusterrole=system:auth-delegator --serviceaccount=openshift-storage:odf-vault-auth

4. Get the secret name from the service account
# oc -n openshift-storage get sa odf-vault-auth -o jsonpath="{.secrets[*]['name']}"

5. Get the Token and CA cert used to configure the kube auth in Vault
# SA_JWT_TOKEN=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data.token}" | base64 --decode; echo)
# SA_CA_CRT=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data['ca\.crt']}" | base64 --decode; echo)

6. Get the OCP endpoint and sa issuer
# K8S_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
# issuer="$(oc get authentication.config cluster -o template="{{ .spec.serviceAccountIssuer }}")"

7. On the Vault node/pod, configure the kube auth method
# vault auth enable -namespace=odf kubernetes

# vault write -namespace=odf auth/kubernetes/config \
    token_reviewer_jwt="$SA_JWT_TOKEN" \
    kubernetes_host="$K8S_HOST" \
    kubernetes_ca_cert="$SA_CA_CRT" \
    issuer="$issuer"

# vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-op \
    bound_service_account_names=rook-ceph-system,rook-ceph-osd,noobaa \
    bound_service_account_namespaces=openshift-storage \
    policies=rook \
    ttl=1440h

# vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-osd \
    bound_service_account_names=rook-ceph-osd \
    bound_service_account_namespaces=openshift-storage \
    policies=rook \
    ttl=1440h

8. From the ODF management console, follow the steps to create the storagesystem.

9. On the Security and network page, click on "Enable data encryption for block and file storage"

10. Select "Cluster-wide encryption" from encryption level and click on "Connect to an external key management service".

11. Set Authentication method to "Kubernetes" and fill out the rest of the details

12. Review and create the storagesystem

Actual results:
---------------
The deployment fails with the following error:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid

Expected results:
-----------------
The deployment should succeed
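When debugging an 'claim "iss" is invalid' failure like the one above, it can help to see which issuer is actually embedded in the service account token and compare it with the value passed as issuer= to Vault. The following is a minimal sketch of that check, not a fix for this bug. The helper name jwt_iss is hypothetical, and the sketch assumes GNU coreutils (base64 --decode) and a compact JSON payload with a single "iss" field.

```shell
# Hypothetical helper (not part of oc or vault): print the "iss" claim of the
# JWT passed as $1.
jwt_iss() {
  local payload pad
  # The claims are the second dot-separated segment, base64url-encoded.
  # Map the base64url alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 accepts the input.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  # Decode and pull out the value of "iss" (assumes it contains no quotes).
  printf '%s' "$payload" | base64 --decode \
    | sed -n 's/.*"iss"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example use with the variables from the steps to reproduce:
#   [ "$(jwt_iss "$SA_JWT_TOKEN")" = "$issuer" ] && echo "issuer matches"
```

A mismatch between the two values would be one possible explanation for the error. Since the failure here persisted with disable_iss_validation=true, a match does not rule out other causes.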
As per Eran's comment offline, I'm closing this. Testing the Vault Namespace is irrelevant for the cluster-wide encryption scenario. Thanks.
Based on the email conversation with Eran and others, moving this to 4.11
This was reported before the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2052937 went in. I believe it's the same root cause. Can you try this again with the latest 4.10? It should work. Thanks.
Since the documentation and UI for 4.10 do not support vault namespaces, can the target release be kept for ODF 4.11? The kubernetes auth method can then be tested and verified using vault namespaces in 4.11 and the UI changes done for the same can be reverted.
(In reply to Rachael from comment #8)
> Since the documentation and UI for 4.10 do not support vault namespaces, can
> the target release be kept for ODF 4.11?

Yes, it's a bit late for 4.10 changes.

> The kubernetes auth method can then be tested and verified using vault
> namespaces in 4.11 and the UI changes done for the same can be reverted.

Sounds good to me.
This can be moved to VERIFIED now I suppose, right?
Thanks!
Should this be ON_QA since we're just waiting for 4.11?
Thanks Neha, do you need anything from me at the moment? Let me know, thanks.
Please test with the latest 4.11 build
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:0551
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days