Bug 2048902

Summary: [KMS] Deployment of cluster wide encryption enabled cluster using kube auth fails when using vault namespaces
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Rachael <rgeorge>
Component: rook
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.10
CC: madam, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot, shan, tnielsen
Target Milestone: ---
Keywords: Reopened
Target Release: ODF 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2110866
Environment:
Last Closed: 2023-01-31 00:19:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2089755
Bug Blocks: 2110868, 2131648, 2051913, 2110866, 2124827

Description Rachael 2022-02-01 06:24:22 UTC
Description of problem (please be detailed as possible and provide log
snippets):

When cluster-wide encryption is enabled on an ODF cluster using the kube auth method and the auth config and backend path in Vault are defined inside a Vault namespace, the deployment fails with the following error in the rook-ceph-operator logs:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid

2022-02-01 05:41:29.982275 I | clusterdisruption-controller: Ceph "openshift-storage" cluster not ready, cannot check Ceph status yet.

This error was seen even after disabling iss validation:

# vault write -namespace=odf auth/kubernetes/config token_reviewer_jwt="$(cat vault-sa-token)" kubernetes_host="${K8S_HOST}" kubernetes_ca_cert= disable_iss_validation=true
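
One way to narrow down an `iss` mismatch is to look at the issuer the service account token actually carries and compare it with what the Vault auth config expects. The following is a minimal sketch and not part of the original report: `jwt_payload` is a hypothetical helper name, and it assumes a standard three-segment JWT and a `base64` that accepts `--decode`.

```shell
# Hypothetical helper: print the claims (payload) of the JWT passed as $1.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded.
  # Map the URL-safe alphabet back to the standard base64 alphabet.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the trailing '=' padding, so restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 --decode
}
```

For example, `jwt_payload "$(cat vault-sa-token)"` prints the claims as JSON; the `iss` value can then be compared with the issuer configured under `auth/kubernetes/config` in the `odf` Vault namespace.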


$ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
  VAULT_AUTH_METHOD: kubernetes
  VAULT_BACKEND_PATH: rook
  VAULT_CACERT: ocs-kms-ca-secret-afq7gj
  VAULT_CLIENT_CERT: ocs-kms-client-cert-e4plrg
  VAULT_CLIENT_KEY: ocs-kms-client-key-y5whe6
  VAULT_NAMESPACE: odf
  VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-01T05:34:44Z"
  name: ocs-kms-connection-details
  namespace: openshift-storage
  resourceVersion: "46732"
  uid: ef2e9d0d-4137-45ea-becf-bb9adfb1f480
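
To confirm which Vault namespace and auth settings the operator will pick up, individual keys can be read back from this ConfigMap. This is a small sketch that assumes the ConfigMap name and namespace shown above; `kms_setting` is a hypothetical helper name.

```shell
# Hypothetical helper: print one key from the KMS connection ConfigMap.
# $1 is a key under .data, for example VAULT_NAMESPACE or VAULT_AUTH_METHOD.
kms_setting() {
  oc get cm ocs-kms-connection-details -n openshift-storage \
    -o jsonpath="{.data.$1}"
}
```

With the ConfigMap above, `kms_setting VAULT_NAMESPACE` would be expected to print `odf`.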


Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.10.0-0.nightly-2022-01-31-012936
ODF: odf-operator.v4.10.0        full_version=4.10.0-122



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, deployment using Vault namespaces fails.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No


Steps to Reproduce:
-------------------

1. Install the ODF operator

2. In the openshift-storage namespace, create a service account called odf-vault-auth
   # oc -n openshift-storage create serviceaccount odf-vault-auth

3. Create clusterrolebinding as shown below
   # oc -n openshift-storage create clusterrolebinding vault-tokenreview-binding --clusterrole=system:auth-delegator --serviceaccount=openshift-storage:odf-vault-auth

4. Get the secret name from the service account (referred to as $VAULT_SA_SECRET_NAME in the next step)
   # oc -n openshift-storage get sa odf-vault-auth -o jsonpath="{.secrets[*]['name']}"

5. Get the Token and CA cert used to configure the kube auth in Vault
   # SA_JWT_TOKEN=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data.token}" | base64 --decode; echo)
   # SA_CA_CRT=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data['ca\.crt']}" | base64 --decode; echo)

6. Get the OCP endpoint and sa issuer
   # K8S_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
   # issuer="$(oc get authentication.config cluster -o template="{{ .spec.serviceAccountIssuer }}")"

7. On the vault node/pod, configure the kube auth method
   # vault auth enable -namespace=odf kubernetes
   
   # vault write -namespace=odf auth/kubernetes/config \
          token_reviewer_jwt="$SA_JWT_TOKEN" \
          kubernetes_host="$K8S_HOST" \
          kubernetes_ca_cert="$SA_CA_CRT" \
          issuer="$issuer"

   # vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-op \
        bound_service_account_names=rook-ceph-system,rook-ceph-osd,noobaa \
        bound_service_account_namespaces=openshift-storage \
        policies=rook \
        ttl=1440h

   # vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-osd \
        bound_service_account_names=rook-ceph-osd \
        bound_service_account_namespaces=openshift-storage \
        policies=rook \
        ttl=1440h

8. From the ODF management console, follow the steps to create the storagesystem.
9. On the Security and network page, click on "Enable data encryption for block and file storage"
10. Select "Cluster-wide encryption" from encryption level and click on "Connect to an external key management service".
11. Set Authentication method to "Kubernetes" and fill out the rest of the details 
12. Review and create the storagesystem
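
The Vault side of step 7 can be exercised without going through the operator by attempting the same login against `auth/kubernetes/login` that appears in the error. This is a sketch under assumptions: the `vault` CLI is already configured with the Vault address and TLS settings, and a JWT for one of the bound service accounts is at hand. `check_vault_login` is a hypothetical helper name, and how the JWT is obtained is left out.

```shell
# Hypothetical helper: attempt a Kubernetes-auth login inside a Vault namespace.
# $1: Vault namespace (e.g. odf)
# $2: Vault role (e.g. odf-rook-ceph-op)
# $3: service account JWT to present
check_vault_login() {
  vault write -namespace="$1" auth/kubernetes/login role="$2" jwt="$3"
}
```

If `check_vault_login odf odf-rook-ceph-op "$JWT"` returns the same `claim "iss" is invalid` error, the problem lies in the auth config inside the Vault namespace rather than in the operator.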

Actual results:
---------------
The deployment fails with the following error:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid


Expected results:
-----------------
The deployment should succeed

Comment 3 Sébastien Han 2022-02-01 14:13:46 UTC
As per Eran's comment offline, I'm closing this. Testing the Vault Namespace is irrelevant for the cluster-wide encryption scenario.
Thanks.

Comment 5 Mudit Agarwal 2022-02-08 14:07:36 UTC
Based on the email conversation with Eran and others, moving this to 4.11

Comment 6 Sébastien Han 2022-04-04 13:25:51 UTC
This was reported before the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2052937 went in. I believe it has the same root cause; can you try this again with the latest 4.10?
It should work.

Thanks.

Comment 8 Rachael 2022-04-05 07:11:46 UTC
Since the documentation and UI for 4.10 do not support vault namespaces, can the target release be kept for ODF 4.11? 
The kubernetes auth method can then be tested and verified using vault namespaces in 4.11 and the UI changes done for the same can be reverted.

Comment 9 Sébastien Han 2022-04-11 08:20:18 UTC
(In reply to Rachael from comment #8)
> Since the documentation and UI for 4.10 do not support vault namespaces, can
> the target release be kept for ODF 4.11? 

Yes, it's a bit late for 4.10 changes.

> The kubernetes auth method can then be tested and verified using vault
> namespaces in 4.11 and the UI changes done for the same can be reverted.

Sounds good to me.
This can be moved to VERIFIED now I suppose, right?

Thanks!

Comment 10 Travis Nielsen 2022-04-12 15:13:40 UTC
Should this be ON_QA since we're just waiting for 4.11?

Comment 15 Sébastien Han 2022-05-02 13:54:31 UTC
Thanks Neha, do you need anything from me at the moment?
Let me know, thanks.

Comment 16 Mudit Agarwal 2022-05-09 07:10:19 UTC
Please test with the latest 4.11 build

Comment 31 errata-xmlrpc 2023-01-31 00:19:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551

Comment 32 Red Hat Bugzilla 2023-12-08 04:27:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days