Bug 2048902 - [KMS] Deployment of cluster wide encryption enabled cluster using kube auth fails when using vault namespaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: Sébastien Han
QA Contact: Rachael
URL:
Whiteboard:
Depends On: 2089755
Blocks: 2110868 2131648 2051913 2110866 2124827
Reported: 2022-02-01 06:24 UTC by Rachael
Modified: 2023-12-08 04:27 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2110866
Environment:
Last Closed: 2023-01-31 00:19:18 UTC
Embargoed:



Links
System: Red Hat Product Errata | ID: RHBA-2023:0551 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2023-01-31 00:19:42 UTC

Description Rachael 2022-02-01 06:24:22 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When cluster-wide encryption is enabled on an ODF cluster using the kube auth method and the auth config and backend path in Vault are defined inside a Vault namespace, the deployment fails with the following error in the rook-ceph-operator logs:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid

2022-02-01 05:41:29.982275 I | clusterdisruption-controller: Ceph "openshift-storage" cluster not ready, cannot check Ceph status yet.

The same error occurs even after disabling issuer ("iss") validation:

# vault write -namespace=odf auth/kubernetes/config token_reviewer_jwt="$(cat vault-sa-token)" kubernetes_host="${K8S_HOST}" kubernetes_ca_cert= disable_iss_validation=true
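When investigating the `claim "iss" is invalid` error, it can help to check which issuer a service account token actually carries and compare it with the `issuer` value written to `auth/kubernetes/config` in the steps to reproduce. The helper below is a minimal sketch, not part of Rook, ODF, or Vault: `jwt_issuer` is a hypothetical name, it assumes GNU coreutils (`base64 -d`) and a compact JSON payload, and it only decodes the token without verifying its signature.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Rook/ODF/Vault): print the "iss" claim
# of a JWT such as a Kubernetes service account token.
# Decodes only; the signature is NOT verified.
# Assumes GNU coreutils base64 and a compact (no whitespace) JSON payload.
jwt_issuer() {
  local payload
  # A JWT is header.payload.signature; keep the middle segment.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2)
  # Map the base64url alphabet back to standard base64 ('-' -> '+', '_' -> '/').
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  # Decode the payload and print the value of the "iss" key.
  printf '%s' "$payload" | base64 -d | sed -n 's/.*"iss":"\([^"]*\)".*/\1/p'
}
```

For example, `jwt_issuer "$SA_JWT_TOKEN"` prints the issuer of the token retrieved in step 5. Legacy token secrets usually carry `kubernetes/serviceaccount`, which may differ from the issuer returned by `oc get authentication.config cluster` in step 6.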


$ oc get cm ocs-kms-connection-details -n openshift-storage -o yaml
apiVersion: v1
data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://vault.qe.rh-ocs.com:8200
  VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
  VAULT_AUTH_METHOD: kubernetes
  VAULT_BACKEND_PATH: rook
  VAULT_CACERT: ocs-kms-ca-secret-afq7gj
  VAULT_CLIENT_CERT: ocs-kms-client-cert-e4plrg
  VAULT_CLIENT_KEY: ocs-kms-client-key-y5whe6
  VAULT_NAMESPACE: odf
  VAULT_TLS_SERVER_NAME: vault.qe.rh-ocs.com
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-01T05:34:44Z"
  name: ocs-kms-connection-details
  namespace: openshift-storage
  resourceVersion: "46732"
  uid: ef2e9d0d-4137-45ea-becf-bb9adfb1f480


Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.10.0-0.nightly-2022-01-31-012936
ODF: odf-operator.v4.10.0        full_version=4.10.0-122



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, deployment using Vault namespaces fails.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No


Steps to Reproduce:
-------------------

1. Install the ODF operator

2. In the openshift-storage namespace, create a service account called odf-vault-auth
   # oc -n openshift-storage create serviceaccount odf-vault-auth

3. Create a ClusterRoleBinding that grants the system:auth-delegator cluster role to the service account
   # oc -n openshift-storage create clusterrolebinding vault-tokenreview-binding --clusterrole=system:auth-delegator --serviceaccount=openshift-storage:odf-vault-auth

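4. Get the name of the service account's token secret and store it in VAULT_SA_SECRET_NAME (used in step 5)
   # VAULT_SA_SECRET_NAME=$(oc -n openshift-storage get sa odf-vault-auth -o jsonpath="{.secrets[*]['name']}" | grep -o "[^[:space:]]*-token-[^[:space:]]*")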

5. Get the token and CA certificate used to configure the kube auth method in Vault
   # SA_JWT_TOKEN=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data.token}" | base64 --decode; echo)
   # SA_CA_CRT=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data['ca\.crt']}" | base64 --decode; echo)

6. Get the OpenShift API endpoint and the service account issuer
   # K8S_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
   # issuer="$(oc get authentication.config cluster -o template="{{ .spec.serviceAccountIssuer }}")"

7. On the vault node/pod, configure the kube auth method
   # vault auth enable -namespace=odf kubernetes
   
   # vault write -namespace=odf auth/kubernetes/config \
          token_reviewer_jwt="$SA_JWT_TOKEN" \
          kubernetes_host="$K8S_HOST" \
          kubernetes_ca_cert="$SA_CA_CRT" \
          issuer="$issuer"

   # vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-op \
        bound_service_account_names=rook-ceph-system,rook-ceph-osd,noobaa \
        bound_service_account_namespaces=openshift-storage \
        policies=rook \
        ttl=1440h

   # vault write -namespace=odf auth/kubernetes/role/odf-rook-ceph-osd \
        bound_service_account_names=rook-ceph-osd \
        bound_service_account_namespaces=openshift-storage \
        policies=rook \
        ttl=1440h

8. From the ODF management console, follow the steps to create the storagesystem.
9. On the Security and network page, click on "Enable data encryption for block and file storage"
10. Select "Cluster-wide encryption" from encryption level and click on "Connect to an external key management service".
11. Set Authentication method to "Kubernetes" and fill out the rest of the details 
12. Review and create the storagesystem

Actual results:
---------------
The deployment fails with the following error:

2022-02-01 05:41:19.883434 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.qe.rh-ocs.com:8200/v1/auth/kubernetes/login
Code: 500. Errors:

* claim "iss" is invalid


Expected results:
-----------------
The deployment should succeed.

Comment 3 Sébastien Han 2022-02-01 14:13:46 UTC
As per Eran's comment offline, I'm closing this. Testing the Vault Namespace is irrelevant for the cluster-wide encryption scenario.
Thanks.

Comment 5 Mudit Agarwal 2022-02-08 14:07:36 UTC
Based on the email conversation with Eran and others, moving this to 4.11

Comment 6 Sébastien Han 2022-04-04 13:25:51 UTC
This was reported before the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2052937 went in. I believe it's the same root cause; can you try this again with the latest 4.10?
It should work.

Thanks.

Comment 8 Rachael 2022-04-05 07:11:46 UTC
Since the documentation and UI for 4.10 do not support vault namespaces, can the target release be kept for ODF 4.11? 
The kubernetes auth method can then be tested and verified using vault namespaces in 4.11 and the UI changes done for the same can be reverted.

Comment 9 Sébastien Han 2022-04-11 08:20:18 UTC
(In reply to Rachael from comment #8)
> Since the documentation and UI for 4.10 do not support vault namespaces, can
> the target release be kept for ODF 4.11? 

Yes, it's a bit late for 4.10 changes.

> The kubernetes auth method can then be tested and verified using vault
> namespaces in 4.11 and the UI changes done for the same can be reverted.

Sounds good to me.
This can be moved to VERIFIED now I suppose, right?

Thanks!

Comment 10 Travis Nielsen 2022-04-12 15:13:40 UTC
Should this be ON_QA since we're just waiting for 4.11?

Comment 15 Sébastien Han 2022-05-02 13:54:31 UTC
Thanks Neha, do you need anything from me at the moment?
Let me know, thanks.

Comment 16 Mudit Agarwal 2022-05-09 07:10:19 UTC
Please test with the latest 4.11 build

Comment 31 errata-xmlrpc 2023-01-31 00:19:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.12.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0551

Comment 32 Red Hat Bugzilla 2023-12-08 04:27:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

