Bug 2039240 - [KMS] Deployment of ODF cluster fails when cluster wide encryption is enabled using service account for KMS auth
Summary: [KMS] Deployment of ODF cluster fails when cluster wide encryption is enabled...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Sébastien Han
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-11 10:34 UTC by Rachael
Modified: 2023-08-09 17:03 UTC
CC: 6 users

Fixed In Version: 4.10.0-113
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:51:24 UTC
Embargoed:


Attachments: None


Links
Github rook/rook pull 9560 (open): osd: use cluster name when fetching the cephcluster - Last Updated 2022-01-11 14:06:21 UTC
Red Hat Product Errata RHSA-2022:1372 - Last Updated 2022-04-13 18:51:43 UTC

Description Rachael 2022-01-11 10:34:44 UTC
Description of problem (please be as detailed as possible and provide log snippets):

When cluster-wide encryption is enabled using a service account for KMS authentication, the OSD pods fail to come up and are stuck in the Init:CrashLoopBackOff state. The following error is seen in the logs:


$ oc logs rook-ceph-osd-0-7cd85d4c67-9dxvp -c encryption-kms-get-kek 
2022-01-11 08:57:56.688171 C | rookcmd: failed to get ceph cluster in namespace "openshift-storage": cephclusters.ceph.rook.io "openshift-storage" not found


$ oc get pods|grep osd
NAME                                                              READY   STATUS                  RESTARTS         AGE
rook-ceph-osd-0-7cd85d4c67-9dxvp                                  0/2     Init:CrashLoopBackOff   28 (2m46s ago)   120m
rook-ceph-osd-1-6699c6c4f7-26sml                                  0/2     Init:CrashLoopBackOff   28 (2m21s ago)   120m
rook-ceph-osd-2-547ffc96b9-t8v4s                                  0/2     Init:CrashLoopBackOff   28 (2m38s ago)   120m
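
The init container is looking up a CephCluster CR named after the namespace ("openshift-storage") rather than by the CR's actual name. In an ODF deployment the CR is typically named ocs-storagecluster-cephcluster (the exact name can vary), so the mismatch can be confirmed with:

$ oc -n openshift-storage get cephcluster -o name

which returns the CephCluster resource name rather than "openshift-storage".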



Version of all relevant components (if applicable):
---------------------------------------------------

ODF: odf-operator.v4.10.0      full_version=4.10.0-79
OCP: 4.10.0-0.nightly-2022-01-10-144202

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
Yes, the deployment fails and the cluster is not ready to be used.

Is there any workaround available to the best of your knowledge?
Not that I am aware of

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Tried it once

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------

1. Install the ODF operator

2. In the openshift-storage namespace, create a service account called odf-vault-auth
   # oc -n openshift-storage create serviceaccount odf-vault-auth

3. Create a clusterrolebinding as shown below
   # oc -n openshift-storage create clusterrolebinding vault-tokenreview-binding --clusterrole=system:auth-delegator --serviceaccount=openshift-storage:odf-vault-auth

4. Get the secret name from the service account and store it for the next step
   # VAULT_SA_SECRET_NAME=$(oc -n openshift-storage get sa odf-vault-auth -o jsonpath="{.secrets[*]['name']}")

5. Get the token and CA cert used to configure the Kubernetes auth method in Vault
   # SA_JWT_TOKEN=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data.token}" | base64 --decode; echo)
   # SA_CA_CRT=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data['ca\.crt']}" | base64 --decode; echo)

6. Get the OCP API endpoint and the service account issuer
   # K8S_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
   # issuer="$(oc get authentication.config cluster -o template="{{ .spec.serviceAccountIssuer }}")"

7. On the Vault node/pod, configure the Kubernetes auth method (the "odf" policy referenced in the role definitions below is sketched after the step list)
   # vault auth enable kubernetes
   
   # vault write auth/kubernetes/config \
          token_reviewer_jwt="$SA_JWT_TOKEN" \
          kubernetes_host="$K8S_HOST" \
          kubernetes_ca_cert="$SA_CA_CRT" \
          issuer="$issuer"

   # vault write auth/kubernetes/role/odf-rook-ceph-op \
        bound_service_account_names=rook-ceph-system,rook-ceph-osd,noobaa \
        bound_service_account_namespaces=openshift-storage \
        policies=odf \
        ttl=1440h

   # vault write auth/kubernetes/role/odf-rook-ceph-osd \
        bound_service_account_names=rook-ceph-osd \
        bound_service_account_namespaces=openshift-storage \
        policies=odf \
        ttl=1440h

8. From the ODF management console, follow the steps to create the storagesystem.
9. On the Security and network page, click on "Enable data encryption for block and file storage"
10. Select "Cluster-wide encryption" as the encryption level and click on "Connect to an external key management service".
11. Set the Authentication method to "Kubernetes" and fill out the rest of the details.
12. Review and create the storagesystem
13. Check the status of the OSD pods (status-check commands are sketched after this list).
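
For step 13, the commands below can be used to check the result; the app=rook-ceph-osd label is the label rook usually puts on OSD pods and is an assumption here, as is reusing the init container name from the failure above:

   # oc -n openshift-storage get pods -l app=rook-ceph-osd
   # oc -n openshift-storage get cephcluster
   # oc -n openshift-storage logs <osd-pod-name> -c encryption-kms-get-kek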

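The role definitions in step 7 reference a Vault policy named "odf" that is not shown in these steps. A minimal sketch of such a policy, assuming a KV v2 secrets engine mounted at the path "odf" (both the mount path and the capabilities are assumptions and must match the backend path entered in the management console):

   # vault secrets enable -path=odf kv-v2
   # echo 'path "odf/*" { capabilities = ["create", "read", "update", "delete", "list"] }' | vault policy write odf -

The auth configuration written in step 7 can be checked afterwards with:

   # vault read auth/kubernetes/config
   # vault read auth/kubernetes/role/odf-rook-ceph-osd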

Actual results:
---------------
The OSD pods are in Init:CrashLoopBackOff state. 


Expected results:
-----------------
The deployment should be successful and the OSD pods should be up and running.

Comment 3 Sébastien Han 2022-01-11 15:36:36 UTC
This will be included in the next resync: https://github.com/red-hat-storage/rook/pull/326
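
To confirm that a deployment already carries the fix (Fixed In Version 4.10.0-113 or later) and that the OSDs recovered, something like the following can be used, assuming the default openshift-storage namespace:

$ oc -n openshift-storage get csv | grep odf-operator
$ oc -n openshift-storage get pods | grep osd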

Comment 11 errata-xmlrpc 2022-04-13 18:51:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

