Description of problem (please be as detailed as possible and provide log snippets):

When cluster-wide encryption is enabled using a service account for KMS authentication, the OSD pods fail to come up and are stuck in the Init:CrashLoopBackOff state. The following error is seen in the logs:

$ oc logs rook-ceph-osd-0-7cd85d4c67-9dxvp -c encryption-kms-get-kek
2022-01-11 08:57:56.688171 C | rookcmd: failed to get ceph cluster in namespace "openshift-storage": cephclusters.ceph.rook.io "openshift-storage" not found

$ oc get pods | grep osd
NAME                               READY   STATUS                  RESTARTS         AGE
rook-ceph-osd-0-7cd85d4c67-9dxvp   0/2     Init:CrashLoopBackOff   28 (2m46s ago)   120m
rook-ceph-osd-1-6699c6c4f7-26sml   0/2     Init:CrashLoopBackOff   28 (2m21s ago)   120m
rook-ceph-osd-2-547ffc96b9-t8v4s   0/2     Init:CrashLoopBackOff   28 (2m38s ago)   120m

Version of all relevant components (if applicable):
---------------------------------------------------
ODF: odf-operator.v4.10.0, full_version=4.10.0-79
OCP: 4.10.0-0.nightly-2022-01-10-144202

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the deployment fails and the cluster is not ready to be used.

Is there any workaround available to the best of your knowledge?
Not that I am aware of.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Tried it once.

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Install the ODF operator.

2. In the openshift-storage namespace, create a service account called odf-vault-auth:

   # oc -n openshift-storage create serviceaccount odf-vault-auth

3. Create a clusterrolebinding as shown below:

   # oc -n openshift-storage create clusterrolebinding vault-tokenreview-binding --clusterrole=system:auth-delegator --serviceaccount=openshift-storage:odf-vault-auth

4.
   Get the secret name from the service account:

   # oc -n openshift-storage get sa odf-vault-auth -o jsonpath="{.secrets[*]['name']}"

5. Get the token and CA cert used to configure the kube auth method in Vault:

   # SA_JWT_TOKEN=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data.token}" | base64 --decode; echo)
   # SA_CA_CRT=$(oc -n openshift-storage get secret "$VAULT_SA_SECRET_NAME" -o jsonpath="{.data['ca\.crt']}" | base64 --decode; echo)

6. Get the OCP endpoint and the service account issuer:

   # K8S_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
   # issuer="$(oc get authentication.config cluster -o template="{{ .spec.serviceAccountIssuer }}")"

7. On the Vault node/pod, configure the kube auth method:

   # vault auth enable kubernetes
   # vault write auth/kubernetes/config \
       token_reviewer_jwt="$SA_JWT_TOKEN" \
       kubernetes_host="$K8S_HOST" \
       kubernetes_ca_cert="$SA_CA_CRT" \
       issuer="$issuer"
   # vault write auth/kubernetes/role/odf-rook-ceph-op \
       bound_service_account_names=rook-ceph-system,rook-ceph-osd,noobaa \
       bound_service_account_namespaces=openshift-storage \
       policies=odf \
       ttl=1440h
   # vault write auth/kubernetes/role/odf-rook-ceph-osd \
       bound_service_account_names=rook-ceph-osd \
       bound_service_account_namespaces=openshift-storage \
       policies=odf \
       ttl=1440h

8. From the ODF management console, follow the steps to create the storagesystem.

9. On the Security and network page, click "Enable data encryption for block and file storage".

10. Select "Cluster-wide encryption" as the encryption level and click "Connect to an external key management service".

11. Set the authentication method to "Kubernetes" and fill out the rest of the details.

12. Review and create the storagesystem.

13. Check the status of the OSD pods.

Actual results:
---------------
The OSD pods are in the Init:CrashLoopBackOff state.

Expected results:
-----------------
The deployment should be successful and the OSD pods should be up and running.
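Additional note: the decode pipeline in step 5 is easy to get wrong, since Kubernetes stores secret fields base64-encoded and Vault must receive the decoded values. A minimal sketch of that round-trip, using a mock token value in place of a live cluster (the real source of the encoded value is the oc jsonpath command in step 5):

```shell
# Mock stand-in for the base64-encoded "token" field of the
# service-account secret (what the jsonpath query in step 5 returns).
RAW_TOKEN='eyJhbGciOiJSUzI1NiJ9.mock-payload'
ENCODED_TOKEN=$(printf '%s' "$RAW_TOKEN" | base64)

# Same decode step as in step 5; passing Vault the still-encoded
# value would make the token review fail.
SA_JWT_TOKEN=$(printf '%s' "$ENCODED_TOKEN" | base64 --decode)

echo "$SA_JWT_TOKEN"
```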
Will be in the next resync: https://github.com/red-hat-storage/rook/pull/326
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372