Bug 2102440
| Summary: | Ceph daemons are crashing while any app pod remained in container creating state | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Amrita Mahapatra <ammahapa> |
| Component: | ceph | Assignee: | Kotresh HR <khiremat> |
| ceph sub component: | CephFS | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | bniver, hyelloji, madam, muagarwa, ocs-bugs, odf-bz-bot, vshankar |
| Version: | 4.11 | Flags: | khiremat: needinfo? (ammahapa) |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-13 09:35:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Not a 4.11 blocker |
Description of problem (please be as detailed as possible and provide log snippets):

Ceph daemons crash while an app pod fails to move to Running and remains in the ContainerCreating state. Although the PVC is in Bound state, the pod fails with the following events:

    Events:
      Type     Reason       Age                 From               Message
      ----     ------       ----                ----               -------
      Normal   Scheduled    5m5s                default-scheduler  Successfully assigned openshift-storage/pod-test-cephfs-271a7a8f35df44a299eab46f to ip-10-0-171-178.us-east-2.compute.internal by ip-10-0-182-125
      Warning  FailedMount  3m5s                kubelet            MountVolume.SetUp failed for volume "pvc-8b42c93f-a1d6-4b26-b828-d84070fd4736" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
      Warning  FailedMount  47s (x2 over 3m2s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-lk5vj]: timed out waiting for the condition
      Warning  FailedMount  35s (x8 over 3m1s)  kubelet            MountVolume.SetUp failed for volume "pvc-8b42c93f-a1d6-4b26-b828-d84070fd4736" : rpc error: code = Internal desc = mount failed: exit status 32

Version of all relevant components (if applicable):
Validated with:
OCP version: 4.11.0-0.nightly-2022-06-28-160049
OCS version: 4.11.0-107
Ceph version: 16.2.8-59.el8cp

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
Yes. Archiving the crash details lets the Ceph health check pass. Example:

    [ammahapa@ammahapa ~]$ oc get pods | grep tool
    rook-ceph-tools-9f8c8976f-zk8ps 1/1 Running
    [ammahapa@ammahapa ~]$ oc -n openshift-storage exec rook-ceph-tools-9f8c8976f-zk8ps -- ceph crash archive-all

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
It happens intermittently.

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy an ODF 4.11 cluster.
2. Enable NFS using the patch command:

    [ammahapa@ammahapa ~]$ oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": true}}}' --type merge

3. Check that the CephNFS resource was created:

    [ammahapa@ammahapa ~]$ oc get cephnfs
    NAME AGE
    ocs-storagecluster-cephnfs 10s

4. Check that the nfs-ganesha pod is up and running:

    [ammahapa@ammahapa ~]$ oc get pods | grep rook-ceph-nfs
    rook-ceph-nfs-ocs-storagecluster-cephnfs-a-f7767ddc8-897nq 2/2 Running

5. Set ROOK_CSI_ENABLE_NFS in the Rook operator config:

    oc --namespace openshift-storage patch configmap rook-ceph-operator-config --type merge --patch '{"data":{"ROOK_CSI_ENABLE_NFS": "true"}}'

6. Create NFS PVCs with the storage class ocs-storagecluster-ceph-nfs.
7. Create a pod with the NFS PVC mounted.

Actual results:
Sometimes the pod does not move to Running and remains in ContainerCreating, although the PVC is in Bound state. The pod events are the same as those listed in the description of the problem above.

Expected results:
The app pod should move to Running and Ceph should not crash.

Additional info:
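
The workaround archives the crash entries, which also makes them easy to overlook later. The following is a minimal sketch, not a tested procedure, of saving each crash report to a local file before running `ceph crash archive-all`. It assumes the toolbox pod can be found with the same `grep` used in the workaround and that `ceph crash ls-new` and `ceph crash info <id>` are available in the toolbox. The function names and the `NS` and `OUT_DIR` variables are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: save Ceph crash reports locally, then archive them.
# NS and OUT_DIR are illustrative defaults, not values required by ODF.
NS="${NS:-openshift-storage}"
OUT_DIR="${OUT_DIR:-./ceph-crash-reports}"

# Locate the toolbox pod by name pattern, as in the workaround.
find_tools_pod() {
  oc -n "$NS" get pods -o name | grep rook-ceph-tools | head -n 1
}

# Run a ceph subcommand inside the given toolbox pod.
# stdin is detached so the call is safe inside a `while read` loop.
ceph_in_toolbox() {
  local pod="$1"; shift
  oc -n "$NS" exec "$pod" -- ceph "$@" </dev/null
}

# Write `ceph crash info` for every new crash to OUT_DIR, then archive all.
save_and_archive_crashes() {
  local pod id
  pod="$(find_tools_pod)"
  if [ -z "$pod" ]; then
    echo "toolbox pod not found in namespace $NS" >&2
    return 1
  fi
  mkdir -p "$OUT_DIR"
  # The crash ID is the first column; skip blank lines and the "ID" header.
  ceph_in_toolbox "$pod" crash ls-new |
    awk 'NF && $1 != "ID" { print $1 }' |
    while read -r id; do
      ceph_in_toolbox "$pod" crash info "$id" > "$OUT_DIR/$id.json"
    done
  ceph_in_toolbox "$pod" crash archive-all
}
```

After sourcing the file, `save_and_archive_crashes` would be run from a host that is logged in to the cluster with `oc`; the saved reports can then be attached to the bug to help identify which daemon crashed.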