Bug 2223847
| Summary: | [RDR] rbd-mirror pod goes to ContainerStatusUnknown status | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | kmanohar |
| Component: | rook | Assignee: | Subham Rai <srai> |
| Status: | NEW | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | amagrawa, kramdoss, muagarwa, odf-bz-bot, srai, srangana |
| Target Milestone: | --- | Flags: | srai: needinfo? (kmanohar) |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
kmanohar
2023-07-19 06:24:54 UTC
The error is:

```
Message: The node was low on resource: ephemeral-storage. Threshold quantity: 19236596271, available: 18664628Ki. Container rbd-mirror was using 342044Ki, request is 0, has larger consumption of ephemeral-storage. Container log-collector was using 5940Ki, request is 0, has larger consumption of ephemeral-storage.
```

(The available 18664628Ki is about 19.11 GB, just under the eviction threshold of about 19.24 GB, which is why the kubelet evicted the pod.)

@srai There is one more rbd-mirror pod that got created and is in Running state. Could you please explain a bit more about that? And why is the old pod stuck in the "ContainerStatusUnknown" state? Note that the newer pod is on a different node, 10.135.1.12, which does not have low resources:

```
pod/rook-ceph-osd-prepare-535693a906b30a53d2ba66acba7a8140-jdlk2   0/1   Completed                0   21d   10.135.1.12    compute-2   <none>   <none>
pod/rook-ceph-rbd-mirror-a-54bb9868f-hqqx7                         0/2   ContainerStatusUnknown   2   23d   10.133.2.158   compute-0   <none>   <none>
```

The pod that is failing, on node 10.133.2.158, reports the same ephemeral-storage eviction message quoted above.

@srai Don't we expect the stale pod to get deleted after the creation of the new one? What would be the right behavior here? There is also a similar Kubernetes issue: https://github.com/kubernetes/kubernetes/issues/104107

I don't think this is something we can fix from Rook or ODF. I looked deeply into this, and there are two ways we could address the issue:

1. The node is low on resources, and Kubernetes considers the available ephemeral storage too low to keep the rbd-mirror pod, since neither the rbd-mirror container nor the log-collector container has `spec.containers[].resources.limits.ephemeral-storage` set. So we need to free up space on the node.
2. We could set `spec.containers[].resources.limits.ephemeral-storage` on the containers (see the sketch after this comment).

But I'll suggest the 2nd is not the right fix IMO. The low resources on the node are the root cause, and we don't know the right value to put in `spec.containers[].resources.limits.ephemeral-storage`; a wrong choice could lead to other issues, and a very low value could also restrict log collection. Also, everything was working until the node ran low on resources.

I think we don't have anything to do here. Moving out to 4.14, not a blocker.
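For context, a minimal sketch of what option 2 would look like, assuming illustrative values. In practice the rbd-mirror Deployment is generated and reconciled by the Rook operator, so any such setting would have to flow through Rook's configuration rather than a direct edit; the container names `rbd-mirror` and `log-collector` come from the eviction message above, while the namespace, image, labels, and all resource values are assumptions.

```yaml
# Hypothetical sketch of option 2: declaring ephemeral-storage requests and
# limits on the two containers named in the eviction message. Namespace,
# image, labels, and every value below are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-rbd-mirror-a
  namespace: openshift-storage              # assumed ODF namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rook-ceph-rbd-mirror
  template:
    metadata:
      labels:
        app: rook-ceph-rbd-mirror
    spec:
      containers:
        - name: rbd-mirror
          image: quay.io/ceph/ceph:v17     # placeholder image
          resources:
            requests:
              ephemeral-storage: 500Mi     # assumed; observed usage was ~342044Ki
            limits:
              ephemeral-storage: 2Gi       # assumed cap; exceeding it evicts the pod
        - name: log-collector
          image: quay.io/ceph/ceph:v17     # placeholder image
          resources:
            requests:
              ephemeral-storage: 64Mi      # assumed; observed usage was ~5940Ki
            limits:
              ephemeral-storage: 1Gi       # assumed cap; too low would cut off log collection
```

Setting a request would also change eviction ranking: with a request of 0, any usage at all counts as exceeding the request, which is why both containers are singled out in the eviction message. Setting a limit means the kubelet evicts the pod as soon as it crosses that limit even without node pressure, which is exactly the over-restriction risk the comment above warns about.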