Bug 1564974

Summary: unknown status pod continues to mount storage
Product: OpenShift Container Platform
Component: Node
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Status: CLOSED WONTFIX
Reporter: Kenjiro Nakayama <knakayam>
Assignee: Seth Jennings <sjenning>
QA Contact: DeShuai Ma <dma>
CC: aos-bugs, jokerman, mmccomas, mori, pdwyer, sjenning
Keywords: Reopened
Target Milestone: ---
Target Release: ---
Type: Bug
Regression: ---
Doc Type: If docs needed, set a value
Last Closed: 2018-04-10 01:56:03 UTC

Description Kenjiro Nakayama 2018-04-09 04:58:41 UTC
Description of problem:
- When an OpenShift node stops working, pods on that node enter "Unknown" status. After that, even though new pods are deployed on other nodes, the old "Unknown" pods continue to mount the storage.
- This happens even when the RWO access mode is used.

Version-Release number of selected component (if applicable):
- OCP 3.7

How reproducible: 100%

Steps to Reproduce:
1. Deploy pods with a PV whose access mode is RWO.
2. Stop the atomic-openshift-node service (to simulate an incident).
3. Pods evacuate to another node. (The Unknown pods will remain, but the new pods will be Running.)
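A minimal sketch of the manifests behind step 1 — the names, storage class, and image are hypothetical placeholders, not taken from this report (and OCP 3.7-era clusters would use `apps/v1beta1` for the Deployment):

```yaml
# Hypothetical RWO claim plus a single-replica Deployment that mounts it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwo-claim
spec:
  accessModes:
    - ReadWriteOnce        # exclusive: attachable to one node at a time
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1        # apps/v1beta1 on OCP 3.7-era clusters
kind: Deployment
metadata:
  name: rwo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rwo-app
  template:
    metadata:
      labels:
        app: rwo-app
    spec:
      containers:
        - name: app
          image: registry.access.redhat.com/rhel7   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: rwo-claim
```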

Actual results:
- After step 3, the Unknown pods still mount the storage. Logging in with `docker exec` confirms the issue.

Expected results:
- Unknown pods should not mount the volume.

Additional info:
- This causes the new pod to fail to start if the storage does not allow mounting from multiple pods.

Comment 1 Seth Jennings 2018-04-09 19:16:23 UTC
This is expected behavior.

If the node process is down, the control plane has no way to cleanly terminate the pod.  This means the pod could still be using the storage.  The storage cannot be unmounted while the pod is using it, and the volume cannot be safely detached from the node while it is mounted.  The attach-detach controller will continue to ensure that the volume is attached to the node as long as the pod requiring that volume is still assigned to the node.

There are two ways to resolve the situation: 1) delete the pod or node explicitly (oc delete pod <podname> --grace-period=0 --force) or 2) bring the node back to acknowledge the deleted pod.
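As a sketch, the two recovery paths look like this (the pod name is a placeholder, and both commands assume access to the affected cluster and node):

```shell
# Option 1: force delete the stuck pod. Only safe once you know,
# out of band, that the old pod is really dead and the volume is idle.
oc delete pod <podname> --grace-period=0 --force

# Option 2: restore the node service so the kubelet can acknowledge
# the pod deletion and clean up the pod and its mounts itself.
systemctl start atomic-openshift-node
```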

Comment 2 Kenjiro Nakayama 2018-04-10 00:39:49 UTC
> There are two ways to resolve the situation: 1) delete the pod or node explicitly (oc delete pod <podname> --grace-period=0 --force) or 2) bring the node back to acknowledge the deleted pod.

We know how to recover from the issue. The reason we opened this ticket is that an application (with RWO storage) will not be able to fail over when the node process goes down. Is it not possible to fix this?

Comment 4 Seth Jennings 2018-04-10 01:56:03 UTC
I agree it is an unexpected and unfortunate limitation, but there is no safe way for OpenShift/Kubernetes to do this.

This upstream issue discusses it in greater detail:

If the pod was not using exclusive RWO storage, the RS/SS/DS would start a new pod with no issue.  However, the storage complicates things, as it can only be attached to one node at a time and will not detach as long as the old pod is scheduled to the node.

There is no way for OCP/Kube to know that the storage is in a consistent state and not in use if the node will not respond.  If it were to assume the pod is down, which it has no basis to assume, and detach the volume from the node, it could corrupt the data.

The admin must intervene with out-of-band knowledge that the pod on the old node is terminated and that the storage is otherwise in a consistent state, then force delete the pod to allow OpenShift to detach the storage from the current node and attach it to the node where the new pod lands.

This is an unfortunate reality of using RWO persistent volumes on OCP/Kube.

One caveat is if you are using cloud provider integration.  If a node is terminated, the node controller will notice and delete the node and all pods that were running on it, freeing up any attached volumes as well.

Comment 5 Kenjiro Nakayama 2018-04-11 00:35:49 UTC
Thank you. I'm sorry to bother you, but one more question. Taking a different approach: if we asked you to implement a restriction so that a new pod is not spawned when a pod becomes "Unknown", would that be possible?

We are asking because we do not want OpenShift to mount RO volume from 2 pods even if one pod is "unknown" status. The data corruption is what we would like to avoid.

Comment 6 Seth Jennings 2018-04-11 01:48:24 UTC
How would the data be corrupted if the volume is mounted read-only (assuming that is what you mean by RO)?

Comment 7 Kenjiro Nakayama 2018-04-11 02:33:36 UTC
I'm sorry, "RO volume" was not clear (actually a wrong expression). I meant an "RWO" persistent volume.

If we make the backend storage non-exclusive, pods can fail over even in Unknown status. However, that means whenever a pod enters "Unknown" status, there is a possibility that the PV (RWO) is mounted from multiple pods.

So, we would like to stop spawning new pods when a pod becomes Unknown.

Comment 8 Seth Jennings 2018-04-11 03:13:20 UTC
RWO volumes can never be mounted by multiple pods at the same time.  RWO is inherently exclusive.  If the volume is in use by a pod in Unknown state, the volume is still bound to that pod and cannot be used by a new pod.  A new pod might try to start using the same PVC and underlying PV, but it will fail to start, as the volume will be unable to attach to its node because it is bound to the node where the pod in Unknown state is running (or was running).
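In practice, the failure mode described above tends to surface like this (pod names are placeholders and the quoted event text is illustrative of the attach-detach controller's multi-attach error, not captured from this report):

```shell
# The old pod sits in Unknown; the replacement never leaves ContainerCreating:
oc get pods

# Describing the new pod shows the attach failure in its events, along the
# lines of: "Volume is already exclusively attached to one node and can't
# be attached to another"
oc describe pod <new-podname>
```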