| Summary: | [Dedicated] Pod-attached PV cannot be terminated after re-deployment if scheduled to a node other than ip-172-31-2-203.ec2.internal | ||
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Wenjing Zheng <wzheng> |
| Component: | Storage | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jianwei Hou <jhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.x | CC: | aos-bugs, chaoyang, decarr, jhou, jokerman, mmccomas, mturansk |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-23 15:08:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Wenjing Zheng
2016-04-01 09:59:57 UTC
I believe this issue is the attach/detach logic handled by https://github.com/kubernetes/kubernetes/issues/20262

Basically, Node A has a recon loop in its Kubelet that detaches/unmounts orphaned volumes (volumes no pod needs anymore). Unless and until the Kubelet on Node A processes that orphaned volume, it remains inaccessible to Node B ("volume already in use" error).

The centralized attach/detach controller proposed in the issue above seeks to mitigate this problem.

(In reply to Mark Turansky from comment #2)
> I believe this issue is the attach/detach logic handled by
> https://github.com/kubernetes/kubernetes/issues/20262
>
> Basically, Node A has a recon loop in its Kubelet that detaches/unmounts
> orphaned volumes (volumes no pod needs anymore). Unless and until the
> Kubelet on Node A processes that orphaned volume, it remains inaccessible
> to Node B ("volume already in use" error).
>
> The centralized attach/detach controller proposed in the issue above seeks
> to mitigate this problem.

From what I saw in the environment, during the second deploy the first mysql pod was stuck in 'Terminating' status and its volume was not detached from the instance; at the same time the new mysql pod was trying to mount the same EBS volume, and it hit the error above.

Did you find out why it was stuck in "Terminating" status? As long as that pod remains on the original node (even in an error state), its volumes are still needed on that node. Only when the pod is completely gone from a node will the Kubelet unmount and detach all of its volumes.

Clayton (I believe) mentioned on the bug scrub call today that pods stuck in the Terminating state is an issue Derek has been chasing; he is currently working on a fix.

I think the pod-stuck-in-Terminating issue could have been resolved by https://github.com/kubernetes/kubernetes/pull/23746. The node could have gotten stuck if the docker image pull for mysql:latest returned a 50x error response each time it attempted to restart/start the container.
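For illustration, the race described above can be sketched in a few lines. This is a hypothetical toy model, not Kubernetes code: `Cluster`, `attach`, and `recon` are invented names standing in for the real attachment bookkeeping, the kubelet recon loop, and the "volume already in use" failure mode.

```python
# Toy model (hypothetical names) of the per-node reconciliation race:
# a volume stays attached to Node A until Node A's recon loop notices
# that no local pod references it; until then, attaching it on Node B
# fails with a "volume already in use" style error.

class VolumeInUseError(Exception):
    pass

class Cluster:
    def __init__(self):
        self.attached = {}  # volume -> node currently holding the attachment
        self.pods = {}      # node -> set of volumes its pods still reference

    def attach(self, volume, node):
        holder = self.attached.get(volume)
        if holder is not None and holder != node:
            raise VolumeInUseError(f"{volume} already attached to {holder}")
        self.attached[volume] = node

    def recon(self, node):
        # Recon loop on `node`: detach volumes no local pod references.
        needed = self.pods.get(node, set())
        for vol, holder in list(self.attached.items()):
            if holder == node and vol not in needed:
                del self.attached[vol]

cluster = Cluster()
cluster.pods["node-a"] = {"ebs-vol-1"}
cluster.attach("ebs-vol-1", "node-a")

# Pod deleted on node-a, but node-a's recon loop has not run yet:
cluster.pods["node-a"] = set()
try:
    cluster.attach("ebs-vol-1", "node-b")  # second deployment, other node
except VolumeInUseError as e:
    print("attach failed:", e)

cluster.recon("node-a")                    # orphaned volume detached
cluster.attach("ebs-vol-1", "node-b")      # now succeeds
print(cluster.attached["ebs-vol-1"])       # -> node-b
```

A centralized attach/detach controller avoids this window by moving the detach decision off the node, so a dead node's unprocessed recon queue no longer blocks the rest of the cluster.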
No such issue in ded-stage-aws now; the pod is running normally after re-deployment.