Bug 1337479
| Summary: | After the NFS server loses connection, deleting a pod that uses an NFS PV leaves new pods scheduled to that node pending indefinitely | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wang Haoran <haowang> |
| Component: | Storage | Assignee: | hchen |
| Status: | CLOSED ERRATA | QA Contact: | Wenqi He <wehe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.2.0 | CC: | aos-bugs, eparis, mawong, tdawson, wehe |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1367161 | Environment: | |
| Last Closed: | 2016-09-27 09:32:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1367161 | | |
Description
Wang Haoran
2016-05-19 09:49:30 UTC
When the NFS server is down, unmount will hang until it times out. It probably takes a while (300 seconds) for the timeout to happen.

I tried to replicate this, and for me it hangs indefinitely (e.g. over the weekend). The pod that uses the PV is also stuck in Terminating. If the NFS server is unreachable, should umount use -l and/or -f? Here is a kubelet.log: http://paste.fedoraproject.org/378985/46591229/ After a while, I stop seeing syncloop.

(In reply to hchen from comment #1)
> when the nfs server is down, unmount will hang till timeout. It probably
> takes a while (300 seconds) till timeout happens.

Hi, do you have a PR to resolve this, since you changed the status to MODIFIED?

Opened an upstream issue: https://github.com/kubernetes/kubernetes/issues/27463

Fixed by upstream PR https://github.com/kubernetes/kubernetes/pull/26801

This has been merged and is in OSE v3.3.0.9 or newer.

I have tested this in the following version:
```
openshift v3.3.0.9
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git
```
The mysql pod stays in "Terminating" status, but I can create another pod:

```
[wehe@wehepc octest]$ oc get pods
NAME              READY     STATUS        RESTARTS   AGE
hello-openshift   1/1       Running       0          8m
mysql-1-dfotg     0/1       Terminating   0          49m
```
On the node, I checked the mounted path: after the NFS server is deleted, the mount path still exists:

```
device 172.30.98.62:/ mounted on /var/lib/origin/openshift.local.volumes/pods/755640e5-5242-11e6-a3ee-fa163e5577f0/volumes/kubernetes.io~nfs/nfs with fstype nfs4 statvers=1.1
opts: rw,vers=4.0,rsize=524288,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.1.1,local_lock=none
age: 1722
```
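The opts line above shows a hard NFS mount with proto=tcp, timeo=600 and retrans=2. As a rough estimate of how long a single request waits before the client reports "server not responding", the sketch below parses that line and applies the linear-backoff rule as described in nfs(5): timeo is in tenths of a second, each retransmission adds another timeo, and each wait is capped at 600 seconds. With hard, nfs(5) says requests are retried indefinitely after that point, which is consistent with the indefinite hang reported in this thread. The helper names are hypothetical and the result is an estimate, not a measurement.

```python
def parse_mount_opts(opts_line: str) -> dict:
    """Parse an 'opts:' line from /proc/self/mountstats into a dict.

    key=value options keep their string value; bare flags such as 'hard'
    or 'rw' map to True.
    """
    _, _, body = opts_line.partition("opts:")
    opts = {}
    for item in body.strip().split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


def tcp_major_timeout_seconds(timeo_ds: int, retrans: int, cap_s: float = 600.0) -> float:
    """Estimate seconds before an NFS-over-TCP request hits a major timeout.

    Assumes linear backoff as described in nfs(5): the first wait is timeo
    (tenths of a second), each of the `retrans` retries adds another timeo,
    and no single wait exceeds cap_s.
    """
    step = timeo_ds / 10.0
    # One initial transmission plus `retrans` retransmissions.
    return sum(min(step * (i + 1), cap_s) for i in range(retrans + 1))
```

With the opts line above, parse_mount_opts marks hard as True and timeo as "600", and tcp_major_timeout_seconds(600, 2) gives 60 + 120 + 180 = 360 seconds, already longer than the 300 seconds mentioned earlier in the thread.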
@hchen, I think this is not fully fixed: the mysql pod cannot be terminated within 300 seconds, and the mounted path does not disappear.
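The question raised earlier in this thread, whether umount should use -l and/or -f when the NFS server is unreachable, can be sketched as an escalation with a timeout. This is a hypothetical helper, not the kubelet's code: it tries a plain umount for up to 300 seconds (the figure from this thread), then `umount -f` (documented in umount(8) as a force unmount for an unreachable NFS system), then `umount -l` (lazy: detach now, clean up references once no longer busy). It assumes that a timed-out umount child can be killed; `subprocess.run` kills and then waits for the child, and that wait could itself block if the process is stuck in an uninterruptible state.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical escalation order: plain umount, then -f, then -l (flags from umount(8)).
ATTEMPTS: Sequence[Tuple[str, List[str]]] = (
    ("normal", []),
    ("force", ["-f"]),  # force unmount, intended for unreachable NFS servers
    ("lazy", ["-l"]),   # detach now, clean up references when no longer busy
)


def unmount_with_fallback(
    path: str,
    timeout_s: float = 300.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Unmount `path`, escalating to -f and then -l if a step times out or fails.

    Returns the label of the step that succeeded. `run` is injectable so the
    escalation logic can be exercised without root or a real NFS mount.
    """
    last_error: Exception = RuntimeError("no unmount attempt made")
    for label, flags in ATTEMPTS:
        try:
            # On timeout, subprocess.run kills the child and raises TimeoutExpired.
            run(["umount", *flags, path], check=True, timeout=timeout_s)
            return label
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
            last_error = err
    raise RuntimeError(f"could not unmount {path}") from last_error
```

Trying -f before -l is one reasonable order: a lazy unmount makes the mount path disappear from the hierarchy, but any I/O already stuck on the dead server can linger, so it is kept as the last resort.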
The Kubernetes fix ensures that new pod creation is not blocked by a dead mount. However, the dead mount path cannot be cleaned up because the NFS server is unreachable.

Verified on the following version:

```
openshift v3.3.0.19
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git
```

This bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
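The fix as described in the developer's reply (new pod creation is not blocked by a dead mount, even though the dead mount itself stays) can be illustrated with a small sketch. This is not the code from the upstream PR; `VolumeManager`, `teardown_async` and `setup` are hypothetical names. The idea shown is only that each pod's teardown runs on a worker thread, so an unmount hanging on an unreachable NFS server occupies one worker instead of stalling the loop that sets up new pods.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class VolumeManager:
    """Hypothetical sketch: isolate slow volume teardowns from new pod setup."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._teardowns: Dict[str, Future] = {}

    def teardown_async(self, pod_uid: str, teardown: Callable[[], object]) -> Future:
        # Do not wait here: a hung unmount only ties up one worker thread.
        fut = self._pool.submit(teardown)
        self._teardowns[pod_uid] = fut
        return fut

    def setup(self, pod_uid: str, setup: Callable[[], object]) -> None:
        # New pods are set up directly and never wait on pending teardowns.
        setup()

    def pending_teardowns(self) -> List[str]:
        return [uid for uid, fut in self._teardowns.items() if not fut.done()]

    def shutdown(self) -> None:
        # Blocks until every submitted teardown has finished.
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    server_back = threading.Event()  # stands in for an NFS server that never answers
    vm = VolumeManager()
    vm.teardown_async("old-pod", server_back.wait)
    vm.setup("new-pod", lambda: print("new-pod set up"))
    print("still tearing down:", vm.pending_teardowns())
    server_back.set()  # let the demo exit cleanly
    vm.shutdown()
```

If every worker is stuck on a dead mount, later teardowns queue up behind them, but setup of new pods still proceeds; the dead mount path itself is not cleaned up, matching the limitation noted in the reply.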