Bug 1505687
Summary: | Pods in unknown state, cannot be forcibly deleted. | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Sergi Jimenez Romero <sjr>
Component: | Node | Assignee: | Avesh Agarwal <avagarwa>
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | DeShuai Ma <dma>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 3.5.1 | CC: | aos-bugs, avagarwa, fcami, jokerman, mmccomas, rkrawitz, rpuccini, sjenning, sjr, sreber
Target Milestone: | --- | |
Target Release: | 3.9.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-02-06 17:59:30 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Sergi Jimenez Romero
2017-10-24 06:49:28 UTC
I believe the pods being in Unknown state is the effect of the following proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md

What doesn't seem to be working as expected is the "--force" parameter; as per https://github.com/kubernetes/kubernetes/pull/37263, `kubectl delete pods <pod> --grace-period=0 --force` should do the trick. As far as I have seen, oc delete will send any parameter directly to kubectl.

Avesh, PTAL.

Hi Sergi Jimenez Romero, I looked into it and here is my understanding. First of all, `oc delete --grace-period=0 --force` should have done the job (you tried `oc delete --force --grace-period=0`, but I think the order of --force and --grace-period does not matter). So now the question is why it did not, and that may be due to several reasons: for example, the kubelet might be wedged on the node, or there might be some other issue with the node. (There is a similar upstream issue: https://github.com/kubernetes/kubernetes/issues/43279.) As a next step, I'd suggest you provide:

1) logs from the node
2) logs from the master
3) oc describe node
4) oc describe pod

to really see what is going on.

Hi Sergi Jimenez Romero, I have been trying to reproduce this on my 1-master, 2-node cluster but have been unable to. Could you provide some details about the cluster, in addition to the info I asked for in https://bugzilla.redhat.com/show_bug.cgi?id=1505687#c3?

Since yesterday, I have been trying to reproduce this but am not able to. I have been running an rc with 50 pods (http://pastebin.test.redhat.com/527504) and running the following commands several times for both nodes:

    oadm drain 192.168.122.186 --config=./openshift.local.config/master/admin.kubeconfig
    oadm drain 192.168.122.239 --config=./openshift.local.config/master/admin.kubeconfig
    oadm uncordon 192.168.122.186 --config=./openshift.local.config/master/admin.kubeconfig
    oadm uncordon 192.168.122.239 --config=./openshift.local.config/master/admin.kubeconfig

I also tried Seth's suggestion to hold onto a shell in the mounted dirs on the host for one of the pods, basically by going into the pod's mounted dir and running `watch ls -al`. But I don't see any pod getting stuck, and drain is always successful.
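For reference, the reproduction attempt above can be expressed as a single loop. This is a minimal sketch, not part of the original report; the node IPs and kubeconfig path are taken from the commands above, and the iteration count is arbitrary.

```sh
# Hypothetical reproduction sketch: repeatedly drain and uncordon both
# nodes while the 50-pod rc is running, then check whether any pods are
# left in the Unknown state after each cycle.
KCFG="--config=./openshift.local.config/master/admin.kubeconfig"

for i in 1 2 3 4 5; do
  for node in 192.168.122.186 192.168.122.239; do
    oadm drain "$node" $KCFG
    oadm uncordon "$node" $KCFG
  done
  oc get pods -a -o wide $KCFG | grep Unknown || true
done
```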
But the bad news is that I can delete pods in Unknown state without any issue.

Before:

    # oc get pods -a -o wide --config=./openshift.local.config/master/admin.kubeconfig | grep Unknown
    nginx8-009hz   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-0zgcp   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-1mksh   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-3lghb   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-4cp93   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-4n9sm   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-66tkm   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-84llr   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-cb4v3   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-cxb6q   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-d4726   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-frk9n   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-g65xt   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-gdjfz   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-ktx78   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-ljb2f   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-m6lb4   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-pg93t   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-rq46z   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-s13pd   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-s4222   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-stjd5   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-stl9z   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-vkwt4   0/1   Unknown   0   17m   <none>   192.168.122.239
    nginx8-wkj4q   0/1   Unknown   0   17m   <none>   192.168.122.239

Now force-delete one of the above:

    # oc delete --force --grace-period=0 pod nginx8-009hz --config=./openshift.local.config/master/admin.kubeconfig
    warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
    pod "nginx8-009hz" deleted

After:

    # oc get pods -a -o wide --config=./openshift.local.config/master/admin.kubeconfig | grep Unknown
    nginx8-0zgcp   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-1mksh   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-3lghb   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-4cp93   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-4n9sm   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-66tkm   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-84llr   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-cb4v3   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-cxb6q   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-d4726   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-frk9n   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-g65xt   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-gdjfz   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-ktx78   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-ljb2f   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-m6lb4   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-pg93t   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-rq46z   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-s13pd   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-s4222   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-stjd5   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-stl9z   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-vkwt4   0/1   Unknown   0   18m   <none>   192.168.122.239
    nginx8-wkj4q   0/1   Unknown   0   18m   <none>   192.168.122.239

Reference bug 1557306
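For completeness, the per-pod deletion demonstrated above can be wrapped in a small loop to clean up every stuck pod at once. This is a minimal sketch, not part of the original report; it assumes the same admin kubeconfig as the commands above and the default `oc get pods` column layout, where STATUS is the third field.

```sh
# Hypothetical cleanup sketch: force-delete every pod currently reported
# as Unknown. Assumes the admin kubeconfig used in the commands above and
# the default `oc get pods` output (STATUS in the third column).
KCFG="--config=./openshift.local.config/master/admin.kubeconfig"

for pod in $(oc get pods -a $KCFG | awk '$3 == "Unknown" {print $1}'); do
  oc delete pod "$pod" --grace-period=0 --force $KCFG
done
```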