Description of problem:
The --force option seems to be misbehaving when there is a completely orphaned pod in the cluster (no RC, DC, or daemonset). The drain command with the --force option fails to remove orphaned pods, e.g. daemonset pods (such as fluentd pods) whose DC/RC has somehow been removed.

Version-Release number of selected component (if applicable): 3.4

How reproducible: As reproducible as creating orphaned pods.

Steps to Reproduce:
1. Deploy daemonset pods, like fluentd. Ensure the DC/RC is non-existent.
2. Run "drain" with the --force option.
3. Observe the undrained orphan pod.

Actual results: Normal pods removed, orphaned pods not removed.

Expected results: Normal and orphaned pods removed.

Additional info: None
Just to clear up the above: daemonset pods get cleaned up just fine. The problem is the scenario where someone deletes a DC through the UI and, due to whatever conditions, a pod ends up being left lying around. Once that happens, drain fails and complains that it cannot find an RC for the pod (which doesn't exist, since it was deleted). However, based on the -h text for drain, --force should remove such pods.
(In reply to Boris Kurktchiev from comment #1)
> just to clear up the above. Daemonset pods get cleaned up just fine, its the
> scenario where someone goes and deletes a DC through the UI and due to
> whatever conditions a pod ends up being left lying around, once that
> happens, drain fails and complains that it cannot find an RC for the pod
> (which doesnt exist since it was deleted), however based on the -h for drain
> --force should remove such pods.

Your explanation is consistent with the code, and I don't see -h making a claim about orphaned pods. --force is intended to proceed with the drain when a node is hosting pods that are not managed. If a pod indicates it is managed (as an orphaned pod would) and the managing resource cannot be found, I'm not sure --force should delete it. Having drain fail in that case gives the user a chance to detect that something serious is wrong.
So here is what I am seeing when I view oc adm drain -h:

Examples:
  # Drain node "foo", even if there are pods not managed by a ReplicationController, ReplicaSet, Job, or DaemonSet on it.
  $ oc adm drain foo --force

Reading the line above, it makes it seem as if it would do exactly what I described.

root@osmaster0p:/etc/origin/master: ----> oc version
oc v3.4.0.39
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://api.cloudapps.unc.edu:443
openshift v3.4.0.39
kubernetes v1.4.0+776c994

I do not know if that text has changed in the latest 3.4.1.* release. If it has, then OK; if not, then there is still a mismatch between what the code is doing and what the user is told should happen, according to the help text.
(In reply to Boris Kurktchiev from comment #3)
> SO here is what I am seeing when i view oc adm drain -h:
>
> Examples:
>   # Drain node "foo", even if there are pods not managed by a
> ReplicationController, ReplicaSet, Job, or DaemonSet on it.
>   $ oc adm drain foo --force
>
> Reading the line above, it makes it seem as it would do exactly what I
> described.

Why do you think it would remove orphans, when there is no mention of orphans in the text? When orphans are detected, I think a user needs to figure out what to do with them. I would expect a user to recreate the managing resource, or to run delete with a --selector that targets the orphans, rather than blindly removing orphaned pods.
The text reads:

  Drain node "foo", even if there are pods not managed by a ReplicationController, ReplicaSet, Job, or DaemonSet on it.

My scenario as described: daemonset pods get cleaned up just fine; the problem is the scenario where someone goes and deletes a DC through the UI and, due to whatever conditions, a pod ends up being left lying around. Once that happens, drain fails and complains that it cannot find an RC for the pod.

As I read the current text, said pod should be deleted. I am not saying you are wrong; your assertion that what happens is the correct behavior may well be right. What I am driving at is that I went in assuming something would happen based on what I read, so if anything the help text needs to reflect the actual behavior. Absent some way to make sure that users do NOT end up in the state described (pods lying around because their DC/RCs get deleted and the system doesn't know what to do with them), I built a process around what I assumed the behavior of --force was going to be, based on the information provided by -h.
UPSTREAM PR: https://github.com/kubernetes/kubernetes/pull/41864
I have no idea why this was moved to POST or MODIFIED. Per comment 8, we still need this in release-1.5. Moving back to ASSIGNED.
The Origin PR for release-1.5 is here and has merged: https://github.com/openshift/origin/pull/13123
This has been merged into OCP and is in OCP v3.5.0.38 or newer.
Verified on openshift v3.5.0.39. Fixed.

Steps:
1. Create a pod which is not managed by any controller. (pod created and running)
2. # oadm drain <nodeName> --force --delete-local-data
   pod "hello-pod1" evicted
   node "<nodeName>" drained
3. Check pod and node status:
   # oc get pod hello-pod1
   Error from server (NotFound): pods "hello-pod1" not found
   # oc describe node <nodeName>
   <---snip--->
   Non-terminated Pods: (0 in total)
   <---snip--->
Test steps:
1. Create an RC.
   # oc create -f rc.yaml
2. Delete the RC, keeping the pod.
   # oc delete rc <RCname> --cascade=false
3. Drain the node.
   # oadm drain <nodeName> --force --delete-local-data
   node "<nodeName>" drained
4. Check node status:
   # oc describe node <nodeName>
   <---snip--->
   Non-terminated Pods: (0 in total)
   <---snip--->
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884