Description of problem:
MCO calls into drain to evict pods on nodes. It ignores daemonsets and correctly drains the whole node. There's a bug in the drain library being fixed in https://github.com/openshift/machine-config-operator/pull/962, which exposed odd behavior from the image-registry pod (it wasn't visible before that PR because the drain library wasn't correctly waiting on pod evictions). With the PR linked above, a drain operation on the node where the image-registry pod lives takes 600s+, and even after the eviction you can still see the image-registry container lying around.

Version-Release number of selected component (if applicable):
4.2 for this bug - will open for 4.1 as well later

How reproducible:
Always; a pure kubectl drain reproducer was found as well (see Steps to Reproduce).

Steps to Reproduce:
$ node=$(oc get pod -o go-template --template '{{.spec.nodeName}}' -n openshift-image-registry $(oc get pods --all-namespaces -l docker-registry=default -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | head -1))
$ kubectl drain --delete-local-data=true --force=true --grace-period=600 --ignore-daemonsets=true $node

Actual results:
drain takes 600+s to drain the node - other pods are evicted just fine

Expected results:
drain completes without waiting so long on the image-registry pod

Additional info:
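For reference, a minimal sketch of what an MCO-style drain call looks like when going through the upstream drain helper (k8s.io/kubectl/pkg/drain), using the same options as the kubectl reproducer above. This is an assumption-labelled illustration, not the literal MCO code from the linked PR; the field names (e.g. DeleteLocalData) and the context-free client-go Get signature match the 4.2-era library and were changed in later releases.

// Sketch only: approximates a drain call through k8s.io/kubectl/pkg/drain
// with the same flags as the kubectl reproducer. Field names follow the
// 4.2-era library (DeleteLocalData was later renamed DeleteEmptyDirData).
package main

import (
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/kubectl/pkg/drain"
)

func drainNode(cfg *rest.Config, nodeName string) error {
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	drainer := &drain.Helper{
		Client:              client,
		Force:               true,             // --force=true: evict pods without a controller
		IgnoreAllDaemonSets: true,             // --ignore-daemonsets=true
		DeleteLocalData:     true,             // --delete-local-data=true: allow emptyDir pods
		GracePeriodSeconds:  600,              // --grace-period=600
		Timeout:             10 * time.Minute, // overall wait for evictions to complete
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	// Newer client-go versions take a context as the first argument here.
	node, err := client.CoreV1().Nodes().Get(nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Cordon the node, then evict pods and wait for the evictions to finish.
	// The "wait for evictions" step is the behavior fixed by
	// openshift/machine-config-operator#962, which surfaced this bug.
	if err := drain.RunCordonOrUncordon(drainer, node, true); err != nil {
		return fmt.Errorf("cordon failed: %v", err)
	}
	if err := drain.RunNodeDrain(drainer, nodeName); err != nil {
		return fmt.Errorf("drain failed: %v", err)
	}
	return nil
}

With the image-registry pod on the node, RunNodeDrain is where the 600s+ wait shows up; all other pods are evicted promptly.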
Has this been cloned and targeted for 4.1? The MCO needs a 4.1 fix for https://github.com/openshift/machine-config-operator/pull/1023
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922