+++ This bug was initially created as a clone of Bug #1748844 +++

Description of problem:

MCO uses openshift/kubernetes-drain for draining nodes before rebooting. The library had a serious bug where the apiserver was flooded with requests. That bug was fixed in https://github.com/openshift/kubernetes-drain/pull/3, and a follow-up deadlock fix was proposed in https://github.com/openshift/kubernetes-drain/pull/4. However, https://github.com/openshift/kubernetes-drain/pull/4 never landed; instead, the library was forked for supportability to cluster-api/pkg/drain. The MCO has to switch to the cluster-api fork to pick up the deadlock fix, and we need to verify that the drain fix is in place.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
$ cat pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      "app": "nginx"
$ oc create -f pdb.yaml

$ cat nginxrs.yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
$ oc create -f nginxrs.yaml

2.
$ oc edit mcp/worker    <- set "maxUnavailable: 3" in "spec"
# find the node where the nginx pod has landed
$ oc describe pod nginx-h7q5b | grep -i node
# grab the MCD for that node
$ oc get pods -l k8s-app=machine-config-daemon --field-selector spec.nodeName=<NODE> -n openshift-machine-config-operator
# start streaming the logs
$ oc logs -f <MCD_FOR_NODE>

3.
# create a test MC to trigger a drain
$ cat file.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/test
$ oc create -f file.yaml
# now look at the logs from the point above

Actual results:
Eviction is tried for the nginx pod over and over without any delay between requests.

Expected results:
Eviction is tried for the nginx pod over and over, but with a 5s delay between requests.

Additional info:
Having minAvailable: 1 in the PDB together with replicas: 1 in the ReplicaSet is itself a logical misconfiguration, so it is expected that the drain keeps retrying. That part is not a bug.
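
For illustration only, the expected behaviour (keep retrying the eviction, but pause 5s between requests instead of hammering the apiserver) can be sketched as a fixed-delay retry loop. This is not the cluster-api/pkg/drain code, which is written in Go; the names EvictionBlocked and evict_with_retry, and the injectable sleep parameter, are hypothetical and exist only in this sketch.

```python
import time
from typing import Callable, Optional


class EvictionBlocked(Exception):
    """Hypothetical stand-in for a 'would violate the disruption budget' response."""


def evict_with_retry(
    evict: Callable[[], None],
    delay_seconds: float = 5.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call evict() until it succeeds, sleeping delay_seconds after each failure.

    Returns the number of attempts made. With max_attempts=None the loop
    retries forever, which matches the pdb=1 / replicas=1 reproducer.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            evict()
            return attempts
        except EvictionBlocked as err:
            # Give up only if the caller set an explicit attempt limit.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            print(f"error when evicting pod (will retry after {delay_seconds:g}s): {err}")
            # The pause is the whole point: without it every failure
            # immediately turns into another API request.
            sleep(delay_seconds)
```

Passing sleep as a parameter is only a convenience so the loop can be exercised without actually waiting; a real caller would leave the default time.sleep in place.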
Verified on 4.1.0-0.nightly-2019-09-16-165032

Eviction is tried for the nginx pod over and over with a 5 second delay:

I0918 14:10:28.275984 2730 update.go:89] pod "grafana-bf8f7bdf5-phr4c" removed (evicted)
I0918 14:10:29.244958 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:34.249058 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:39.253445 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:44.257521 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:49.261931 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:54.266075 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:59.271507 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:11:04.276535 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:11:09.280714 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:11:14.284921 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:11:19.289207 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:11:24.292810 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
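
If it helps anyone re-verifying, the spacing between retries can be checked mechanically rather than by eye. The snippet below is a rough sketch, not part of the MCO: retry_gaps and KLOG_LINE are hypothetical names, the regex assumes the klog header layout seen in the lines above (severity letter, mmdd, time with microseconds, PID, file:line]), and a fixed placeholder year is prepended because the header carries no year.

```python
import re
from datetime import datetime

# Assumed layout: "I0918 14:10:29.244958 2730 update.go:89] message"
KLOG_LINE = re.compile(r'^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$')

# Three of the MCD log lines quoted in the verification comment.
SAMPLE = """\
I0918 14:10:28.275984 2730 update.go:89] pod "grafana-bf8f7bdf5-phr4c" removed (evicted)
I0918 14:10:29.244958 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:34.249058 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0918 14:10:39.253445 2730 update.go:89] error when evicting pod "nginx-pdwfg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
"""


def retry_gaps(log_text, pod):
    """Return the seconds elapsed between consecutive eviction errors for pod."""
    needle = f'error when evicting pod "{pod}"'
    stamps = []
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line.strip())
        if match and needle in match.group(3):
            # Placeholder year 2000: klog headers omit the year, and only
            # the differences between timestamps matter here.
            stamps.append(
                datetime.strptime(
                    "2000" + match.group(1) + " " + match.group(2),
                    "%Y%m%d %H:%M:%S.%f",
                )
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(SAMPLE, "nginx-pdwfg"):
        print(f"{gap:.3f}s between retries")
```

Gaps close to 5 seconds indicate the delay is in effect; gaps near zero would point to the original flooding behaviour. The sketch does not handle logs that cross a year boundary.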
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2820