Bug 1733922 - 4.1.4 cluster creating 400+ eviction requests a second
Summary: 4.1.4 cluster creating 400+ eviction requests a second
Keywords:
Status: CLOSED DUPLICATE of bug 1733708
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Jan Chaloupka
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On: 1732929 1733708
Blocks:
 
Reported: 2019-07-29 08:33 UTC by Alberto
Modified: 2019-07-29 08:39 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1733708
Environment:
Last Closed: 2019-07-29 08:39:45 UTC
Target Upstream Version:
Embargoed:



Description Alberto 2019-07-29 08:33:22 UTC
+++ This bug was initially created as a clone of Bug #1733708 +++

+++ This bug was initially created as a clone of Bug #1732929 +++

A 4.1.4 cluster identified via telemetry was reporting 500+ 429 requests per second via the API. Upon inspection of the cluster, the machine-controller-manager was observed continuously generating 400+ pod eviction requests per second, all of which fail.

The cluster is available for inspection, the top three requests are all abnormal and a result of the very large number of eviction requests:

{client="machine-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format",code="429",endpoint="https",job="apiserver",namespace="default",resource="pods",scope="namespace",service="kubernetes",subresource="eviction",verb="CREATE"}	405.99523809523805
{client="machine-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format",code="429",contentType="application/json",endpoint="https",job="apiserver",namespace="default",resource="pods",scope="namespace",service="kubernetes",subresource="eviction",verb="POST"}	50
{client="openshift-kube-apiserver/v1.13.4+c62ce01 (linux/amd64) kubernetes/c62ce01",code="200",contentType="application/vnd.kubernetes.protobuf",endpoint="https",job="apiserver",namespace="default",resource="poddisruptionbudgets",scope="namespace",service="kubernetes",verb="LIST"}	50
{client="openshift-apiserver/v1.13.4+c62ce01 (linux/amd64) kubernetes/c62ce01",code="429",endpoint="https",job="apiserver",namespace="default",resource="subjectaccessreviews",scope="cluster",service="kubernetes",verb="CREATE"}	12.873109243697478

--- Additional comment from Clayton Coleman on 2019-07-24 18:18:41 UTC ---

There are 5 clusters reporting over 300 429-response-code errors per second, which means this is likely impacting multiple clusters.

--- Additional comment from Michael Gugino on 2019-07-25 02:03:33 UTC ---

openshift/kubernetes drain uses a goroutine for every pod that needs to be drained on a node. If a 429 is returned (eviction not allowed, probably due to a PDB), it sleeps 5 seconds before trying again. There's a 20-second timeout on draining that we configure in the machine-controller, and we delay the requeue of the machine-object delete request after a drain fails.
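
To illustrate the shape of that retry behaviour, here is a minimal Go sketch (not the actual openshift/kubernetes-drain code; evictPod is a hypothetical stand-in for the eviction API call): one goroutine per pod, a 5-second sleep between attempts, and a 20-second global timeout for the caller.

```
package main

import (
	"errors"
	"fmt"
	"time"
)

// evictPod is a hypothetical stand-in for the eviction API call; while a PDB
// blocks the eviction (or the API server answers 429) it keeps failing.
func evictPod(pod string) error {
	return errors.New("Cannot evict pod as it would violate the pod's disruption budget")
}

// drainNode mirrors the behaviour described above: one goroutine per pod,
// a 5s sleep between attempts, and a 20s global timeout for the caller.
func drainNode(pods []string) error {
	done := make(chan string, len(pods))
	for _, pod := range pods {
		go func(pod string) {
			for {
				if err := evictPod(pod); err != nil {
					fmt.Printf("error when evicting pod %q (will retry after 5s): %v\n", pod, err)
					time.Sleep(5 * time.Second)
					continue
				}
				done <- pod
				return
			}
		}(pod)
	}

	timeout := time.After(20 * time.Second)
	for range pods {
		select {
		case <-done:
		case <-timeout:
			// The caller gives up here, but nothing stops the eviction goroutines.
			return errors.New("drain did not finish before the 20s timeout")
		}
	}
	return nil
}

func main() {
	if err := drainNode([]string{"istio-pilot-77d5cd65b4-2s9rz"}); err != nil {
		fmt.Println(err)
	}
}
```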

My guess is that a large scale-down (many machines) took place, and there's not a place to schedule all the pods with PDBs.

In 4.1, the machine controller only drains when deleting, and drains (evictions) only get 429s from PDBs.

--- Additional comment from Vikas Choudhary on 2019-07-25 03:02:57 UTC ---

Behavior is as expected.

There is a PDB object in the namespace with minAvailable: 1.
There is a ReplicaSet with 1 replica:
```
$ kubectl get replicaset -n istio-system
NAME                               DESIRED   CURRENT   READY     AGE
istio-pilot-77d5cd65b4             1         1         1         9d
```

The PDB won't let the pod be evicted: with 1 replica and minAvailable: 1, the PDB allows zero disruptions, so every eviction request is rejected.
Now a machine deletion is attempted. The corresponding node hosts such a pod. The machine controller calls the drain API, which marks the node unschedulable and then tries to evict the pod. The eviction fails forever and the controller keeps retrying; the node gets stuck in the unschedulable state and the machine never gets deleted.

--- Additional comment from Vikas Choudhary on 2019-07-25 03:05:04 UTC ---

I verified the above in one of the clusters where this is happening. This is also very easy to reproduce and verify.

--- Additional comment from Michael Gugino on 2019-07-25 04:55:49 UTC ---

(In reply to Vikas Choudhary from comment #4)
> I verified above in one of the clusters where this is happening. Also this
> is very easy to reproduce and verify.

How many machines are being deleted? It shouldn't be spamming 300+ requests per second for a single pod. There should be 5 seconds between each 429 response, a timeout after 20 seconds, and then it should wait a bit longer before attempting to drain the node again. If we're seeing 300+ requests/sec for one pod, something is broken.

--- Additional comment from Clayton Coleman on 2019-07-25 14:52:38 UTC ---

The possibility that drain is impossible needs to be part of the design of the drain code, so something akin to image pull backoff is of course necessary.  Setting severity appropriately.
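
As an illustration of what such a backoff could look like (a sketch only, not the actual fix; tryEvict is a hypothetical helper), the apimachinery wait package already provides an exponential-backoff primitive:

```
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// tryEvict is a hypothetical helper that attempts one eviction and reports
// whether it succeeded; a PDB-blocked eviction simply reports false.
func tryEvict(pod string) (bool, error) {
	return false, nil // eviction blocked, retry later
}

func main() {
	// Start at 5s and double on every failed attempt instead of hammering
	// the API server at a fixed (or, worse, ever-growing) rate.
	backoff := wait.Backoff{
		Duration: 5 * time.Second,
		Factor:   2.0,
		Jitter:   0.1,
		Steps:    5,
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		return tryEvict("istio-pilot-77d5cd65b4-2s9rz")
	})
	if err == wait.ErrWaitTimeout {
		fmt.Println("giving up on eviction for now; requeue the machine and try later")
	}
}
```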

--- Additional comment from Vikas Choudhary on 2019-07-25 14:58:52 UTC ---

(In reply to Michael Gugino from comment #5)
> (In reply to Vikas Choudhary from comment #4)
> > I verified above in one of the clusters where this is happening. Also this
> > is very easy to reproduce and verify.
> 
> How many machines are being deleted?  It shouldn't be spamming 300+ per
> second for a single pod.  There should be 5 seconds between each 429
> response, timeout after 20 seconds, and then it should wait a bit longer
> before attempting to drain the node again.  If we're seeing 300+
> requests/sec for one pod, something is broken.

I0724 18:38:44.959919       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.977066       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:44.977109       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.981985       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.982317       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.982477       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.982994       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.983569       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.990721       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:44.991017       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:44.991579       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.002211       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.003483       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.011189       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.019750       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.020467       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.075287       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.084313       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.084433       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.085064       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.107962       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.109397       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.109540       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.109574       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.176538       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.191653       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.236642       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.236979       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.300564       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0724 18:38:45.302414       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): the server has received too many requests and has asked us to try again later (post pods.policy istio-pilot-77d5cd65b4-2s9rz)
I0724 18:38:45.303310       1 info.go:20] error when evicting pod "istio-pilot-77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

...

77d5cd65b4-2s9rz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

--- Additional comment from Michael Gugino on 2019-07-25 15:01:04 UTC ---

(In reply to Clayton Coleman from comment #6)
> The possibility that drain is impossible needs to be part of the design of
> the drain code, so something akin to image pull backoff is of course
> necessary.  Setting severity appropriately.

There should be a delay. If this (many requests per second) is caused by a pod count of one, then it's a bug in that logic. There's no scenario where you should see hundreds (or even dozens of) requests per second for a single pod drain blocked by a PDB.

--- Additional comment from Vikas Choudhary on 2019-07-27 18:46:23 UTC ---

In the drain logic, a goroutine is run to evict each pod running on the node. If the eviction fails with a "too many requests" error, the goroutine keeps creating eviction requests forever, sleeping 5 seconds between attempts. After the global timeout, Drain() returns, but the goroutines for the pods continue to create eviction requests.

The machine controller gets the error from Drain() and, after waiting 20 seconds, invokes Drain() again, which again creates a goroutine per pod. In this way, after every Drain() return and 20-second wait, more eviction goroutines pile up, and the eviction request rate keeps growing.

Created fix PR https://github.com/openshift/kubernetes-drain/pull/3, which ensures that after the global timeout all the goroutines return before Drain() itself returns.

This problem is serious enough that the cluster will die if even a single pod fails to evict.
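
A rough Go sketch of the shape the fix aims for (illustrative only, not the actual kubernetes-drain code; evictPod is a hypothetical stand-in): the per-pod goroutines watch a context that is cancelled at the global timeout, and the drain call waits for all of them to exit before returning, so repeated drain invocations cannot pile up retry loops.

```
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// evictPod is a hypothetical stand-in for the eviction call; while a PDB
// blocks the eviction it keeps returning an error.
func evictPod(pod string) error {
	return errors.New("cannot evict pod: would violate the pod's disruption budget")
}

// drain retries each pod's eviction every 5s, but stops all goroutines at
// the global timeout and waits for them before returning, so a later
// re-invocation starts from zero goroutines instead of adding to a leak.
func drain(pods []string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var wg sync.WaitGroup
	for _, pod := range pods {
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			for {
				if evictPod(pod) == nil {
					return
				}
				select {
				case <-time.After(5 * time.Second):
					// try again
				case <-ctx.Done():
					return // global timeout: stop retrying
				}
			}
		}(pod)
	}

	wg.Wait() // no eviction goroutine outlives this drain call
	if ctx.Err() != nil {
		return errors.New("drain timed out")
	}
	return nil
}

func main() {
	_ = drain([]string{"pdb-workload-q24lb"}, 20*time.Second)
}
```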

--- Additional comment from Vikas Choudhary on 2019-07-27 19:44:07 UTC ---

Steps to reproduce the issue:

1. Create a PDB with minAvailable: 1:

```
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      "app": "nginx"
```

2. Create a ReplicaSet with 1 replica and the label `"app": "nginx"`.

3. Find the machine backing the node on which the pod is running.
4. Delete the machine.

The machine deletion command will not return.
In another terminal, `tail -f` the machine-controller logs; you will see logs like the ones below:
I0727 19:40:08.240725       1 info.go:20] error when evicting pod "pdb-workload-q24lb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0727 19:40:13.250043       1 info.go:20] error when evicting pod "pdb-workload-q24lb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0727 19:40:18.265153       1 info.go:20] error when evicting pod "pdb-workload-q24lb" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
 
Within a couple of minutes, you will see the time difference between consecutive eviction-failure logs decreasing; the number of failure logs per second keeps increasing over time.


After applying the fix, you will see that the time difference stays constant at 5 seconds.

--- Additional comment from Vikas Choudhary on 2019-07-27 19:49:51 UTC ---

WORKAROUND:
Delete the PDB object that is making the eviction fail.

PREVENTIVE STEP:
The PDB configuration and replica count should not be such that eviction can never succeed. For example, if minAvailable is 1 in the PDB and the replica count is 1, pod eviction is guaranteed to fail and the issue will occur.

Comment 1 Alberto 2019-07-29 08:39:45 UTC

*** This bug has been marked as a duplicate of bug 1733708 ***

