Bug 1764412 - [sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
Summary: [sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Duplicates: 1765736
Depends On:
Blocks:
 
Reported: 2019-10-23 03:35 UTC by huirwang
Modified: 2020-07-13 17:12 UTC
CC List: 10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:11:31 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:12:02 UTC

Description huirwang 2019-10-23 03:35:33 UTC
Description of problem:


[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s] 	2m34s
fail [k8s.io/kubernetes/test/e2e/framework/util.go:1355]: Expected
    <string>: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc
not to equal
    <string>: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.3/208

related jobs:
https://testgrid.k8s.io/redhat-openshift-release-4.3-informing-ocp#release-openshift-ocp-installer-e2e-azure-serial-4.3&show-stale-tests=

Comment 1 Jan Chaloupka 2019-10-25 10:54:38 UTC
From `[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation` logs:

```
STEP: Scaling ReplicationController scheduler-priority-avoid-pod in namespace e2e-sched-priority-9709 to 2
STEP: Verify the pods should not scheduled to the node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc
Oct 23 02:07:21.488: INFO: Expect to be unequal: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc and ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc
```

From test logs:

```
I1023 02:07:11.102791       1 scheduler.go:667] pod e2e-sched-priority-9709/scheduler-priority-avoid-pod-k58k8 is bound successfully on node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4", 6 nodes evaluated, 2 nodes were found feasible. Bound node resource: "Capacity: CPU<2>|Memory<8141556Ki>|Pods<250>|StorageEphemeral<133682156Ki>; Allocatable: CPU<1500m>|Memory<7527156Ki>|Pods<250>|StorageEphemeral<123201474766>.".
I1023 02:07:11.129010       1 scheduler.go:667] pod e2e-sched-priority-9709/scheduler-priority-avoid-pod-22s5k is bound successfully on node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc", 6 nodes evaluated, 1 nodes were found feasible. Bound node resource: "Capacity: CPU<2>|Memory<8141556Ki>|Pods<250>|StorageEphemeral<133682156Ki>; Allocatable: CPU<1500m>|Memory<7527156Ki>|Pods<250>|StorageEphemeral<123201474766>.".
```

What's happening here:
1. "Pod should avoid nodes that have avoidPod annotation" test wants to verify the scheduler does not schedule pods to a node which has "scheduler.alpha.kubernetes.io/preferAvoidPods" annotation set.
2. It creates RC that will produce pods which will be used for testing
3. The test picks node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc" and annotates it with about-to-be produced pods which are expected not to be scheduled to the node
4. RC is scaled up, two new pods are created
5. the first pod "e2e-sched-priority-9709/scheduler-priority-avoid-pod-k58k8" is scheduled and bounded, ignoring node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc" as expected
6. the second pod "e2e-sched-priority-9709/scheduler-priority-avoid-pod-22s5k" is scheduled, bounded to node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc" which is unexpected
7. the test fails since the second pod is not supposed to be scheduled to node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc"

As can be seen from the test logs, when the scheduler tries to schedule the second pod it has only one node left (the one that is supposed to be avoided), so it has no other choice than to bind the pod to that node. Under the hood, the "scheduler.alpha.kubernetes.io/preferAvoidPods" annotation is used only when node priorities are computed. When there is only one feasible node left, the priorities have no effect. That is why node "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc" is not excluded from the list of target nodes for the second pod.
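
For reference, the annotation in question carries a JSON-encoded core/v1 AvoidPods structure under the "scheduler.alpha.kubernetes.io/preferAvoidPods" key. Below is a minimal sketch of how a node can be marked this way; it is not the e2e helper itself, the function name is mine, and the Patch call assumes a context-aware client-go (v0.18+):

```
package sketch

import (
	"context"
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// avoidRCPodsOnNode marks a node so that the scheduler's preferAvoidPods
// priority penalizes it for pods owned by the given ReplicationController.
// Note that this only lowers the node's score during the priority phase;
// it is not a hard predicate/filter.
func avoidRCPodsOnNode(ctx context.Context, cs kubernetes.Interface, nodeName string, rc *v1.ReplicationController) error {
	isController := true
	avoid := v1.AvoidPods{
		PreferAvoidPods: []v1.PreferAvoidPodsEntry{{
			PodSignature: v1.PodSignature{
				PodController: &metav1.OwnerReference{
					APIVersion: "v1",
					Kind:       "ReplicationController",
					Name:       rc.Name,
					UID:        rc.UID,
					Controller: &isController,
				},
			},
			Reason: "scheduler e2e test",
		}},
	}
	raw, err := json.Marshal(&avoid)
	if err != nil {
		return err
	}
	// The JSON is stored under scheduler.alpha.kubernetes.io/preferAvoidPods.
	patch := fmt.Sprintf(`{"metadata":{"annotations":{%q:%q}}}`,
		v1.PreferAvoidPodsAnnotationKey, string(raw))
	_, err = cs.CoreV1().Nodes().Patch(ctx, nodeName,
		types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{})
	return err
}
```

Because the annotation only feeds a soft, score-based preference, the test can only pass reliably when at least two feasible nodes survive the predicate phase.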

Comment 2 Jan Chaloupka 2019-10-25 12:57:55 UTC
Checking another part of the logs:

```
Oct 23 02:06:32.708: INFO: ComputeCPUMemFraction for node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc
Oct 23 02:06:32.816: INFO: Pod for on the node: tuned-87gz6, Cpu: 10, Mem: 52428800
Oct 23 02:06:32.816: INFO: Pod for on the node: dns-default-7nb87, Cpu: 110, Mem: 283115520
Oct 23 02:06:32.816: INFO: Pod for on the node: node-ca-gkpn5, Cpu: 10, Mem: 10485760
Oct 23 02:06:32.816: INFO: Pod for on the node: machine-config-daemon-2n4sr, Cpu: 40, Mem: 104857600
Oct 23 02:06:32.816: INFO: Pod for on the node: node-exporter-klgq5, Cpu: 110, Mem: 230686720
Oct 23 02:06:32.816: INFO: Pod for on the node: multus-5dtkw, Cpu: 10, Mem: 157286400
Oct 23 02:06:32.816: INFO: Pod for on the node: ovs-5t74s, Cpu: 200, Mem: 419430400
Oct 23 02:06:32.816: INFO: Pod for on the node: sdn-b2mml, Cpu: 100, Mem: 209715200
Oct 23 02:06:32.816: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc, totalRequestedCPUResource: 690, cpuAllocatableMil: 1500, cpuFraction: 0.46
Oct 23 02:06:32.816: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc, totalRequestedMemResource: 1572864000, memAllocatableVal: 7707807744, memFraction: 0.20406113544079596
Oct 23 02:06:32.816: INFO: ComputeCPUMemFraction for node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4
Oct 23 02:06:32.927: INFO: Pod for on the node: tuned-x8hjb, Cpu: 10, Mem: 52428800
Oct 23 02:06:32.927: INFO: Pod for on the node: dns-default-h74fz, Cpu: 110, Mem: 283115520
Oct 23 02:06:32.927: INFO: Pod for on the node: node-ca-h4mpb, Cpu: 10, Mem: 10485760
Oct 23 02:06:32.927: INFO: Pod for on the node: router-default-64f8bcb478-v6sxh, Cpu: 100, Mem: 268435456
Oct 23 02:06:32.927: INFO: Pod for on the node: machine-config-daemon-27fm4, Cpu: 40, Mem: 104857600
Oct 23 02:06:32.927: INFO: Pod for on the node: alertmanager-main-0, Cpu: 300, Mem: 445644800
Oct 23 02:06:32.927: INFO: Pod for on the node: alertmanager-main-1, Cpu: 300, Mem: 445644800
Oct 23 02:06:32.927: INFO: Pod for on the node: grafana-65cd8c7fff-qwhf5, Cpu: 200, Mem: 314572800
Oct 23 02:06:32.927: INFO: Pod for on the node: node-exporter-vhfmc, Cpu: 110, Mem: 230686720
Oct 23 02:06:32.927: INFO: Pod for on the node: prometheus-adapter-5f5d9cfd57-gc5bz, Cpu: 10, Mem: 20971520
Oct 23 02:06:32.927: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 530, Mem: 1398800384
Oct 23 02:06:32.927: INFO: Pod for on the node: telemeter-client-756d9dddc4-xhwcz, Cpu: 210, Mem: 440401920
Oct 23 02:06:32.927: INFO: Pod for on the node: multus-rcvkr, Cpu: 10, Mem: 157286400
Oct 23 02:06:32.927: INFO: Pod for on the node: ovs-6gxnm, Cpu: 200, Mem: 419430400
Oct 23 02:06:32.927: INFO: Pod for on the node: sdn-4d2kn, Cpu: 100, Mem: 209715200
Oct 23 02:06:32.927: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4, totalRequestedCPUResource: 2340, cpuAllocatableMil: 1500, cpuFraction: 1
Oct 23 02:06:32.927: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4, totalRequestedMemResource: 4907335680, memAllocatableVal: 7707807744, memFraction: 0.6366707425752834
Oct 23 02:06:32.927: INFO: ComputeCPUMemFraction for node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-tfbnm
Oct 23 02:06:33.033: INFO: Pod for on the node: tuned-272r5, Cpu: 10, Mem: 52428800
Oct 23 02:06:33.033: INFO: Pod for on the node: dns-default-t6h9n, Cpu: 110, Mem: 283115520
Oct 23 02:06:33.033: INFO: Pod for on the node: image-registry-66cf94d4cd-2bwb2, Cpu: 100, Mem: 268435456
Oct 23 02:06:33.033: INFO: Pod for on the node: node-ca-xvk4j, Cpu: 10, Mem: 10485760
Oct 23 02:06:33.033: INFO: Pod for on the node: router-default-64f8bcb478-gfcft, Cpu: 100, Mem: 268435456
Oct 23 02:06:33.033: INFO: Pod for on the node: machine-config-daemon-gz48t, Cpu: 40, Mem: 104857600
Oct 23 02:06:33.033: INFO: Pod for on the node: certified-operators-6d6bbf5f54-4n7vv, Cpu: 10, Mem: 104857600
Oct 23 02:06:33.033: INFO: Pod for on the node: community-operators-6bc445f7db-d2b9f, Cpu: 10, Mem: 104857600
Oct 23 02:06:33.033: INFO: Pod for on the node: redhat-operators-79644f5cf-tr85m, Cpu: 10, Mem: 104857600
Oct 23 02:06:33.033: INFO: Pod for on the node: alertmanager-main-2, Cpu: 300, Mem: 445644800
Oct 23 02:06:33.033: INFO: Pod for on the node: kube-state-metrics-5db5d576f6-ncrx7, Cpu: 30, Mem: 125829120
Oct 23 02:06:33.033: INFO: Pod for on the node: node-exporter-p4fsk, Cpu: 110, Mem: 230686720
Oct 23 02:06:33.033: INFO: Pod for on the node: openshift-state-metrics-64bf9548b6-245k7, Cpu: 120, Mem: 199229440
Oct 23 02:06:33.033: INFO: Pod for on the node: prometheus-adapter-5f5d9cfd57-tvkcg, Cpu: 10, Mem: 20971520
Oct 23 02:06:33.033: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 530, Mem: 1398800384
Oct 23 02:06:33.033: INFO: Pod for on the node: multus-7zh95, Cpu: 10, Mem: 157286400
Oct 23 02:06:33.033: INFO: Pod for on the node: ovs-z4p7x, Cpu: 200, Mem: 419430400
Oct 23 02:06:33.033: INFO: Pod for on the node: sdn-ck9mw, Cpu: 100, Mem: 209715200
Oct 23 02:06:33.033: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-tfbnm, totalRequestedCPUResource: 1910, cpuAllocatableMil: 1500, cpuFraction: 1
Oct 23 02:06:33.033: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-tfbnm, totalRequestedMemResource: 4614782976, memAllocatableVal: 7707807744, memFraction: 0.5987153713832953
```

Interesting lines:

```
Oct 23 02:06:32.927: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4, totalRequestedCPUResource: 2340, cpuAllocatableMil: 1500, cpuFraction: 1
Oct 23 02:06:33.033: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-tfbnm, totalRequestedCPUResource: 1910, cpuAllocatableMil: 1500, cpuFraction: 1
```

Both nodes have more CPU requested than is available; in other words, both nodes are running more pods than they should be.
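
As a reading aid for these numbers: the cpuFraction value printed by the test appears to be requested millicores divided by allocatable millicores, capped at 1 (both 2340/1500 and 1910/1500 are reported as 1). A minimal sketch of that computation, with names of my own choosing:

```
// cpuFraction mirrors how the logged value appears to be derived:
// requested millicores over allocatable millicores, capped at 1.0.
// A fraction of 1 therefore means "no CPU left" and hides how far
// beyond its allocatable capacity the node already is.
func cpuFraction(requestedMilli, allocatableMilli int64) float64 {
	f := float64(requestedMilli) / float64(allocatableMilli)
	if f > 1 {
		f = 1
	}
	return f
}
```

For example, cpuFraction(2340, 1500) and cpuFraction(1910, 1500) both return 1, matching the two lines quoted above.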

After the nodes are resource-balanced (which in this case means no node has any CPU left), we get:

```
Oct 23 02:07:03.866: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc, totalRequestedCPUResource: 1500, cpuAllocatableMil: 1500, cpuFraction: 1
Oct 23 02:07:04.041: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4, totalRequestedCPUResource: 2340, cpuAllocatableMil: 1500, cpuFraction: 1
Oct 23 02:07:04.151: INFO: Node: ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-tfbnm, totalRequestedCPUResource: 1910, cpuAllocatableMil: 1500, cpuFraction: 1
```

Clearly, no node has any CPU left for scheduling. Yet the scheduler was still able to bind both pods, one to "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4" and one to "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc".

The nodeInfo cache in the scheduler might be outdated and not take into account all pods scheduled on a particular node. Given that the cache is updated through event handler reactions, some of the pod Add events could have been dropped due to the enormous number of events generated at a single point in time. Though this is just speculation.

I was not able to find how much CPU is requested in the cases where the test goes green. However, the expected scheduler output should look like [1]:

```
I1022 22:25:41.424393       1 scheduler.go:667] pod e2e-sched-priority-7740/scheduler-priority-avoid-pod-nvvsq is bound successfully on node "ci-op-x376v89r-09f59-4pq2c-worker-centralus2-pvv9c", 6 nodes evaluated, 2 nodes were found feasible. Bound node resource: "Capacity: CPU<2>|Memory<8140500Ki>|Pods<250>|StorageEphemeral<133682156Ki>; Allocatable: CPU<1500m>|Memory<7526100Ki>|Pods<250>|StorageEphemeral<123201474766>.".
I1022 22:25:41.443855       1 scheduler.go:667] pod e2e-sched-priority-7740/scheduler-priority-avoid-pod-zg26l is bound successfully on node "ci-op-x376v89r-09f59-4pq2c-worker-centralus2-pvv9c", 6 nodes evaluated, 2 nodes were found feasible. Bound node resource: "Capacity: CPU<2>|Memory<8140500Ki>|Pods<250>|StorageEphemeral<133682156Ki>; Allocatable: CPU<1500m>|Memory<7526100Ki>|Pods<250>|StorageEphemeral<123201474766>.".
```

In any case, even though the scheduler was able to bind only a single pod to "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-shrs4", it means the node ran out of free CPU. The test assumes there is still sufficient CPU and memory left on each node after they are resource-balanced, which is not the case here.

[1] From https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.3/207/artifacts/e2e-azure-serial/pods/openshift-kube-scheduler_openshift-kube-scheduler-ci-op-x376v89r-09f59-4pq2c-master-2_scheduler.log
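
For clarity, here is my reading of the balancing step based on the logged fractions (an illustrative sketch, not the e2e code itself): each node is padded with a filler pod until its CPU request reaches the highest fraction observed across the nodes, capped at the allocatable amount.

```
// padToFraction returns the filler CPU request (in millicores) needed to
// bring a node up to the target fraction of its allocatable CPU.
// When some node is already at fraction 1, the target is the full
// allocatable amount, so every node ends up with zero spare CPU for the
// test pods that are created afterwards.
func padToFraction(requestedMilli, allocatableMilli int64, targetFraction float64) int64 {
	target := int64(targetFraction * float64(allocatableMilli))
	if target > allocatableMilli {
		target = allocatableMilli
	}
	if target <= requestedMilli {
		return 0 // already at or above the target; nothing to add
	}
	return target - requestedMilli
}
```

With the numbers above, padToFraction(690, 1500, 1) returns 810 millicores for "ci-op-jjgjm1z5-09f59-jcx9s-worker-westus-jcmtc", which matches the jump from 690 to 1500 seen in the 02:07:03 log line.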

Comment 3 Jan Chaloupka 2019-10-25 14:08:41 UTC
Also from https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.1/818/artifacts/e2e-aws-serial/junit/ we can see the following computations:

```
May 17 17:17:08.075: INFO: ComputeCpuMemFraction for node: ip-10-0-128-108.ec2.internal
May 17 17:17:08.201: INFO: Pod for on the node: tuned-r2drh, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: dns-default-rr5fz, Cpu: 110, Mem: 283115520
May 17 17:17:08.201: INFO: Pod for on the node: image-registry-669565b768-ng6gd, Cpu: 100, Mem: 268435456
May 17 17:17:08.201: INFO: Pod for on the node: node-ca-vhqwd, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: router-default-54d85f8787-f8pxb, Cpu: 100, Mem: 268435456
May 17 17:17:08.201: INFO: Pod for on the node: machine-config-daemon-x666j, Cpu: 20, Mem: 52428800
May 17 17:17:08.201: INFO: Pod for on the node: certified-operators-579dfd4c79-xt5js, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: community-operators-67448849fd-9289n, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: redhat-operators-5f6c48b6dd-8w297, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: alertmanager-main-0, Cpu: 300, Mem: 629145600
May 17 17:17:08.201: INFO: Pod for on the node: alertmanager-main-2, Cpu: 300, Mem: 629145600
May 17 17:17:08.201: INFO: Pod for on the node: kube-state-metrics-dcb5d99cc-cfpjl, Cpu: 300, Mem: 629145600
May 17 17:17:08.201: INFO: Pod for on the node: node-exporter-rzxbd, Cpu: 110, Mem: 230686720
May 17 17:17:08.201: INFO: Pod for on the node: prometheus-adapter-85476c4bbd-c2jpn, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 600, Mem: 1258291200
May 17 17:17:08.201: INFO: Pod for on the node: telemeter-client-76c555c857-65hbv, Cpu: 210, Mem: 440401920
May 17 17:17:08.201: INFO: Pod for on the node: multus-lgvtp, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Pod for on the node: ovs-fwr8d, Cpu: 200, Mem: 419430400
May 17 17:17:08.201: INFO: Pod for on the node: sdn-cvknz, Cpu: 100, Mem: 209715200
May 17 17:17:08.201: INFO: Node: ip-10-0-128-108.ec2.internal, totalRequestedCpuResource: 1650, cpuAllocatableMil: 1500, cpuFraction: 1
May 17 17:17:08.201: INFO: Node: ip-10-0-128-108.ec2.internal, totalRequestedMemResource: 3535798272, memAllocatableVal: 7729664000, memFraction: 0.4574323375505067
May 17 17:17:08.201: INFO: ComputeCpuMemFraction for node: ip-10-0-131-112.ec2.internal
May 17 17:17:08.317: INFO: Pod for on the node: tuned-vnkwf, Cpu: 100, Mem: 209715200
May 17 17:17:08.317: INFO: Pod for on the node: dns-default-l5486, Cpu: 110, Mem: 283115520
May 17 17:17:08.317: INFO: Pod for on the node: node-ca-7nkmq, Cpu: 100, Mem: 209715200
May 17 17:17:08.317: INFO: Pod for on the node: machine-config-daemon-rdjdz, Cpu: 20, Mem: 52428800
May 17 17:17:08.317: INFO: Pod for on the node: node-exporter-tf28q, Cpu: 110, Mem: 230686720
May 17 17:17:08.317: INFO: Pod for on the node: multus-ff4rj, Cpu: 100, Mem: 209715200
May 17 17:17:08.317: INFO: Pod for on the node: ovs-5lqcp, Cpu: 200, Mem: 419430400
May 17 17:17:08.317: INFO: Pod for on the node: sdn-plsrf, Cpu: 100, Mem: 209715200
May 17 17:17:08.317: INFO: Node: ip-10-0-131-112.ec2.internal, totalRequestedCpuResource: 640, cpuAllocatableMil: 1500, cpuFraction: 0.4266666666666667
May 17 17:17:08.317: INFO: Node: ip-10-0-131-112.ec2.internal, totalRequestedMemResource: 1300234240, memAllocatableVal: 7729655808, memFraction: 0.16821373063652978
May 17 17:17:08.317: INFO: ComputeCpuMemFraction for node: ip-10-0-145-122.ec2.internal
May 17 17:17:08.431: INFO: Pod for on the node: tuned-8gcc9, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Pod for on the node: dns-default-j9qxd, Cpu: 110, Mem: 283115520
May 17 17:17:08.431: INFO: Pod for on the node: node-ca-ncgsr, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Pod for on the node: router-default-54d85f8787-jlv84, Cpu: 100, Mem: 268435456
May 17 17:17:08.431: INFO: Pod for on the node: machine-config-daemon-6vpmt, Cpu: 20, Mem: 52428800
May 17 17:17:08.431: INFO: Pod for on the node: alertmanager-main-1, Cpu: 300, Mem: 629145600
May 17 17:17:08.431: INFO: Pod for on the node: grafana-d77564f95-rhtsn, Cpu: 200, Mem: 314572800
May 17 17:17:08.431: INFO: Pod for on the node: node-exporter-hgvd5, Cpu: 110, Mem: 230686720
May 17 17:17:08.431: INFO: Pod for on the node: prometheus-adapter-85476c4bbd-d8ktz, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 600, Mem: 1258291200
May 17 17:17:08.431: INFO: Pod for on the node: prometheus-operator-6d494fb54-kknqt, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Pod for on the node: multus-v8t4s, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Pod for on the node: ovs-hcm8t, Cpu: 200, Mem: 419430400
May 17 17:17:08.431: INFO: Pod for on the node: sdn-cdj5q, Cpu: 100, Mem: 209715200
May 17 17:17:08.431: INFO: Node: ip-10-0-145-122.ec2.internal, totalRequestedCpuResource: 1240, cpuAllocatableMil: 1500, cpuFraction: 0.8266666666666667
May 17 17:17:08.431: INFO: Node: ip-10-0-145-122.ec2.internal, totalRequestedMemResource: 2512388096, memAllocatableVal: 7729664000, memFraction: 0.32503199311121417
```

Again, no CPU resources left:

```
May 17 17:17:08.201: INFO: Node: ip-10-0-128-108.ec2.internal, totalRequestedCpuResource: 1650, cpuAllocatableMil: 1500, cpuFraction: 1

```

Fortunately, there are two more nodes with sufficient CPU, so the test would most likely pass.

Comment 4 Jan Chaloupka 2019-10-29 09:33:16 UTC
*** Bug 1765736 has been marked as a duplicate of this bug. ***

Comment 6 Jan Chaloupka 2019-11-22 11:19:03 UTC
Thanks Hongkai Liu for the artifacts.

Actually it failed for a different reason:

```
Nov 21 16:51:22.815: INFO: Waiting for running...
Nov 21 17:01:22.903: INFO: Unexpected error occurred: Error waiting for 1 pods to be running - probably a timeout: Timeout while waiting for pods with labels "startPodsID=2616505a-0c7f-11ea-851d-0a58ac102307" to be running
[AfterEach] [sig-scheduling] SchedulerPriorities [Serial]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:150
```

It is failing in the process of balancing the nodes, which is not related to this flake.

Comment 8 Jan Chaloupka 2019-12-03 11:38:19 UTC
The test flakes due to insufficient CPU resources. There is not much we can do about it right now, unless we restrict the requested CPU of the core pods currently deployed in a vanilla cluster (see https://github.com/openshift/origin/issues/24247). Moving to 4.4 to give more time to address #24247.

Comment 9 Jan Chaloupka 2020-01-14 12:38:12 UTC
Can't find any failing `Pod should avoid nodes that have avoidPod annotation` test under https://search.svc.ci.openshift.org/?search=validates+resource+limits+of+pods+that+are+allowed+to+run&maxAge=336h&context=2&type=all for the last two weeks

Closing for now until the test starts to flake/fail again.

Comment 10 Jan Chaloupka 2020-01-14 12:48:27 UTC
Wrong link, sorry.

https://search.svc.ci.openshift.org/?search=Pod+should+avoid+nodes+that+have+avoidPod+annotation&maxAge=336h&context=2&type=all

- build-log from build release-openshift-ocp-installer-e2e-azure-serial-4.4 #392 (9 hours ago):
  I0114 02:10:11.384975       1 scheduler.go:751] pod e2e-sched-priority-6202/scheduler-priority-avoid-pod-55x27 is bound successfully on node "ci-op-pdmlmqdf-0ba00-lcf4l-worker-westus-54srq", 6 nodes evaluated, 1 nodes were found feasible.

- build-log from build release-openshift-ocp-installer-e2e-azure-fips-serial-4.3 #1 (13 days ago):
  I0101 01:37:12.280041       1 scheduler.go:667] pod e2e-sched-priority-1186/scheduler-priority-avoid-pod-r8whc is bound successfully on node "ci-op-9tzd46x9-6e04f-cwsrj-worker-centralus1-64txk", 6 nodes evaluated, 1 nodes were found feasible.

Let's keep it open and revisit in the next 14 days.

Comment 12 Jan Chaloupka 2020-02-26 14:29:06 UTC
Checking the first jobs from the last 6 days:

I0225 10:07:54.409310       1 scheduler.go:751] pod e2e-sched-priority-3913/scheduler-priority-avoid-pod-7szvs is bound successfully on node "ci-op-91hqv97v-0ba00-wwk6q-worker-centralus1-x6xm7", 6 nodes evaluated, 1 nodes were found feasible.
I0225 10:07:54.442035       1 scheduler.go:751] pod e2e-sched-priority-3913/scheduler-priority-avoid-pod-5hql8 is bound successfully on node "ci-op-91hqv97v-0ba00-wwk6q-worker-centralus1-x6xm7", 6 nodes evaluated, 1 nodes were found

I0225 08:06:58.819539       1 scheduler.go:751] pod e2e-sched-priority-364/scheduler-priority-avoid-pod-xjvm2 is bound successfully on node "ci-op-z28ltq2w-0ba00-5tqcn-worker-eastus23-h2kr6", 6 nodes evaluated, 2 nodes were found feasible.
I0225 08:06:58.849374       1 scheduler.go:751] pod e2e-sched-priority-364/scheduler-priority-avoid-pod-46wg9 is bound successfully on node "ci-op-z28ltq2w-0ba00-5tqcn-worker-eastus21-dh52m", 6 nodes evaluated, 1 nodes were found 

I0225 07:38:08.988402       1 scheduler.go:751] pod e2e-sched-priority-2242/scheduler-priority-avoid-pod-gjrqh is bound successfully on node "ci-op-1j8nrg3s-0ba00-7hnmx-worker-centralus1-mhzxn", 6 nodes evaluated, 1 nodes were found feasible.
I0225 07:38:09.018183       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:38:09.046326       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:39:09.168416       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:40:39.169323       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:42:09.170004       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:43:39.170557       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:45:09.171512       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:46:39.172203       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting
I0225 07:48:09.172787       1 factory.go:453] Unable to schedule e2e-sched-priority-2242/scheduler-priority-avoid-pod-2985r: no fit: 0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.; waiting


I0225 05:21:02.838452       1 scheduler.go:751] pod e2e-sched-priority-1402/scheduler-priority-avoid-pod-gbvvn is bound successfully on node "ci-op-53797g79-0ba00-c25tb-worker-centralus1-r5cft", 6 nodes evaluated, 1 nodes were found feasible.
I0225 05:21:02.869585       1 scheduler.go:751] pod e2e-sched-priority-1402/scheduler-priority-avoid-pod-5xn4r is bound successfully on node "ci-op-53797g79-0ba00-c25tb-worker-centralus1-r5cft", 6 nodes evaluated, 1 nodes were found feasible.

I0225 03:23:02.135507       1 scheduler.go:751] pod e2e-sched-priority-6667/scheduler-priority-avoid-pod-gmz6c is bound successfully on node "ci-op-1y6q8dy2-0ba00-9clsl-worker-centralus3-265qr", 6 nodes evaluated, 2 nodes were found feasible.
I0225 03:23:02.174187       1 scheduler.go:751] pod e2e-sched-priority-6667/scheduler-priority-avoid-pod-4b74q is bound successfully on node "ci-op-1y6q8dy2-0ba00-9clsl-worker-centralus1-cvd26", 6 nodes evaluated, 1 nodes were found feasible.

Still the same reason as in the previous cases: only one node is left, so priorities are not taken into account.
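
To make the failure mode concrete: the preferAvoidPods penalty is applied during scoring, so it can only steer the choice between multiple feasible nodes. A simplified illustration (my own, not the scheduler's actual code):

```
// pickNode shows why the avoidPods annotation cannot help once filtering
// leaves a single feasible node: scoring never changes the outcome when
// there is nothing to choose between.
func pickNode(feasible []string, score func(node string) int64) string {
	if len(feasible) == 0 {
		return "" // nothing schedulable; the pod stays pending
	}
	best := feasible[0]
	for _, node := range feasible[1:] {
		if score(node) > score(best) {
			best = node
		}
	}
	// With len(feasible) == 1 the loop body never runs, so even a heavily
	// penalized (annotated) node is returned as the binding target.
	return best
}
```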

Comment 14 Ben Parees 2020-03-10 22:07:48 UTC
This is the top flake on our Azure 4.4 release-informing job; raising severity.
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-ocp-installer-e2e-azure-serial-4.4&sort-by-flakiness=

Comment 15 Ben Parees 2020-03-10 22:10:35 UTC
This is a top flake on our 4.4 release-informing e2e-azure-serial job.

Comment 16 Jan Chaloupka 2020-03-11 12:31:04 UTC
Checking the last failed job, https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-serial-4.4/1283, the cause of the failure is exactly the same:

junit.xml logs:
```
STEP: Verify the pods should not scheduled to the node: ci-op-2dw9ts2x-0ba00-gblxl-worker-westus-d5pjv

...
Mar 11 05:22:34.139: INFO: Deleting ReplicationController scheduler-priority-avoid-pod took: 84.031989ms
Mar 11 05:22:34.340: INFO: Terminating ReplicationController scheduler-priority-avoid-pod pods took: 200.390071ms
```

kube-scheduler logs:
```
I0311 05:22:28.321088       1 scheduler.go:751] pod e2e-sched-priority-2647/scheduler-priority-avoid-pod-tf55c is bound successfully on node "ci-op-2dw9ts2x-0ba00-gblxl-worker-westus-d5pjv", 6 nodes evaluated, 1 nodes were found feasible.
I0311 05:22:28.362868       1 scheduler.go:751] pod e2e-sched-priority-2647/scheduler-priority-avoid-pod-8lmcs is bound successfully on node "ci-op-2dw9ts2x-0ba00-gblxl-worker-westus-d5pjv", 6 nodes evaluated, 1 nodes were found 
```

So only a single node with sufficient capacity is left for scheduling.

There's nothing to be done on the scheduling side of the tests. Azure worker nodes just do not have sufficient resources to run all the tests.

Comment 17 Maciej Szulik 2020-03-12 16:29:07 UTC
This is not a release blocker, moving to 4.5

Comment 18 Ben Parees 2020-04-29 01:09:08 UTC
> There's nothing to be done on the scheduling side of the tests. Azure worker nodes just do not have sufficient resources to run all the tests.

Please request that the Azure CI cluster node size/count be increased, disable this test on Azure, or pursue another solution.


(Given this is happening in the "serial" suite, I'm surprised it's a pure resource issue... this should be the only test running at the time it runs.)

Comment 19 Jan Chaloupka 2020-04-29 15:39:00 UTC
Based on https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-ocp-installer-e2e-azure-serial-4.4, the "Pod should avoid nodes that have avoidPod annotation" test is flaking very rarely. I recall we merged some PRs reducing resource requests for core components, e.g. the scheduler, the apiserver, etc.

The same holds for https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-azure-serial-4.5

I can't say the same about https://testgrid.k8s.io/redhat-openshift-ocp-release-4.3-informing#release-openshift-ocp-installer-e2e-azure-serial-4.3. We need to wait until https://github.com/openshift/cluster-kube-scheduler-operator/pull/239 and the other PRs from https://bugzilla.redhat.com/show_bug.cgi?id=1822770 get merged.

Comment 20 Jan Chaloupka 2020-04-29 16:03:42 UTC
Once the PRs in https://bugzilla.redhat.com/show_bug.cgi?id=1822770 merge, 4.3 can be verified as well. Switching to MODIFIED so 4.5 and 4.4 can be verified, as the target release is set to 4.5.

Comment 23 RamaKasturi 2020-05-05 14:07:01 UTC
Verified the bug in 4.4 and 4.5 CI and did not see any failures for about 4-5 days, hence moving this to VERIFIED.

Comment 25 errata-xmlrpc 2020-07-13 17:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

