Bug 1731263 - e2e flake: [sig-scheduling] SchedulerPreemption
Summary: e2e flake: [sig-scheduling] SchedulerPreemption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Mike Dame
QA Contact: RamaKasturi
URL:
Whiteboard: buildcop
Duplicates: 1733005 1751661
Depends On:
Blocks: 1751626
 
Reported: 2019-07-18 19:49 UTC by Matthew Staebler
Modified: 2020-07-13 17:11 UTC (History)
9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:11:03 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 23050 0 None closed Bug 1731263: Re-enable preemption test with fixes 2021-02-02 19:27:32 UTC
Github openshift origin pull 23645 0 None closed Bug 1731263: Re-enable preemption test with fixes 2021-02-02 19:27:32 UTC
Github openshift origin pull 23758 0 None closed Bug 1731263: Reenable basic pod preemption e2es 2021-02-02 19:27:32 UTC
Github openshift origin pull 24868 0 None closed Bug 1731263: Disable failing preemption e2es 2021-02-02 19:28:17 UTC
Github openshift origin pull 24947 0 None closed Bug 1731263: Balance preemption e2e nodes 2021-02-02 19:27:32 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:11:19 UTC

Description Matthew Staebler 2019-07-18 19:49:49 UTC
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Suite:openshift/conformance/serial] [Suite:k8s]

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]

        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
            Status: "Failure",
            Message: "pods \"pod0-sched-preemption-low-priority\" not found",
            Reason: "NotFound",
            Details: {
                Name: "pod0-sched-preemption-low-priority",
                Group: "",
                Kind: "pods",
                UID: "",
                Causes: nil,
                RetryAfterSeconds: 0,
            },
            Code: 404,
        },


See https://bugzilla.redhat.com/show_bug.cgi?id=1700458 for where the test failed previously in 4.1.

Comment 3 Mike Dame 2019-07-31 18:24:09 UTC
*** Bug 1733005 has been marked as a duplicate of this bug. ***

Comment 4 ravig 2019-08-13 14:29:58 UTC
Origin PR: https://github.com/openshift/origin/pull/23050

Comment 5 Maciej Szulik 2019-08-21 12:43:45 UTC
Mike, another one, please.

Comment 6 Mike Dame 2019-08-21 17:01:53 UTC
Updated PR: https://github.com/openshift/origin/pull/23645

Comment 8 Xingxing Xia 2019-09-05 10:06:00 UTC
Hi, the quickest way to check is to browse the results of runs at https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2&sort-by-flakiness. But searching for "SchedulerPreemption" there, I only see the case "SchedulerPreemption [Serial] validates pod anti-affinity works in preemption", not the cases from comment 0:
validates lower priority pod preemption by critical pod
validates basic preemption works

Why? How can we verify the fix, then, other than running openshift-tests manually?

Comment 9 Xingxing Xia 2019-09-10 09:25:29 UTC
The cases "validates lower priority pod preemption by critical pod" and "validates basic preemption works" originally reported by this bug are seen in the results of the 4.1 job, https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.1&sort-by-flakiness .
But they are not seen in the 4.2 job, https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2&sort-by-flakiness .
I also checked other platforms' jobs and didn't see the cases there either.
Not sure why; this is confusing, since the fix landed in 4.2. Assigning back to confirm whether the cases are disabled or skipped.

Comment 10 Mike Dame 2019-09-10 15:26:04 UTC
I've opened https://github.com/openshift/origin/pull/23758 to re-enable these tests.

Comment 12 Maciej Szulik 2019-11-05 11:13:41 UTC
*** Bug 1751661 has been marked as a duplicate of this bug. ***

Comment 13 Mike Dame 2019-11-07 19:21:03 UTC
From what I can tell, and as I commented here: https://github.com/openshift/origin/pull/23758#issuecomment-551215646, these tests should be re-enabled, since they are no longer marked flaky. I can't find any runs for them, but I'll switch this to QA in the hope that you can see it. If not, we need to dig deeper to find out why it isn't running.

Comment 15 Mike Dame 2019-11-07 20:00:18 UTC
Turns out they were disabled here: https://github.com/openshift/origin/commit/55358f73aa477f401445a47620dea7e81b27bce9. I reopened my PR above to remove those lines and see how the tests run now.

Comment 17 Mike Dame 2020-04-13 16:13:48 UTC
These tests were re-enabled here: https://github.com/openshift/origin/commit/5f52d7e1d9007956a243c7a84d3a9774f230270e#diff-6ba77494282f6e840f44b01ce97335afL88

But they continue to flake heavily in 4.5: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-informing#release-openshift-ocp-installer-e2e-azure-serial-4.5

See the upstream discussion about promoting these tests to conformance: https://github.com/kubernetes/kubernetes/pull/80545#issuecomment-558274179, in which changes were made upstream that had previously been attempted in origin.

Comment 18 Mike Dame 2020-04-13 16:48:34 UTC
While debugging this I noticed that the nodeName set on the preemptor pod isn't actually respected, so it could end up on any node, not necessarily the one we expect (https://github.com/openshift/origin/pull/24867). However, with that change I get the following error in test runs, showing that the preemptor pod fails to run due to lack of CPU. This is very confusing, because a lack of resources should be exactly what triggers preemption, correct?
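(Editor's note: the OutOfcpu rejection in the run below makes more sense once the two layers are separated. Preemption is a scheduler feature; a pod created with spec.nodeName pre-set bypasses the scheduler entirely and goes straight to the kubelet, whose admission check simply rejects anything that doesn't fit. A minimal sketch of that admission arithmetic, using the millicpu numbers from the log below — this is an illustration, not the actual kubelet code:)

```python
def kubelet_admits_cpu(requested_mcpu: int, used_mcpu: int, capacity_mcpu: int) -> bool:
    """Illustrative sketch of the kubelet-side CPU fit check.

    The kubelet admits a pod only if its CPU request fits in what remains on
    the node. No preemption happens at this layer: evicting a lower-priority
    pod to make room is the scheduler's job, and a pod whose spec.nodeName is
    already set never passes through the scheduler.
    """
    return requested_mcpu + used_mcpu <= capacity_mcpu

# Numbers from the OutOfcpu event in the log:
#   requested: 1506, used: 2096, capacity: 3500
print(kubelet_admits_cpu(1506, 2096, 3500))  # False: 3602m > 3500m, pod rejected
```

So the test's expectation only holds when the preemptor pod goes through normal scheduling (where the scheduler evicts the low-priority pod first); binding it directly to a node turns the resource shortage into a kubelet rejection instead of a preemption.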

$ openshift-tests run all --dry-run | grep -E "validates basic preemption works" | openshift-tests run -f -
I0413 12:37:37.987097   86868 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
openshift-tests version: v4.1.0+dbe978f-2592-dirty
I0413 12:37:38.898728   86870 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
started: (0/1/1) "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]"

I0413 12:37:39.046067   86871 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
Apr 13 12:37:39.071: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Apr 13 12:37:39.593: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Apr 13 12:37:39.914: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Apr 13 12:37:39.914: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Apr 13 12:37:39.914: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Apr 13 12:37:40.025: INFO: e2e test version: v1.18.0-rc.1
Apr 13 12:37:40.126: INFO: kube-apiserver version: v1.18.0-rc.1
Apr 13 12:37:40.233: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1346
[BeforeEach] [Top Level]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1346
[BeforeEach] [Top Level]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:58
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:178
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename sched-preemption
Apr 13 12:37:40.550: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Apr 13 12:37:41.626: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:80
Apr 13 12:37:42.039: INFO: Waiting up to 1m0s for all nodes to be ready
Apr 13 12:38:45.595: INFO: Waiting for terminating namespaces to be deleted...
[It] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:104
STEP: Create pods that use 60% of node resources.
Apr 13 12:38:46.339: INFO: Current cpu and memory usage 990, 3207593984
Apr 13 12:38:46.459: INFO: Created pod: pod0-sched-preemption-low-priority
Apr 13 12:38:46.459: INFO: Current cpu and memory usage 730, 4466933760
Apr 13 12:38:46.568: INFO: Created pod: pod1-sched-preemption-medium-priority
Apr 13 12:38:46.568: INFO: Current cpu and memory usage 965, 4788846592
Apr 13 12:38:46.684: INFO: Created pod: pod2-sched-preemption-medium-priority
STEP: Wait for pods to be scheduled.
STEP: Run a high priority pod that has same requirements as that of lower priority pod
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
STEP: Collecting events from namespace "e2e-sched-preemption-5130".
STEP: Found 16 events.
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-254.us-west-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod0-sched-preemption-low-priority: {multus } AddedInterface: Add eth0 [10.131.0.19/23]
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-140-91.us-west-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod1-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.129.2.14/23]
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-153-113.us-west-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:48 -0400 EDT - event for pod2-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.128.2.20/23]
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:49 -0400 EDT - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-254.us-west-2.compute.internal} Started: Started container pod0-sched-preemption-low-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:49 -0400 EDT - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-254.us-west-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:49 -0400 EDT - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-254.us-west-2.compute.internal} Created: Created container pod0-sched-preemption-low-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:49 -0400 EDT - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-140-91.us-west-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:50 -0400 EDT - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-140-91.us-west-2.compute.internal} Created: Created container pod1-sched-preemption-medium-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:50 -0400 EDT - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-140-91.us-west-2.compute.internal} Started: Started container pod1-sched-preemption-medium-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:50 -0400 EDT - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-153-113.us-west-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:50 -0400 EDT - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-153-113.us-west-2.compute.internal} Created: Created container pod2-sched-preemption-medium-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:50 -0400 EDT - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-153-113.us-west-2.compute.internal} Started: Started container pod2-sched-preemption-medium-priority
Apr 13 12:38:51.546: INFO: At 2020-04-13 12:38:51 -0400 EDT - event for preemptor-pod: {kubelet ip-10-0-138-254.us-west-2.compute.internal} OutOfcpu: Node didn't have enough resource: cpu, requested: 1506, used: 2096, capacity: 3500
Apr 13 12:38:51.650: INFO: POD                                    NODE                                        PHASE    GRACE  CONDITIONS
Apr 13 12:38:51.650: INFO: pod0-sched-preemption-low-priority     ip-10-0-138-254.us-west-2.compute.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  }]
Apr 13 12:38:51.650: INFO: pod1-sched-preemption-medium-priority  ip-10-0-140-91.us-west-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  }]
Apr 13 12:38:51.650: INFO: pod2-sched-preemption-medium-priority  ip-10-0-153-113.us-west-2.compute.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:50 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-13 12:38:46 -0400 EDT  }]
Apr 13 12:38:51.650: INFO: preemptor-pod                          ip-10-0-138-254.us-west-2.compute.internal  Failed          []
Apr 13 12:38:51.651: INFO: 
Apr 13 12:38:51.765: INFO: pod0-sched-preemption-low-priority[e2e-sched-preemption-5130].container[pod0-sched-preemption-low-priority].log

Apr 13 12:38:51.878: INFO: pod1-sched-preemption-medium-priority[e2e-sched-preemption-5130].container[pod1-sched-preemption-medium-priority].log

Apr 13 12:38:51.995: INFO: pod2-sched-preemption-medium-priority[e2e-sched-preemption-5130].container[pod2-sched-preemption-medium-priority].log

Apr 13 12:38:52.103: INFO: unable to fetch logs for pods: preemptor-pod[e2e-sched-preemption-5130].container[preemptor-pod].error=the server rejected our request for an unknown reason (get pods preemptor-pod)
Apr 13 12:38:52.306: INFO: skipping dumping cluster info - cluster too large
Apr 13 12:38:52.306: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-sched-preemption-5130" for this suite.
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /Users/mdame/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:74
Apr 13 12:38:52.961: INFO: Running AfterSuite actions on all nodes
Apr 13 12:38:52.961: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/scheduling/predicates.go:869]: Unexpected error:
    <*errors.errorString | 0xc00037ee00>: {
        s: "pod ran to completion",
    }
    pod ran to completion
occurred

failed: (1m14s) 2020-04-13T16:38:52 "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]"


Timeline:

Apr 13 12:37:38.922 - 74s   I test="[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]" running
Apr 13 16:38:46.463 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal reason/Created
Apr 13 16:38:46.568 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal reason/Created
Apr 13 16:38:46.684 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal reason/Created
Apr 13 16:38:48.316 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority reason/AddedInterface Add eth0 [10.131.0.19/23]
Apr 13 16:38:48.325 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal container/pod0-sched-preemption-low-priority reason/Pulling image/k8s.gcr.io/pause:3.2
Apr 13 16:38:48.409 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority reason/AddedInterface Add eth0 [10.129.2.14/23]
Apr 13 16:38:48.425 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal container/pod1-sched-preemption-medium-priority reason/Pulling image/k8s.gcr.io/pause:3.2
Apr 13 16:38:48.832 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority reason/AddedInterface Add eth0 [10.128.2.20/23]
Apr 13 16:38:48.845 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal container/pod2-sched-preemption-medium-priority reason/Pulling image/k8s.gcr.io/pause:3.2
Apr 13 16:38:49.763 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal container/pod0-sched-preemption-low-priority reason/Pulled image/k8s.gcr.io/pause:3.2
Apr 13 16:38:49.896 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal container/pod1-sched-preemption-medium-priority reason/Pulled image/k8s.gcr.io/pause:3.2
Apr 13 16:38:49.958 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal container/pod0-sched-preemption-low-priority reason/Created
Apr 13 16:38:49.979 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal container/pod0-sched-preemption-low-priority reason/Started
Apr 13 16:38:50.052 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal container/pod1-sched-preemption-medium-priority reason/Created
Apr 13 16:38:50.073 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal container/pod1-sched-preemption-medium-priority reason/Started
Apr 13 16:38:50.285 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal container/pod2-sched-preemption-medium-priority reason/Pulled image/k8s.gcr.io/pause:3.2
Apr 13 16:38:50.435 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal container/pod2-sched-preemption-medium-priority reason/Created
Apr 13 16:38:50.460 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal container/pod2-sched-preemption-medium-priority reason/Started
Apr 13 16:38:50.480 I ns/e2e-sched-preemption-5130 pod/pod0-sched-preemption-low-priority node/ip-10-0-138-254.us-west-2.compute.internal container/pod0-sched-preemption-low-priority reason/Ready
Apr 13 16:38:50.778 I ns/e2e-sched-preemption-5130 pod/pod1-sched-preemption-medium-priority node/ip-10-0-140-91.us-west-2.compute.internal container/pod1-sched-preemption-medium-priority reason/Ready
Apr 13 16:38:50.963 I ns/e2e-sched-preemption-5130 pod/pod2-sched-preemption-medium-priority node/ip-10-0-153-113.us-west-2.compute.internal container/pod2-sched-preemption-medium-priority reason/Ready
Apr 13 16:38:51.229 I ns/e2e-sched-preemption-5130 pod/preemptor-pod node/ip-10-0-138-254.us-west-2.compute.internal reason/Created
Apr 13 16:38:51.239 W ns/e2e-sched-preemption-5130 pod/preemptor-pod node/ip-10-0-138-254.us-west-2.compute.internal reason/OutOfcpu Node didn't have enough resource: cpu, requested: 1506, used: 2096, capacity: 3500
Apr 13 16:38:51.243 E ns/e2e-sched-preemption-5130 pod/preemptor-pod node/ip-10-0-138-254.us-west-2.compute.internal reason/Failed (OutOfcpu): Pod Node didn't have enough resource: cpu, requested: 1506, used: 2096, capacity: 3500
Apr 13 12:38:52.969 I test="[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]" failed

Failing tests:

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]

error: 1 fail, 0 pass, 0 skip (1m14s)

Comment 19 Mike Dame 2020-04-13 17:37:24 UTC
I've opened this PR to disable these tests again: https://github.com/openshift/origin/pull/24868

Comment 23 RamaKasturi 2020-05-04 13:23:48 UTC
I tried to run the tests as described, but I see that the tests failed. @mike, can you please take a look and let me know if I am missing something here? Thanks.

[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Disabled:Broken] [Suite:k8s]"
I0504 18:46:42.523083   20020 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
May  4 18:46:42.583: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
May  4 18:46:42.923: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
May  4 18:46:43.843: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
May  4 18:46:43.843: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
May  4 18:46:43.843: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
May  4 18:46:44.157: INFO: e2e test version: v1.18.0-rc.1
May  4 18:46:44.452: INFO: kube-apiserver version: v1.18.0-rc.1
May  4 18:46:44.681: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:58
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:178
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename sched-preemption
May  4 18:46:45.478: INFO: About to run a Kube e2e test, ensuring namespace is privileged
May  4 18:46:48.073: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:80
May  4 18:46:49.053: INFO: Waiting up to 1m0s for all nodes to be ready
May  4 18:47:51.477: INFO: Waiting for terminating namespaces to be deleted...
[It] validates lower priority pod preemption by critical pod [Disabled:Broken] [Suite:k8s]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:184
STEP: Create pods that use 60% of node resources.
May  4 18:47:52.034: INFO: Current cpu usage and memory usage is 739, 4147118080
May  4 18:47:52.285: INFO: Created pod: pod0-sched-preemption-low-priority
May  4 18:47:52.285: INFO: Current cpu usage and memory usage is 930, 4303355904
May  4 18:47:52.519: INFO: Created pod: pod1-sched-preemption-medium-priority
May  4 18:47:52.519: INFO: Current cpu usage and memory usage is 642, 4243587072
May  4 18:47:52.754: INFO: Created pod: pod2-sched-preemption-medium-priority
STEP: Wait for pods to be scheduled.
STEP: Run a critical pod that use same resources as that of a lower priority pod
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
STEP: Collecting events from namespace "e2e-sched-preemption-9350".
STEP: Found 18 events.
May  4 18:48:06.329: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod0-sched-preemption-low-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-9350/pod0-sched-preemption-low-priority to ip-10-0-138-68.us-east-2.compute.internal
May  4 18:48:06.329: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod1-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-9350/pod1-sched-preemption-medium-priority to ip-10-0-144-59.us-east-2.compute.internal
May  4 18:48:06.329: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod2-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-9350/pod2-sched-preemption-medium-priority to ip-10-0-175-160.us-east-2.compute.internal
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod0-sched-preemption-low-priority: {multus } AddedInterface: Add eth0 [10.128.2.12/23]
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod1-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.131.0.26/23]
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Pulling: Pulling image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:54 +0530 IST - event for pod2-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.129.2.12/23]
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Created: Created container pod0-sched-preemption-low-priority
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Started: Started container pod0-sched-preemption-low-priority
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Started: Started container pod1-sched-preemption-medium-priority
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Created: Created container pod1-sched-preemption-medium-priority
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Pulled: Successfully pulled image "k8s.gcr.io/pause:3.2"
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Created: Created container pod2-sched-preemption-medium-priority
May  4 18:48:06.329: INFO: At 2020-05-04 18:47:56 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Started: Started container pod2-sched-preemption-medium-priority
May  4 18:48:06.554: INFO: POD                                    NODE                                        PHASE    GRACE  CONDITIONS
May  4 18:48:06.555: INFO: pod0-sched-preemption-low-priority     ip-10-0-138-68.us-east-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  }]
May  4 18:48:06.555: INFO: pod1-sched-preemption-medium-priority  ip-10-0-144-59.us-east-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  }]
May  4 18:48:06.555: INFO: pod2-sched-preemption-medium-priority  ip-10-0-175-160.us-east-2.compute.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:57 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:47:52 +0530 IST  }]
May  4 18:48:06.555: INFO: 
May  4 18:48:06.797: INFO: pod0-sched-preemption-low-priority[e2e-sched-preemption-9350].container[pod0-sched-preemption-low-priority].log

May  4 18:48:07.089: INFO: pod1-sched-preemption-medium-priority[e2e-sched-preemption-9350].container[pod1-sched-preemption-medium-priority].log

May  4 18:48:07.345: INFO: pod2-sched-preemption-medium-priority[e2e-sched-preemption-9350].container[pod2-sched-preemption-medium-priority].log

May  4 18:48:07.810: INFO: skipping dumping cluster info - cluster too large
May  4 18:48:07.810: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-sched-preemption-9350" for this suite.
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:74
May  4 18:48:09.146: INFO: Running AfterSuite actions on all nodes
May  4 18:48:09.146: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/scheduling/preemption.go:272]: Expected
    <bool>: false
to equal
    <bool>: true


[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Disabled:Broken] [Suite:k8s]"
I0504 18:50:58.411324   20587 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
May  4 18:50:58.457: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
May  4 18:50:58.721: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
May  4 18:50:59.644: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
May  4 18:50:59.644: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
May  4 18:50:59.644: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
May  4 18:50:59.953: INFO: e2e test version: v1.18.0-rc.1
May  4 18:51:00.250: INFO: kube-apiserver version: v1.18.0-rc.1
May  4 18:51:00.561: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:58
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:178
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename sched-preemption
May  4 18:51:01.409: INFO: About to run a Kube e2e test, ensuring namespace is privileged
May  4 18:51:04.150: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:80
May  4 18:51:05.200: INFO: Waiting up to 1m0s for all nodes to be ready
May  4 18:52:08.156: INFO: Waiting for terminating namespaces to be deleted...
[It] validates basic preemption works [Disabled:Broken] [Suite:k8s]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:104
STEP: Create pods that use 60% of node resources.
May  4 18:52:08.772: INFO: Current cpu and memory usage 739, 4147118080
May  4 18:52:09.027: INFO: Created pod: pod0-sched-preemption-low-priority
May  4 18:52:09.027: INFO: Current cpu and memory usage 930, 4303355904
May  4 18:52:09.370: INFO: Created pod: pod1-sched-preemption-medium-priority
May  4 18:52:09.370: INFO: Current cpu and memory usage 642, 4243587072
May  4 18:52:09.677: INFO: Created pod: pod2-sched-preemption-medium-priority
STEP: Wait for pods to be scheduled.
STEP: Run a high priority pod that has same requirements as that of lower priority pod
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
STEP: Collecting events from namespace "e2e-sched-preemption-7893".
STEP: Found 20 events.
May  4 18:52:21.201: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod0-sched-preemption-low-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-7893/pod0-sched-preemption-low-priority to ip-10-0-138-68.us-east-2.compute.internal
May  4 18:52:21.201: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod1-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-7893/pod1-sched-preemption-medium-priority to ip-10-0-144-59.us-east-2.compute.internal
May  4 18:52:21.201: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod2-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-7893/pod2-sched-preemption-medium-priority to ip-10-0-175-160.us-east-2.compute.internal
May  4 18:52:21.201: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for preemptor-pod: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-7893/preemptor-pod to ip-10-0-138-68.us-east-2.compute.internal
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Started: Started container pod0-sched-preemption-low-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Pulled: Container image "k8s.gcr.io/pause:3.2" already present on machine
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod0-sched-preemption-low-priority: {multus } AddedInterface: Add eth0 [10.128.2.14/23]
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod0-sched-preemption-low-priority: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Created: Created container pod0-sched-preemption-low-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod1-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.131.0.27/23]
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Pulled: Container image "k8s.gcr.io/pause:3.2" already present on machine
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Created: Created container pod1-sched-preemption-medium-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod1-sched-preemption-medium-priority: {kubelet ip-10-0-144-59.us-east-2.compute.internal} Started: Started container pod1-sched-preemption-medium-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Pulled: Container image "k8s.gcr.io/pause:3.2" already present on machine
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:11 +0530 IST - event for pod2-sched-preemption-medium-priority: {multus } AddedInterface: Add eth0 [10.129.2.13/23]
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:12 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Started: Started container pod2-sched-preemption-medium-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:12 +0530 IST - event for pod2-sched-preemption-medium-priority: {kubelet ip-10-0-175-160.us-east-2.compute.internal} Created: Created container pod2-sched-preemption-medium-priority
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:16 +0530 IST - event for preemptor-pod: {multus } AddedInterface: Add eth0 [10.128.2.15/23]
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:16 +0530 IST - event for preemptor-pod: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Pulled: Container image "k8s.gcr.io/pause:3.2" already present on machine
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:16 +0530 IST - event for preemptor-pod: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Created: Created container preemptor-pod
May  4 18:52:21.201: INFO: At 2020-05-04 18:52:17 +0530 IST - event for preemptor-pod: {kubelet ip-10-0-138-68.us-east-2.compute.internal} Started: Started container preemptor-pod
May  4 18:52:21.454: INFO: POD                                    NODE                                        PHASE    GRACE  CONDITIONS
May  4 18:52:21.454: INFO: pod0-sched-preemption-low-priority     ip-10-0-138-68.us-east-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:08 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:11 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:11 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:08 +0530 IST  }]
May  4 18:52:21.454: INFO: pod1-sched-preemption-medium-priority  ip-10-0-144-59.us-east-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:09 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:12 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:12 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:09 +0530 IST  }]
May  4 18:52:21.454: INFO: pod2-sched-preemption-medium-priority  ip-10-0-175-160.us-east-2.compute.internal  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:09 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:13 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:13 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:09 +0530 IST  }]
May  4 18:52:21.454: INFO: preemptor-pod                          ip-10-0-138-68.us-east-2.compute.internal   Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:15 +0530 IST  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:17 +0530 IST  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:17 +0530 IST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-05-04 18:52:15 +0530 IST  }]
May  4 18:52:21.454: INFO: 
May  4 18:52:21.701: INFO: pod0-sched-preemption-low-priority[e2e-sched-preemption-7893].container[pod0-sched-preemption-low-priority].log

May  4 18:52:21.948: INFO: pod1-sched-preemption-medium-priority[e2e-sched-preemption-7893].container[pod1-sched-preemption-medium-priority].log

May  4 18:52:22.197: INFO: pod2-sched-preemption-medium-priority[e2e-sched-preemption-7893].container[pod2-sched-preemption-medium-priority].log

May  4 18:52:22.476: INFO: preemptor-pod[e2e-sched-preemption-7893].container[preemptor-pod].log

May  4 18:52:23.047: INFO: skipping dumping cluster info - cluster too large
May  4 18:52:23.047: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-sched-preemption-7893" for this suite.
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:74
May  4 18:52:24.619: INFO: Running AfterSuite actions on all nodes
May  4 18:52:24.619: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/scheduling/preemption.go:178]: Expected
    <bool>: false
to equal
    <bool>: true

Comment 24 Mike Dame 2020-05-04 20:13:34 UTC
With scheduler tracing enabled, I can see that the resource usage we calculate for these tests, which should eliminate every worker node during the Filter plugin scheduling phase, does not work as expected: not every node is eliminated for lacking sufficient available resources.

This is because we assume we can take 60% of the available space on each node and fill it with a victim pod. We then take the lowest-priority pod's resources and make that the request of the preemptor-pod.

This approach eliminates the lowest-priority pod's node, but for preemption to take place every node must be eliminated. When resource usage differs significantly between nodes, the preemptor pod may still fit on another node without any preemption.
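The sizing scheme above can be sketched as follows. This is a minimal illustration, not the actual test code: the allocatable capacity and per-node usage numbers are invented, and the helper names (`victimRequests`, `fitsElsewhere`) are hypothetical. It shows how, with uneven pre-existing usage, a preemptor sized like the lowest-priority victim can still fit on another node, so the scheduler never preempts and the assertion flakes.

```go
package main

import "fmt"

// victimRequests sizes one victim pod per node at 60% of that node's free
// CPU, mirroring the sizing scheme described above (millicore values here
// are illustrative, not taken from the real test).
func victimRequests(allocatable int64, usage []int64) []int64 {
	victims := make([]int64, len(usage))
	for i, u := range usage {
		victims[i] = (allocatable - u) * 6 / 10
	}
	return victims
}

// fitsElsewhere reports, per node, whether a preemptor sized like the
// lowest-priority victim (index 0) still fits after the victims are placed.
func fitsElsewhere(allocatable int64, usage, victims []int64) []bool {
	preemptor := victims[0]
	fits := make([]bool, len(usage))
	for i := range usage {
		fits[i] = preemptor <= allocatable-usage[i]-victims[i]
	}
	return fits
}

func main() {
	allocatable := int64(2000)       // millicores per node (assumed equal)
	usage := []int64{1500, 930, 200} // uneven pre-existing usage (invented)

	victims := victimRequests(allocatable, usage)
	fits := fitsElsewhere(allocatable, usage, victims)
	for i := range usage {
		fmt.Printf("node %d: victim=%dm, preemptor fits without preemption: %v\n",
			i, victims[i], fits[i])
	}
	// With balanced usage every entry would be false, forcing preemption on
	// node 0; here nodes 1 and 2 still have room, so no preemption happens.
}
```

Balancing the nodes' usage before sizing the victims (the approach taken in the fix) makes every node's free space roughly equal, so the 60%-sized preemptor cannot fit anywhere without evicting a victim.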

To address this, I've opened https://github.com/openshift/origin/pull/24947 (and a fix upstream as well).

Comment 25 RamaKasturi 2020-05-05 06:51:54 UTC
Based on comment 24, moving the bug back to ASSIGNED state.

@mike, can you move this to POST state?

Comment 26 Mike Dame 2020-05-12 12:58:01 UTC
#24947 has merged and should address the flakes. Note that we still need to confirm these two tests are passing in CI (not just in local runs).

Moving to MODIFIED manually, since the GitHub bot apparently didn't.

Comment 28 RamaKasturi 2020-05-14 14:56:04 UTC
Verified both in CI and by running locally; it works in both places.

[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Disabled:Broken] [Suite:k8s]"
I0514 20:06:32.549317   19348 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
error: no test exists with that name
[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run all --dry-run | grep -i "validates lower priority pod preemption by critical pod"
I0514 20:06:39.761823   19382 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
"[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Suite:openshift/conformance/serial] [Suite:k8s]"
[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Suite:openshift/conformance/serial] [Suite:k8s]"
I0514 20:07:09.151099   19458 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
May 14 20:07:09.216: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
May 14 20:07:09.458: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
May 14 20:07:10.377: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
May 14 20:07:10.377: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
May 14 20:07:10.377: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
May 14 20:07:10.651: INFO: e2e test version: v1.18.2
May 14 20:07:10.881: INFO: kube-apiserver version: v1.18.2
May 14 20:07:11.189: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:58
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:178
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename sched-preemption
May 14 20:07:12.115: INFO: About to run a Kube e2e test, ensuring namespace is privileged
May 14 20:07:14.977: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:86
May 14 20:07:16.206: INFO: Waiting up to 1m0s for all nodes to be ready
May 14 20:08:19.492: INFO: Waiting for terminating namespaces to be deleted...
[It] validates lower priority pod preemption by critical pod [Suite:openshift/conformance/serial] [Suite:k8s]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:192
STEP: Create pods that use 2/3 of node resources.
May 14 20:08:20.309: INFO: Created pod: pod0-sched-preemption-low-priority
May 14 20:08:20.768: INFO: Created pod: pod1-sched-preemption-medium-priority
May 14 20:08:21.333: INFO: Created pod: pod2-sched-preemption-medium-priority
STEP: Wait for pods to be scheduled.
STEP: Run a critical pod that use same resources as that of a lower priority pod
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
May 14 20:08:53.385: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-sched-preemption-4131" for this suite.
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:76
May 14 20:08:56.156: INFO: Running AfterSuite actions on all nodes
May 14 20:08:56.156: INFO: Running AfterSuite actions on node 1
[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run all --dry-run | grep -i "validates basic preemption works"
I0514 20:10:44.104854   19870 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
"[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]"
[ramakasturinarra@dhcp35-60 origin]$ ./_output/local/bin/linux/amd64/openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]"
I0514 20:11:08.666798   19927 test_context.go:423] Tolerating taints "node-role.kubernetes.io/master" when considering if nodes are ready
May 14 20:11:08.734: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
May 14 20:11:08.977: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
May 14 20:11:09.892: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
May 14 20:11:09.893: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
May 14 20:11:09.893: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
May 14 20:11:10.205: INFO: e2e test version: v1.18.2
May 14 20:11:10.499: INFO: kube-apiserver version: v1.18.2
May 14 20:11:10.809: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/framework.go:1413
[BeforeEach] [Top Level]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:58
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:178
STEP: Creating a kubernetes client
STEP: Building a namespace api object, basename sched-preemption
May 14 20:11:11.658: INFO: About to run a Kube e2e test, ensuring namespace is privileged
May 14 20:11:14.493: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:86
May 14 20:11:15.824: INFO: Waiting up to 1m0s for all nodes to be ready
May 14 20:12:20.135: INFO: Waiting for terminating namespaces to be deleted...
[It] validates basic preemption works [Suite:openshift/conformance/serial] [Suite:k8s]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:110
STEP: Create pods that use 60% of node resources.
May 14 20:12:21.054: INFO: Created pod: pod0-sched-preemption-low-priority with resources: {Limits:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}] Requests:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}]}
May 14 20:12:21.565: INFO: Created pod: pod1-sched-preemption-medium-priority with resources: {Limits:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}] Requests:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}]}
May 14 20:12:22.181: INFO: Created pod: pod2-sched-preemption-medium-priority with resources: {Limits:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}] Requests:map[scheduling.k8s.io/foo:{i:{value:2 scale:0} d:{Dec:<nil>} s:2 Format:DecimalSI}]}
STEP: Wait for pods to be scheduled.
STEP: Run a high priority pod that has same requirements as that of lower priority pod
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:179
May 14 20:12:33.444: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-sched-preemption-9544" for this suite.
[AfterEach] [sig-scheduling] SchedulerPreemption [Serial]
  /home/ramakasturinarra/automation/OpenShift/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/scheduling/preemption.go:76
May 14 20:12:36.212: INFO: Running AfterSuite actions on all nodes
May 14 20:12:36.212: INFO: Running AfterSuite actions on node 1

Based on the above, moving the bug to VERIFIED state.

Comment 30 errata-xmlrpc 2020-07-13 17:11:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
