Bug 1743102 - Failing test: [sig-scheduling] SchedulerPreemption [Serial] validates pod anti-affinity works in preemption [Suite:openshift/conformance/serial] [Suite:k8s]
Summary: Failing test: [sig-scheduling] SchedulerPreemption [Serial] validates pod anti-affinity works in preemption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Mike Dame
QA Contact: ge liu
URL:
Whiteboard:
Duplicates: 1748150
Depends On:
Blocks:
 
Reported: 2019-08-19 06:08 UTC by Xingxing Xia
Modified: 2019-10-16 06:36 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:36:23 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/origin pull 23728 (closed): Bug 1743102: UPSTREAM: 80821: Remove duplicate anti-affinity scheduler e2e (last updated 2020-01-29 15:58:41 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:36:36 UTC)

Description Xingxing Xia 2019-08-19 06:08:04 UTC
Description of problem:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/21
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/19
... etc.


fail [k8s.io/kubernetes/test/e2e/scheduling/preemption.go:318]: Unexpected error:
    <*errors.errorString | 0xc0002733f0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
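
For reference, this error string is the generic timeout message of wait.ErrWaitTimeout from k8s.io/apimachinery/pkg/util/wait: the e2e framework polls for a condition and gives up when it never becomes true within the timeout. A minimal Go sketch of that pattern follows; the condition body is a stand-in, not the actual check in preemption.go.

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// wait.Poll re-evaluates the condition every interval until it returns
	// true, returns an error, or the timeout elapses. On timeout it returns
	// wait.ErrWaitTimeout, whose message is exactly
	// "timed out waiting for the condition".
	err := wait.Poll(2*time.Second, 10*time.Second, func() (bool, error) {
		// The real test would query the API server for pod scheduling
		// status; this stand-in never succeeds, to force the timeout.
		return false, nil
	})
	fmt.Println(err) // timed out waiting for the condition
}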

Version-Release number of selected component (if applicable):
4.2 jobs

How reproducible:
Quite often over the last week
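
For context on what the test validates: it creates lower-priority pods carrying a label, then a higher-priority pod whose required pod anti-affinity conflicts with that label, and expects the scheduler to preempt a victim so the new pod can be placed. A hedged Go sketch of such a preemptor pod follows; the names, label, and priority class are illustrative, not copied from preemption.go.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// antiAffinityPreemptor builds a higher-priority pod that refuses to share
// a node (topology key kubernetes.io/hostname) with any pod labeled
// service=filler. If every candidate node already runs such a pod,
// scheduling this pod requires preempting one of them. All names here are
// illustrative.
func antiAffinityPreemptor() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "preemptor-pod"},
		Spec: corev1.PodSpec{
			PriorityClassName: "high-priority",
			Affinity: &corev1.Affinity{
				PodAntiAffinity: &corev1.PodAntiAffinity{
					RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
						LabelSelector: &metav1.LabelSelector{
							MatchExpressions: []metav1.LabelSelectorRequirement{{
								Key:      "service",
								Operator: metav1.LabelSelectorOpIn,
								Values:   []string{"filler"},
							}},
						},
						TopologyKey: "kubernetes.io/hostname",
					}},
				},
			},
			Containers: []corev1.Container{{
				Name:  "pause",
				Image: "k8s.gcr.io/pause:3.1",
			}},
		},
	}
}

func main() {
	pod := antiAffinityPreemptor()
	fmt.Println(pod.Name, pod.Spec.PriorityClassName)
}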

Comment 1 Maciej Szulik 2019-08-21 11:28:37 UTC
I don't see this too often; lowering the priority and moving out of 4.2.

Comment 2 Qin Ping 2019-08-26 07:20:41 UTC
It seems the resources are insufficient.


Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod0-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-6106/pod0-sched-preemption-medium-priority to ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod1-sched-preemption-low-priority: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 Insufficient cpu, 5 node(s) didn't match node selector.
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod2-sched-preemption-low-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-6106/pod2-sched-preemption-low-priority to ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Pulled: Container image "k8s.gcr.io/pause:3.1" already present on machine
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Created: Created container pod0-sched-preemption-medium-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Started: Started container pod0-sched-preemption-medium-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Pulled: Container image "k8s.gcr.io/pause:3.1" already present on machine
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Created: Created container pod2-sched-preemption-low-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Started: Started container pod2-sched-preemption-low-priority
Aug 26 01:02:10.564: INFO: POD                                    NODE                                                PHASE    GRACE  CONDITIONS
Aug 26 01:02:10.564: INFO: pod0-sched-preemption-medium-priority  ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  }]
Aug 26 01:02:10.564: INFO: pod1-sched-preemption-low-priority                                                         Pending         [{PodScheduled False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC Unschedulable 0/6 nodes are available: 2 Insufficient cpu, 5 node(s) didn't match node selector.}]
Aug 26 01:02:10.564: INFO: pod2-sched-preemption-low-priority     ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  }]
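
Two notes on the events above. First, the reason counts in the FailedScheduling message (2 Insufficient cpu plus 5 node selector mismatches, 7 reasons across 6 nodes) can legitimately exceed the node count, since a single node may fail more than one predicate. Second, the run times out in the setup phase, before any anti-affinity logic runs: pod1 never fits, which is consistent with the test sizing its filler pods from each node's allocatable CPU, so any CPU already held by other workloads on the target node leaves too little headroom. A rough Go sketch of that sizing idea follows; the fraction and helper name are assumptions, not the exact arithmetic in preemption.go.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// fillerCPURequest is a hypothetical helper mirroring the setup idea:
// request a large fraction of a node's allocatable CPU so the node ends up
// nearly full. If existing pods already hold more than the remaining
// headroom, the scheduler reports "Insufficient cpu" for that node.
func fillerCPURequest(node *corev1.Node) *resource.Quantity {
	alloc := node.Status.Allocatable[corev1.ResourceCPU]
	// Request 60% of allocatable (the real test derives its own fraction).
	return resource.NewMilliQuantity(alloc.MilliValue()*6/10, resource.DecimalSI)
}

func main() {
	node := &corev1.Node{}
	node.Status.Allocatable = corev1.ResourceList{
		corev1.ResourceCPU: resource.MustParse("4"),
	}
	fmt.Println(fillerCPURequest(node)) // 2400m
}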

Comment 4 Mike Dame 2019-09-04 18:40:29 UTC
I think this may be addressed by https://github.com/kubernetes/kubernetes/pull/76663, which I'm rebasing and updating so it can hopefully be picked into origin.

Comment 5 Mike Dame 2019-09-04 22:28:29 UTC
Disregard that; we actually removed this test upstream, so I opened a PR to pick that into origin here: https://github.com/openshift/origin/pull/23728

Comment 6 Yu Qi Zhang 2019-09-05 15:11:53 UTC
This test now runs against the 4.2 nightly as part of azure-serial and is being considered a blocking failure for 4.2 (branching for 4.3). Moving this back to 4.2 and raising to urgent. Please reach out to me or nstielau if you think this should not be the case.

Comment 8 Mike Dame 2019-09-10 15:11:37 UTC
*** Bug 1748150 has been marked as a duplicate of this bug. ***

Comment 9 Mike Dame 2019-09-10 15:27:43 UTC
This test has been removed (in favor of a duplicate integration test), and should no longer be run. Can you please confirm that the test is no longer run?

Comment 10 Wenjing Zheng 2019-09-11 01:54:19 UTC
(In reply to Mike Dame from comment #9)
> This test has been removed (in favor of a duplicate integration test), and
> should no longer be run. Can you please confirm that the test is no longer
> run?

Yes, I cannot find this test in the latest azure-serial run: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/127.

How should we deal with this bug?

Comment 11 ge liu 2019-09-11 06:14:15 UTC
OK, closing it based on the comments above. Thanks.

Comment 12 errata-xmlrpc 2019-10-16 06:36:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

