Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1743102

Summary: Failing test: [sig-scheduling] SchedulerPreemption [Serial] validates pod anti-affinity works in preemption [Suite:openshift/conformance/serial] [Suite:k8s]
Product: OpenShift Container Platform
Reporter: Xingxing Xia <xxia>
Component: kube-scheduler
Assignee: Mike Dame <mdame>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.2.0
CC: anli, aos-bugs, jerzhang, mfojtik, nstielau, piqin, wzheng
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:36:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Xingxing Xia 2019-08-19 06:08:04 UTC
Description of problem:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/21
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/19
... etc.


fail [k8s.io/kubernetes/test/e2e/scheduling/preemption.go:318]: Unexpected error:
    <*errors.errorString | 0xc0002733f0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
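
For readers unfamiliar with this message: "timed out waiting for the condition" is the generic error that Kubernetes polling helpers return when a check never succeeds before the deadline, so on its own it does not say what was being waited for. The following is a minimal Python sketch of that polling pattern, not the actual Go test code. The names `poll`, `WaitTimeoutError`, and `pod_is_scheduled`, and the interval and timeout values, are hypothetical.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for the Go error seen in the failure output."""


def poll(condition, interval=0.01, timeout=0.05):
    """Call condition() every `interval` seconds until it returns True.

    Raises WaitTimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise WaitTimeoutError("timed out waiting for the condition")
        time.sleep(interval)


def pod_is_scheduled():
    # Hypothetical condition: models a pod that never gets scheduled.
    return False


try:
    poll(pod_is_scheduled)
except WaitTimeoutError as err:
    print(err)
```

The error carries no detail about the condition itself, which is why the pod events and scheduler messages from the test run are needed to diagnose the failure.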

Version-Release number of selected component (if applicable):
4.2 jobs

How reproducible:
Frequently over the past week

Comment 1 Maciej Szulik 2019-08-21 11:28:37 UTC
I don't see this too often, so I'm lowering the priority and moving it out of 4.2.

Comment 2 Qin Ping 2019-08-26 07:20:41 UTC
It seems that resources are insufficient.


Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod0-sched-preemption-medium-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-6106/pod0-sched-preemption-medium-priority to ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod1-sched-preemption-low-priority: {default-scheduler } FailedScheduling: 0/6 nodes are available: 2 Insufficient cpu, 5 node(s) didn't match node selector.
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:05 +0000 UTC - event for pod2-sched-preemption-low-priority: {default-scheduler } Scheduled: Successfully assigned e2e-sched-preemption-6106/pod2-sched-preemption-low-priority to ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Pulled: Container image "k8s.gcr.io/pause:3.1" already present on machine
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Created: Created container pod0-sched-preemption-medium-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:07 +0000 UTC - event for pod0-sched-preemption-medium-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t} Started: Started container pod0-sched-preemption-medium-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Pulled: Container image "k8s.gcr.io/pause:3.1" already present on machine
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Created: Created container pod2-sched-preemption-low-priority
Aug 26 01:02:10.523: INFO: At 2019-08-26 00:57:08 +0000 UTC - event for pod2-sched-preemption-low-priority: {kubelet ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246} Started: Started container pod2-sched-preemption-low-priority
Aug 26 01:02:10.564: INFO: POD                                    NODE                                                PHASE    GRACE  CONDITIONS
Aug 26 01:02:10.564: INFO: pod0-sched-preemption-medium-priority  ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus1-c7s2t  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  }]
Aug 26 01:02:10.564: INFO: pod1-sched-preemption-low-priority                                                         Pending         [{PodScheduled False 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC Unschedulable 0/6 nodes are available: 2 Insufficient cpu, 5 node(s) didn't match node selector.}]
Aug 26 01:02:10.564: INFO: pod2-sched-preemption-low-priority     ci-op-q0jd3q58-3a8ca-kvjb9-worker-centralus3-bk246  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:08 +0000 UTC  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2019-08-26 00:57:05 +0000 UTC  }]
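
To compare failures across runs, the scheduler's "Unschedulable" message can be broken into per-reason node counts. The following is a rough Python sketch that assumes the message keeps the shape seen in the log above. The function name `parse_unschedulable` is hypothetical and is not part of any Kubernetes tooling.

```python
import re


def parse_unschedulable(message):
    """Split a FailedScheduling message into (available, total, reasons)."""
    head = re.match(r"(\d+)/(\d+) nodes are available: (.*)", message)
    if head is None:
        raise ValueError("unrecognized message: " + message)
    available, total = int(head.group(1)), int(head.group(2))
    reasons = {}
    # Each comma-separated part looks like "<count> <reason>".
    for part in head.group(3).rstrip(".").split(", "):
        count, reason = part.split(" ", 1)
        reasons[reason] = int(count)
    return available, total, reasons


# Message copied from the pod1-sched-preemption-low-priority event above.
msg = ("0/6 nodes are available: 2 Insufficient cpu, "
       "5 node(s) didn't match node selector.")
available, total, reasons = parse_unschedulable(msg)
print(available, total)
for reason, count in reasons.items():
    print(count, reason)
```

Note that the per-reason counts here add up to 7 for 6 nodes, which suggests that a single node can be counted under more than one reason.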

Comment 4 Mike Dame 2019-09-04 18:40:29 UTC
I think this may be addressed by https://github.com/kubernetes/kubernetes/pull/76663, which I'm rebasing and updating so that it can hopefully be picked into origin.

Comment 5 Mike Dame 2019-09-04 22:28:29 UTC
Disregard that. We actually removed this test upstream, so I opened a PR to pick that removal into origin: https://github.com/openshift/origin/pull/23728

Comment 6 Yu Qi Zhang 2019-09-05 15:11:53 UTC
This test is now run against 4.2 nightly as part of azure-serial, and is now being considered a blocking failure for 4.2 (branching for 4.3). Moving this back to 4.2 and moving to urgent. Please reach out to me or nstielau if you think this should not be the case.

Comment 8 Mike Dame 2019-09-10 15:11:37 UTC
*** Bug 1748150 has been marked as a duplicate of this bug. ***

Comment 9 Mike Dame 2019-09-10 15:27:43 UTC
This test has been removed (in favor of a duplicate integration test), and should no longer be run. Can you please confirm that the test is no longer run?

Comment 10 Wenjing Zheng 2019-09-11 01:54:19 UTC
(In reply to Mike Dame from comment #9)
> This test has been removed (in favor of a duplicate integration test), and
> should no longer be run. Can you please confirm that the test is no longer
> run?

Yes, I can no longer find this test in the latest azure-serial run: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/127.

How should we deal with this bug?

Comment 11 ge liu 2019-09-11 06:14:15 UTC
OK, close it based on the comments above. Thanks.

Comment 12 errata-xmlrpc 2019-10-16 06:36:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922