Bug 1733581 - failing tests: [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds
Summary: failing tests: [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] ...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Mike Dame
QA Contact: Xingxing Xia
Whiteboard: buildcop
Depends On:
Reported: 2019-07-26 15:49 UTC by Hongkai Liu
Modified: 2019-09-23 09:12 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-09-23 09:12:13 UTC
Target Upstream Version:

Links:
Github openshift origin pull 23600 (last updated 2020-10-05 13:16:48 UTC)
Github openshift origin pull 23829 (last updated 2020-10-05 13:16:47 UTC)

Failing tests:
 [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Suite:openshift/conformance/serial] [Suite:k8s]
 Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20190726-121548.xml
 error: 1 fail, 49 pass, 167 skip (1h0m56s)
2019/07/26 12:15:49 Container test in pod e2e-aws-serial failed, exit code 1, reason Error
2019/07/26 12:21:54 Copied 192.20Mi of artifacts from e2e-aws-serial to /logs/artifacts/e2e-aws-serial
2019/07/26 12:22:00 Ran for 1h35m31s
error: could not run steps: step e2e-aws-serial failed: template pod "e2e-aws-serial" failed: the pod ci-op-4nikhb87/e2e-aws-serial failed after 1h33m29s (failed containers: test): ContainerFailed one or more containers exited
 Container test exited with code 1, reason Error
5 I ns/openshift-image-registry pod/node-ca-6hftd node/ created
Jul 26 12:13:56.002 I ns/openshift-image-registry daemonset/node-ca Created pod: node-ca-6hftd
Jul 26 12:13:56.005 I ns/openshift-image-registry pod/node-ca-6hftd Successfully assigned openshift-image-registry/node-ca-6hftd to ip-10-0-135-33.ec2.internal
Jul 26 12:14:25.483 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-2 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-2 (11 times)
Jul 26 12:14:27.222 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-0 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-0 (11 times)
Jul 26 12:14:27.223 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-vj9mx Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-vj9mx (14 times)
Jul 26 12:14:27.464 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-fc545 Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-fc545 (14 times)
Jul 26 12:14:27.696 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1b-xfhdz Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1b-xfhdz (14 times)
Jul 26 12:14:29.242 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-1 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-1 (11 times)
Jul 26 12:14:57.503 I ns/openshift-image-registry pod/node-ca-6hftd Container image "registry.svc.ci.openshift.org/ocp/4.2-2019-07-26-104231@sha256:0f8ea602298e98ad6b3bd049b318783c8e303ca9fe60d6a24b5c1b19a2c6e909" already present on machine
Jul 26 12:14:57.701 I ns/openshift-image-registry pod/node-ca-6hftd Created container node-ca
Jul 26 12:14:57.901 I ns/openshift-image-registry pod/node-ca-6hftd Started container node-ca
Comment 7 W. Trevor King 2019-08-13 22:08:16 UTC
Clayton pointed out that this test accounts for 10% of the 4.2 serial promotion-gate failures [1]. A recent example [2]:

  [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Suite:openshift/conformance/serial] [Suite:k8s]
  fail [k8s.io/kubernetes/test/e2e/scheduling/taints.go:440]: Aug 13 18:51:12.866: Failed to evict all Pods. 1 pod(s) is not evicted.

[1]: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=release-openshift-origin-installer-e2e-aws-serial-4.2&search=k8s.io/kubernetes/test/e2e/scheduling/taints.go.*%20Failed%20to%20evict%20all%20Pods
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/3256
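For context on the test name: the NoExecute taint manager evicts a pod from a tainted node once the shortest tolerationSeconds among its matching tolerations has elapsed, so "Failed to evict all Pods" means at least one pod outlived that deadline. The minimal Python sketch below models only that deadline rule, assuming the upstream Kubernetes semantics (no matching toleration means immediate eviction; a toleration without tolerationSeconds means no limit). The function name and the list-of-seconds input are hypothetical simplifications, not the upstream Go API.

```python
from typing import Optional, Sequence


def min_toleration_seconds(toleration_seconds: Sequence[Optional[int]]) -> int:
    """Hypothetical model of how long a pod may stay on a NoExecute-tainted node.

    Each entry is the tolerationSeconds of one toleration matching the taint;
    None means that toleration sets no time limit.
    Returns 0 (evict now), -1 (never evict), or the smallest limit in seconds.
    """
    if not toleration_seconds:
        return 0  # nothing tolerates the taint: evict immediately
    # Negative limits are treated as "evict now", so clamp them to 0.
    limits = [max(s, 0) for s in toleration_seconds if s is not None]
    if not limits:
        return -1  # every matching toleration is unbounded: tolerate forever
    return min(limits)  # the shortest limit decides when eviction happens
```

Under this model, a pod with matching tolerations of 25 and 5 seconds is due for eviction after 5 seconds, and the e2e test fails when a pod with a finite limit is still running after its deadline.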

Comment 9 Xingxing Xia 2019-08-22 10:15:18 UTC
https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=release-openshift-origin-installer-e2e-aws-serial-4.2&search=k8s.io/kubernetes/test/e2e/scheduling/taints.go.*%20Failed%20to%20evict%20all%20Pods shows:
58 recent release-openshift-origin-installer-e2e-aws-serial-4.2 jobs
0 (0% of all failures) k8s.io/kubernetes/test/e2e/scheduling/taints.go.* Failed to evict all Pods

TestGrid (https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2&sort-by-flakiness=) also shows the case continuously green in the executed jobs. Changing the bug status accordingly.

Comment 15 Xingxing Xia 2019-09-23 09:12:13 UTC
Checked jobs from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/216 through job 232 (the latest at the time of this comment): the case in this bug, "NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds", passed in every run. The fix PR changes test code, not functional code, so changing the bug status.

