Bug 1733581 - failing tests: [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Mike Dame
QA Contact: Xingxing Xia
URL:
Whiteboard: buildcop
Depends On:
Blocks:
Reported: 2019-07-26 15:49 UTC by Hongkai Liu
Modified: 2019-09-23 09:12 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-23 09:12:13 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 23600 | 0 | None | None | None | 2020-10-05 13:16:48 UTC
Github | openshift origin pull 23829 | 0 | None | None | None | 2020-10-05 13:16:47 UTC

Description Hongkai Liu 2019-07-26 15:49:59 UTC
Additional info:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/2512#0:build-log.txt%3A18770


Failing tests:
 [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Suite:openshift/conformance/serial] [Suite:k8s]
 Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20190726-121548.xml
 error: 1 fail, 49 pass, 167 skip (1h0m56s)
2019/07/26 12:15:49 Container test in pod e2e-aws-serial failed, exit code 1, reason Error
2019/07/26 12:21:54 Copied 192.20Mi of artifacts from e2e-aws-serial to /logs/artifacts/e2e-aws-serial
2019/07/26 12:22:00 Ran for 1h35m31s
error: could not run steps: step e2e-aws-serial failed: template pod "e2e-aws-serial" failed: the pod ci-op-4nikhb87/e2e-aws-serial failed after 1h33m29s (failed containers: test): ContainerFailed one or more containers exited
 Container test exited with code 1, reason Error
---
5 I ns/openshift-image-registry pod/node-ca-6hftd node/ created
Jul 26 12:13:56.002 I ns/openshift-image-registry daemonset/node-ca Created pod: node-ca-6hftd
Jul 26 12:13:56.005 I ns/openshift-image-registry pod/node-ca-6hftd Successfully assigned openshift-image-registry/node-ca-6hftd to ip-10-0-135-33.ec2.internal
Jul 26 12:14:25.483 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-2 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-2 (11 times)
Jul 26 12:14:27.222 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-0 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-0 (11 times)
Jul 26 12:14:27.223 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-vj9mx Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-vj9mx (14 times)
Jul 26 12:14:27.464 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-fc545 Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1a-fc545 (14 times)
Jul 26 12:14:27.696 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1b-xfhdz Updated machine ci-op-4nikhb87-ce7d8-28skd-worker-us-east-1b-xfhdz (14 times)
Jul 26 12:14:29.242 I ns/openshift-machine-api machine/ci-op-4nikhb87-ce7d8-28skd-master-1 Updated machine ci-op-4nikhb87-ce7d8-28skd-master-1 (11 times)
Jul 26 12:14:57.503 I ns/openshift-image-registry pod/node-ca-6hftd Container image "registry.svc.ci.openshift.org/ocp/4.2-2019-07-26-104231@sha256:0f8ea602298e98ad6b3bd049b318783c8e303ca9fe60d6a24b5c1b19a2c6e909" already present on machine
Jul 26 12:14:57.701 I ns/openshift-image-registry pod/node-ca-6hftd Created container node-ca
Jul 26 12:14:57.901 I ns/openshift-image-registry pod/node-ca-6hftd Started container node-ca

Comment 7 W. Trevor King 2019-08-13 22:08:16 UTC
Clayton pointed out that this test accounts for 10% of 4.2 serial promotion gate failures [1].  Recent example [2]:

  [sig-scheduling] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Suite:openshift/conformance/serial] [Suite:k8s]
  fail [k8s.io/kubernetes/test/e2e/scheduling/taints.go:440]: Aug 13 18:51:12.866: Failed to evict all Pods. 1 pod(s) is not evicted.

[1]: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=release-openshift-origin-installer-e2e-aws-serial-4.2&search=k8s.io/kubernetes/test/e2e/scheduling/taints.go.*%20Failed%20to%20evict%20all%20Pods
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/3256
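
For readers unfamiliar with what this test exercises, the following is a minimal Python sketch of NoExecute toleration semantics as described in the general Kubernetes documentation. It is not taken from taints.go. The `Pod` class and the `eviction_time` and `overdue` functions are hypothetical names used only for illustration, and the toleration values in the usage note are arbitrary; the real test runs pods against a live cluster.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


# Hypothetical model for illustration; these are not Kubernetes API types.
@dataclass
class Pod:
    name: str
    tolerates: bool = False                    # has a toleration matching the NoExecute taint
    toleration_seconds: Optional[int] = None   # None means the taint is tolerated indefinitely


def eviction_time(pod: Pod, taint_added_at: float) -> Optional[float]:
    """Return the time at which the pod should be evicted, or None if never."""
    if not pod.tolerates:
        return taint_added_at                  # no matching toleration: evicted right away
    if pod.toleration_seconds is None:
        return None                            # tolerated forever: never evicted
    # Zero and negative values are treated as 0 (evict immediately).
    return taint_added_at + max(pod.toleration_seconds, 0)


def overdue(pods: Iterable[Pod], taint_added_at: float, now: float,
            evicted_names: Set[str]) -> List[str]:
    """Names of pods that were due for eviction by `now` but were not observed as evicted."""
    result = []
    for pod in pods:
        due = eviction_time(pod, taint_added_at)
        if due is not None and due <= now and pod.name not in evicted_names:
            result.append(pod.name)
    return result
```

With `pods = [Pod("a", True, 5), Pod("b", True, 10)]`, calling `overdue(pods, 0.0, 12.0, {"a"})` returns `["b"]`, which is the kind of situation the "1 pod(s) is not evicted" message reports.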

Comment 9 Xingxing Xia 2019-08-22 10:15:18 UTC
https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=release-openshift-origin-installer-e2e-aws-serial-4.2&search=k8s.io/kubernetes/test/e2e/scheduling/taints.go.*%20Failed%20to%20evict%20all%20Pods shows:
58 recent release-openshift-origin-installer-e2e-aws-serial-4.2 jobs
0 (0% of all failures) k8s.io/kubernetes/test/e2e/scheduling/taints.go.* Failed to evict all Pod

"https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2&sort-by-flakiness=" also shows that the case is now continuously green in executed jobs, so I am changing the bug status.

Comment 15 Xingxing Xia 2019-09-23 09:12:13 UTC
Checked jobs from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/216 through the latest job, 232, as of this comment: the case of this bug, "NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds", is always green (passed). The PR fix changes test code, not functional code, so I am changing the bug status.

