Bug 1908880
Summary: | 4.7 aws-serial CI: NoExecuteTaintManager Single Pod [Serial] eventually evict pod with finite tolerations from tainted nodes | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Node | Assignee: | Elana Hashman <ehashman> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aos-bugs, fabian, rphillips, tsweeney |
Version: | 4.7 | Keywords: | UpcomingSprint |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Performance regression in Kubernetes 1.20: checking sandbox deletion caused pod deletions to take much longer.
Consequence: Many tests that expected pods to be deleted quickly began flaking because pods were not deleted in time.
Fix: Reverted the sandbox deletion logic.
Result: Pod deletions should now finish in the expected amount of time, without the performance regression.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:46:26 UTC | Type: | Bug |
Description
W. Trevor King
2020-12-17 19:13:11 UTC
xref https://github.com/kubernetes/kubernetes/issues/42685 upstream

This is a very old test; I'm wondering if it just has a tuning issue (as encountered in the upstream issue). I'll take a closer look.

I am consistently seeing the error mentioned above on all the 4.7 failures:

```
The container could not be located when the pod was deleted. The container used to be Running
```

This matches https://github.com/kubernetes/kubernetes/issues/97288, an upstream regression in the 1.20 release: "after patching a deployment, the old pod sticks around for over a minute (or test times out after a minute). This is despite terminationGracePeriodSeconds: 30s." That is consistent with the behaviour we're seeing here on the flaky tests.

*** Bug 1915494 has been marked as a duplicate of this bug. ***

Checking for this test failure, I see it last failed 4 days ago in the 4.7 serial tests. I do not see any recent failures since the fix merged.

https://search.ci.openshift.org/?search=eventually+evict+pod+with+finite+tolerations+from+tainted+nodes&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633