Bug 1778529
Summary: | 4.2 to 4.3 upgrades broken when cluster-dns-operator attempts to upgrade due to missing metrics-tls secret | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> | |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> | |
Networking sub component: | DNS | QA Contact: | Hongan Li <hongli> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | aos-bugs, bbennett, dmace, lmohanty, wking | |
Version: | 4.3.0 | |||
Target Milestone: | --- | |||
Target Release: | 4.4.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: The DNS operator did not monitor or update the DNS DaemonSet's tolerations.
Consequence: If a user or a previous version of the DNS operator set incorrect tolerations on the DNS DaemonSet, the operator did not correct them.
Fix: The DNS operator now checks the DNS DaemonSet's tolerations and updates them if they are not what the operator expects.
Result: The DNS DaemonSet now always has the expected tolerations.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1778954 | Environment: | |
Last Closed: | 2020-05-13 21:53:23 UTC | Type: | Bug | |
Bug Blocks: | 1778954 |
Description
Clayton Coleman
2019-12-01 22:35:44 UTC
The only problem I see here is two CoreDNS pods were scheduled to nodes with a Ready=Unknown condition, causing DNS to report degraded:

ip-10-0-136-246.ec2.internal dns-default-p8zxt
ip-10-0-142-83.ec2.internal dns-default-42x8h

https://github.com/openshift/cluster-dns-operator/pull/140 was supposed to fix the scheduling issue, but the fix was incomplete because the operator wasn't actually rolling out the new toleration changes. Miciah has fixed that in https://github.com/openshift/cluster-dns-operator/pull/144.

I believe https://github.com/openshift/cluster-dns-operator/pull/144 is the fix. [1] shows NotAllDNSesAvailable clearing up in the wake of #144. We're still failing upgrades on NodeControllerDegraded, but that's bug 1778904.

[1]: https://ci-search-ci-search-next.svc.ci.openshift.org/chart?name=%5erelease-.*upgrade&search=NotAllDNSesAvailable:%20Not%20all%20desired%20DNS%20DaemonSets%20available&search=NodeControllerDegradedMasterNodesReady:%20NodeControllerDegraded:%20The%20master%20node.*not%20ready

didn't see the issue in recent 4.4 upgrade testing, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
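
For anyone reading this later, here is a rough sketch of the kind of check the Doc Text describes: compare the tolerations on the DNS DaemonSet with what the operator expects, and produce an updated object only when they differ. This is not the code from the pull requests linked above. Plain dicts stand in for Kubernetes objects, `EXPECTED_TOLERATIONS` and `reconcile_tolerations` are hypothetical names, and the toleration values are illustrative assumptions.

```python
import copy
import json

# Assumption for illustration only: the real list the operator expects
# is not stated in this bug report.
EXPECTED_TOLERATIONS = [
    {"operator": "Exists"},
]


def _normalize(tolerations):
    """Return an order-independent, comparable form of a toleration list."""
    return sorted(json.dumps(t, sort_keys=True) for t in (tolerations or []))


def reconcile_tolerations(current, expected_tolerations):
    """Compare a DaemonSet's pod tolerations with the expected ones.

    `current` is a dict shaped like a DaemonSet manifest (a stand-in for
    the real API object). Returns (changed, daemonset). The input is never
    mutated; when nothing differs, the same object is returned.
    """
    pod_spec = current.get("spec", {}).get("template", {}).get("spec", {})
    if _normalize(pod_spec.get("tolerations")) == _normalize(expected_tolerations):
        return False, current

    # Tolerations drifted (set by a user or an older operator version):
    # build an updated copy carrying exactly the expected tolerations.
    updated = copy.deepcopy(current)
    new_pod_spec = (
        updated.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    new_pod_spec["tolerations"] = copy.deepcopy(expected_tolerations)
    return True, updated
```

In a real operator the `changed` flag would decide whether to send an update to the API server. The point of the sketch is that the comparison has to be part of the "did the DaemonSet change" decision, otherwise a toleration change never gets rolled out.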