Bug 1778954
Summary: | [4.3] 4.2 to 4.3 upgrades broken when cluster-dns-operator attempts to upgrade due to missing metrics-tls secret | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
Networking sub component: | DNS | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | aos-bugs, bbennett, ccoleman, dmace, hongli, lmohanty |
Version: | 4.3.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | 1778529 | Environment: | |
Last Closed: | 2020-01-23 11:14:59 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
Bug Depends On: | 1778529 | ||
Bug Blocks: |
Description
Miciah Dashiel Butler Masters
2019-12-02 22:13:04 UTC
PR#145 was merged in https://openshift-release.svc.ci.openshift.org/releasestream/4.3.0-0.nightly/release/4.3.0-0.nightly-2019-12-03-211441. However, the run at https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12062, which upgrades to 4.3.0-0.nightly-2019-12-04-214544, still shows the message: secret "metrics-tls" not found

Dec 04 22:50:01.382 I ns/openshift-dns-operator deployment/dns-operator Scaled up replica set dns-operator-6746ff4575 to 1
Dec 04 22:50:01.391 I ns/openshift-dns-operator pod/dns-operator-6746ff4575-dl5kq node/ created
Dec 04 22:50:01.410 I ns/openshift-dns-operator replicaset/dns-operator-6746ff4575 Created pod: dns-operator-6746ff4575-dl5kq
Dec 04 22:50:01.438 I ns/openshift-dns-operator pod/dns-operator-6746ff4575-dl5kq Successfully assigned openshift-dns-operator/dns-operator-6746ff4575-dl5kq to ip-10-0-140-114.ec2.internal
Dec 04 22:50:01.438 W ns/openshift-monitoring pod/kube-state-metrics-544fbcbfbb-qtmbk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (10 times)
Dec 04 22:50:01.649 W ns/openshift-dns-operator pod/dns-operator-6746ff4575-dl5kq MountVolume.SetUp failed for volume "metrics-tls" : secret "metrics-tls" not found

Since we didn't backport the scheduling fix to 4.2, it's expected that the DNS pods will continue to be scheduled to un-ready nodes prior to the 4.3 upgrade. The root problem is that the nodes aren't ready. DNS isn't causing the problems, and would eventually report success once the nodes are fixed. Maybe we should go ahead and backport the scheduling fix to 4.2.

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1780213 to backport the scheduling fix to 4.2.z.

Let's re-test once the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1780213 is released.
This is still waiting for https://github.com/openshift/cluster-dns-operator/pull/150 to be approved for release.

https://github.com/openshift/cluster-dns-operator/pull/150 merged, so once that's released, let's try the upgrade from a 4.2.z release containing https://github.com/openshift/cluster-dns-operator/pull/150 to the latest 4.3 release.

Verified with an upgrade from 4.2.0-0.nightly-2019-12-20-124216 to 4.3.0-0.nightly-2019-12-22-223447.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062
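When re-testing similar upgrade runs, it can help to filter the CI event log for the missing-secret warning quoted earlier in this report. The following is a minimal sketch, not part of the fix: it assumes event lines follow the "timestamp, level, ns/<namespace>, object, message" layout seen in the excerpt above, and the names Event, parse_events and missing_secrets are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: str
    level: str
    namespace: str
    obj: str
    message: str


# Assumed layout, inferred from the excerpt in this bug report:
#   "Dec 04 22:50:01.649 W ns/<namespace> <kind>/<name> <message>"
EVENT_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]) "
    r"ns/(?P<namespace>\S+) "
    r"(?P<obj>\S+) "
    r"(?P<message>.*)$"
)
MISSING_SECRET_RE = re.compile(r'secret "(?P<name>[^"]+)" not found')


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield an Event for every line that matches the assumed layout."""
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            yield Event(**match.groupdict())


def missing_secrets(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (namespace, object, secret name) for each 'secret not found' event."""
    found = []
    for event in parse_events(lines):
        match = MISSING_SECRET_RE.search(event.message)
        if match:
            found.append((event.namespace, event.obj, match.group("name")))
    return found


# Two lines copied from the excerpt in this report.
SAMPLE = [
    "Dec 04 22:50:01.391 I ns/openshift-dns-operator "
    "pod/dns-operator-6746ff4575-dl5kq node/ created",
    "Dec 04 22:50:01.649 W ns/openshift-dns-operator "
    "pod/dns-operator-6746ff4575-dl5kq MountVolume.SetUp failed for volume "
    '"metrics-tls" : secret "metrics-tls" not found',
]

if __name__ == "__main__":
    for namespace, obj, secret in missing_secrets(SAMPLE):
        print(f"{namespace} {obj}: missing secret {secret}")
```

A hit on its own does not show that the operator is at fault. As noted in the comments above, the warning can also appear while nodes are not yet ready, so it should be read alongside the node and network events around it.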