Bug 1959185
Summary: | 4.6 CI failures with aws due to prometheus NoRunningOvnMaster alert | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Tim Rozet <trozet> |
Component: | Networking | Assignee: | jamo luhrsen <jluhrsen> |
Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | aconstan, astoycos, bbennett, bverschu, dcbw, memodi, mpatel, rbrattai |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-06-01 12:10:08 UTC | Type: | Bug |
Bug Depends On: | 1891023 | ||
Bug Blocks: |
Description
Tim Rozet
2021-05-10 20:51:05 UTC
These failures started when this CNO PR, which fixes the kube-rbac-proxy startup scripts, was merged:
https://github.com/openshift/cluster-network-operator/pull/1061

I think the scripts now do a better job of reporting a problem when ovn-node-metrics-certs is not mounted.

In 4.7, ovn-node-metrics-certs is mounted fine:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.7/1393028380240121856/artifacts/e2e-aws/pods/openshift-ovn-kubernetes_ovnkube-node-w2fh2_kube-rbac-proxy.log

But in 4.6, before the rbac-proxy script fix, you can see what looks like some trouble (a traceback), and there is no log message saying that ovn-node-metrics-certs is mounted. I assume this means that we did not fire any alert when maybe we should have:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.6/1388934081181388800/artifacts/e2e-aws/pods/openshift-ovn-kubernetes_ovnkube-node-p7pxb_kube-rbac-proxy.log

After the rbac-proxy script fix, we can see that it is failing repeatedly, and I'm assuming that is what fires the alert:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.6/1389008719353745408/artifacts/e2e-aws/pods/openshift-ovn-kubernetes_ovnkube-node-qphc2_kube-rbac-proxy.log

I will be investigating around ovn-node-metrics-certs next.

This PR fixes it for 4.6:
https://github.com/openshift/cluster-network-operator/pull/1096

I don't know how to do the magic on this BZ, which was filed just for a bug on 4.6, so that I can link the PR. I am getting a complaint that there needs to be a 4.7 bug if I want to use 4.6.z as the target. I tried moving this bug to 4.7 to get around that, but that didn't work either.

*** Bug 1960781 has been marked as a duplicate of this bug. ***

Verified on 4.6.0-0.nightly-2021-05-24-230019 on ipi-on-aws/versioned-installer-ovn.

It seems to take a few minutes for the certs to be mounted. The kube-rbac-proxy scripts themselves are now inconsistent with each other and with their own comments and messages (for example, we don't actually wait for "one hour"), but these are cosmetic issues.

2021-05-25T21:14:03+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:14:03.461287 2803 main.go:188] Valid token audiences:
I0525 21:14:03.461454 2803 main.go:261] Reading certificate files
I0525 21:14:03.461781 2803 main.go:294] Starting TCP socket on :9103
I0525 21:14:03.462287 2803 main.go:301] Listening securely on :9103

2021-05-25T21:04:50+00:00 INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
2021-05-25T21:08:30+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:08:30.646215 2542 main.go:188] Valid token audiences:
I0525 21:08:30.646299 2542 main.go:261] Reading certificate files
I0525 21:08:30.646552 2542 main.go:294] Starting TCP socket on :9103
I0525 21:08:30.646875 2542 main.go:301] Listening securely on :9103

2021-05-25T21:14:04+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:14:04.396860 2848 main.go:188] Valid token audiences:
I0525 21:14:04.397075 2848 main.go:261] Reading certificate files
I0525 21:14:04.398162 2848 main.go:294] Starting TCP socket on :9103
I0525 21:14:04.398606 2848 main.go:301] Listening securely on :9103

2021-05-25T21:04:51+00:00 INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
2021-05-25T21:08:51+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:08:51.614035 2550 main.go:188] Valid token audiences:
I0525 21:08:51.614128 2550 main.go:261] Reading certificate files
I0525 21:08:51.614435 2550 main.go:294] Starting TCP socket on :9103
I0525 21:08:51.614774 2550 main.go:301] Listening securely on :9103

2021-05-25T21:16:25+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:16:25.748801 2832 main.go:188] Valid token audiences:
I0525 21:16:25.748947 2832 main.go:261] Reading certificate files
I0525 21:16:25.749240 2832 main.go:294] Starting TCP socket on :9103
I0525 21:16:25.749995 2832 main.go:301] Listening securely on :9103

2021-05-25T21:04:47+00:00 INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
2021-05-25T21:08:32+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:08:32.399646 2642 main.go:188] Valid token audiences:
I0525 21:08:32.399716 2642 main.go:261] Reading certificate files
I0525 21:08:32.399910 2642 main.go:294] Starting TCP socket on :9103
I0525 21:08:32.400221 2642 main.go:301] Listening securely on :9103

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.31 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2100
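The verification comment notes that the certs take "a few minutes" to be mounted. As a rough check, the delay can be read off the pasted log lines. The sketch below is only an illustration: the helper name `mount_delay` is hypothetical, and it assumes that each "not mounted" line is followed by the "mounted" line from the same container, which the flattened paste suggests but does not prove.

```python
from datetime import datetime

# One "not mounted" / "mounted" pair copied verbatim from the verification comment.
LOG = """\
2021-05-25T21:04:50+00:00 INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
2021-05-25T21:08:30+00:00 INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
I0525 21:08:30.646215 2542 main.go:188] Valid token audiences:
"""


def mount_delay(log_text):
    """Return the time between the 'not mounted' line and the next 'mounted' line.

    Hypothetical helper. Returns None if no such pair is found.
    """
    waiting_since = None
    for line in log_text.splitlines():
        stamp, _, message = line.partition(" ")
        if "not mounted" in message:
            waiting_since = datetime.fromisoformat(stamp)
        elif "mounted, starting" in message and waiting_since is not None:
            return datetime.fromisoformat(stamp) - waiting_since
    return None


print(mount_delay(LOG))  # 0:03:40
```

Applied to the other two pairs in the comment, the same calculation gives 4 minutes 0 seconds and 3 minutes 45 seconds, which matches the "few minutes" observation and is far short of the "one hour" the message claims.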
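The behaviour discussed in this bug (wait for the metrics certs to be mounted, report clearly if they never appear, then start the proxy) can be sketched as a polling loop. This is not the actual CNO startup script, which is not quoted in this bug; the function name, parameters, and message wording are assumptions made for illustration. The one deliberate difference is that the log message reports the real timeout, so it cannot drift from the behaviour the way the "Waiting one hour." message did.

```python
import os
import time


def wait_for_files(paths, timeout_s, poll_s=1.0, log=print):
    """Poll until every path exists.

    Hypothetical sketch, not the CNO script. Returns True once all paths
    exist, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    announced = False
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing:
            log("INFO: certs mounted, starting proxy")
            return True
        if not announced:
            # Report the timeout that is actually enforced.
            log(f"INFO: certs not mounted, waiting up to {timeout_s:g}s")
            announced = True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller would pass the cert and key paths of the mounted secret, start the proxy only when the function returns True, and exit non-zero otherwise so that the failure is visible to monitoring.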