Bug 2016352
| Summary: | Some pods start before CA resources are present | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jan Fajerski <jfajersk> |
| Component: | Monitoring | Assignee: | Jan Fajerski <jfajersk> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | Brian Burt <bburt> |
| Priority: | medium | | |
| Version: | 4.10 | CC: | amuller, anpicker, aos-bugs, bburt, erooth, juzhao, spasquie |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | See also: https://bugzilla.redhat.com/show_bug.cgi?id=1953264. This release fixes an issue in which some pods in the monitoring stack would start before TLS certificate-related resources were present, which resulted in failures and restarts. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:21:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jan Fajerski
2021-10-21 11:07:08 UTC
Yes, found the issue in 4.10.0-0.nightly-2021-10-25-190146:
# oc -n openshift-monitoring get pod prometheus-operator-54695d6648-lskjc -oyaml
...
- containerID: cri-o://adb147861b4c06252af9599936b90d684dfbe911a77652820f0675a225327c61
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dfc11491ae5f242603571cefaae1879af9427ca1178bedf53cc385d1dae115a7
imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dfc11491ae5f242603571cefaae1879af9427ca1178bedf53cc385d1dae115a7
lastState:
terminated:
containerID: cri-o://8332e0b9e0b79c91eb7d3b4ae2639118fef008b1593df72af345174e81f4e053
exitCode: 255
finishedAt: "2021-10-27T00:54:11Z"
message: "ueue.(*Type).updateUnfinishedWorkLoop(0xc000370a20)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/client-go/util/workqueue/queue.go:198
+0xac\ncreated by k8s.io/client-go/util/workqueue.newQueue\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/client-go/util/workqueue/queue.go:58
+0x135\n\ngoroutine 13 [select]:\nk8s.io/client-go/util/workqueue.(*delayingType).waitingLoop(0xc000370b40)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:231
+0x3df\ncreated by k8s.io/client-go/util/workqueue.newDelayingQueue\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:68
+0x185\n\ngoroutine 14 [semacquire]:\nsync.runtime_SemacquireMutex(0x22992dc,
0xc0000f3d00, 0x1)\n\t/usr/lib/golang/src/runtime/sema.go:71 +0x47\nsync.(*Mutex).lockSlow(0x22992d8)\n\t/usr/lib/golang/src/sync/mutex.go:138
+0x105\nsync.(*Mutex).Lock(...)\n\t/usr/lib/golang/src/sync/mutex.go:81\nk8s.io/klog/v2.(*loggingT).output(0x22992c0,
0xc000000000, 0x0, 0x0, 0xc00043c070, 0x1bfed0d, 0x19, 0xa7, 0x0)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/klog/v2/klog.go:882
+0x825\nk8s.io/klog/v2.(*loggingT).printf(0x22992c0, 0x0, 0x0, 0x0, 0x1742b89,
0xb, 0xc0000f3f20, 0x1, 0x1)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/klog/v2/klog.go:733
+0x17a\nk8s.io/klog/v2.Infof(...)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/klog/v2/klog.go:1363\nk8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicFileCAContent).Run(0xc000370ba0,
0x1, 0x0)\n\t/go/src/github.com/brancz/kube-rbac-proxy/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_cafile_content.go:167
+0x145\ngithub.com/brancz/kube-rbac-proxy/pkg/authn.(*DelegatingAuthenticator).Run(0xc00009e288,
0x1, 0x0)\n\t/go/src/github.com/brancz/kube-rbac-proxy/pkg/authn/delegating.go:82
+0x51\ncreated by main.main\n\t/go/src/github.com/brancz/kube-rbac-proxy/main.go:189
+0x3547\nI1027 00:54:11.192982 1 dynamic_cafile_content.go:167] Starting
client-ca::/etc/tls/client/client-ca.crt\n"
reason: Error
startedAt: "2021-10-27T00:54:11Z"
name: kube-rbac-proxy
ready: true
restartCount: 1
started: true
state:
running:
startedAt: "2021-10-27T00:54:11Z"
***************
Waiting for the fix to be merged into the payload.
*** Bug 2017616 has been marked as a duplicate of this bug. ***

Tested with the PRs; no restarted pods in the fresh cluster:

# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            6/6     Running   0          132m
alertmanager-main-1                            6/6     Running   0          132m
alertmanager-main-2                            6/6     Running   0          132m
cluster-monitoring-operator-85489b49d6-n7hr7   2/2     Running   0          146m
grafana-6fddbbcc5c-v5tcx                       3/3     Running   0          132m
kube-state-metrics-86c755dc84-tprm9            3/3     Running   0          140m
node-exporter-d99lw                            2/2     Running   0          140m
node-exporter-r4x44                            2/2     Running   0          140m
node-exporter-rjzkw                            2/2     Running   0          140m
node-exporter-t2b7h                            2/2     Running   0          133m
node-exporter-x76d9                            2/2     Running   0          133m
node-exporter-z4rq9                            2/2     Running   0          133m
openshift-state-metrics-7b5dd5bdd4-5jnrt       3/3     Running   0          140m
prometheus-adapter-6fb8fd644-r4njz             1/1     Running   0          134m
prometheus-adapter-6fb8fd644-vmvnr             1/1     Running   0          134m
prometheus-k8s-0                               6/6     Running   0          132m
prometheus-k8s-1                               6/6     Running   0          132m
prometheus-operator-7776f6885f-5l6kq           2/2     Running   0          140m
telemeter-client-68494bcb44-6tk6c              3/3     Running   0          140m
thanos-querier-58dcb67767-l2zd2                6/6     Running   0          132m
thanos-querier-58dcb67767-mllhv                6/6     Running   0          132m

# oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-57599545c8-v7qn6   2/2     Running   0          3m37s
prometheus-user-workload-0             5/5     Running   0          3m34s
prometheus-user-workload-1             5/5     Running   0          3m34s
thanos-ruler-user-workload-0           3/3     Running   0          3m30s
thanos-ruler-user-workload-1           3/3     Running   0          3m30s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
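The verification above amounts to checking that the RESTARTS column is 0 for every pod. A small, hypothetical helper (not part of the QA tooling) that parses `oc get pod` tabular output and reports any restarted pods:

```go
package main

import (
	"fmt"
	"strings"
)

// restartedPods parses `oc get pod` tabular output
// (NAME READY STATUS RESTARTS AGE) and returns the names of pods
// whose RESTARTS column is not "0".
func restartedPods(out string) []string {
	var names []string
	lines := strings.Split(strings.TrimSpace(out), "\n")
	for _, line := range lines[1:] { // skip the header row
		fields := strings.Fields(line)
		if len(fields) >= 5 && fields[3] != "0" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	sample := `NAME                READY   STATUS    RESTARTS   AGE
prometheus-k8s-0    6/6     Running   0          132m
prometheus-k8s-1    6/6     Running   2          132m`
	fmt.Println(restartedPods(sample)) // prints [prometheus-k8s-1]
}
```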