Bug 1956768
Summary: | aws-ebs-csi-driver-controller-metrics TargetDown | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Petr Muller <pmuller> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Storage sub component: | Storage | QA Contact: | Qin Ping <piqin> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | aos-bugs, jsafrane, wking |
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | test: openshift-tests.[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel] |
Last Closed: | 2021-07-27 23:06:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Petr Muller
2021-05-04 11:37:14 UTC
Setting to high, this blocks CI release acceptance.
It's indeed related to the rebase: the provisioner returns 500 to metrics requests:
sh-4.4# curl -v localhost:8202/metrics
* Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 8202 failed: Connection refused
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8202 (#0)
> GET /metrics HTTP/1.1
> Host: localhost:8202
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Tue, 04 May 2021 16:20:47 GMT
< Content-Length: 243
<
An error has occurred while serving metrics:
gathered metric family process_start_time_seconds has help "[ALPHA] Start time of the process since unix epoch in seconds." but should have "Start time of the process since unix epoch in seconds."
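The same check can be scripted instead of run by hand with curl. The following is a minimal Go sketch, not the CI test itself: `probeMetrics` and `newFailingMetricsServer` are hypothetical helper names, and the stand-in server only replays the status code and error text shown above so the example runs without a cluster.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// probeMetrics performs a GET against a metrics URL and returns the
// HTTP status code and the response body.
func probeMetrics(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

// newFailingMetricsServer is a hypothetical stand-in for the provisioner's
// metrics endpoint. It always answers 500 with the error text from the report.
func newFailingMetricsServer() *httptest.Server {
	msg := "An error has occurred while serving metrics:\n\n" +
		`gathered metric family process_start_time_seconds has help ` +
		`"[ALPHA] Start time of the process since unix epoch in seconds." ` +
		`but should have "Start time of the process since unix epoch in seconds."`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, msg, http.StatusInternalServerError)
	}))
}

func main() {
	srv := newFailingMetricsServer()
	defer srv.Close()

	code, body, err := probeMetrics(srv.URL + "/metrics")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	if code != http.StatusOK {
		fmt.Printf("metrics endpoint unhealthy: HTTP %d\n%s", code, body)
		return
	}
	fmt.Println("metrics endpoint healthy")
}
```

Against a real pod, the URL would be localhost:8202/metrics as in the transcript. A non-200 answer here is what makes the scrape fail and the TargetDown alert fire.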
I am unable to find the root cause: the same container works locally and breaks only in a real OCP cluster.
It's related to CSI migration: the provisioner behaves differently for migratable CSI drivers and registers the wrong metrics.
This registration should include the "metrics.WithProcessStartTime(false)" option to prevent double registration of the startup time metric: https://github.com/kubernetes-csi/external-provisioner/blob/5d1b62aaa38b309e2c845e97efdc24c944fb66d8/cmd/csi-provisioner/csi-provisioner.go#L225
Verified with: 4.8.0-0.nightly-2021-05-06-210840
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
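The double registration described in this report can be illustrated with a simplified, standard-library-only Go model. This is a sketch under stated assumptions, not the external-provisioner or Prometheus client code: `metricFamily`, `gather`, `processMetrics` and `managerMetrics` are hypothetical names, and which component emits which help string is assumed only so that the error matches the one in the report. The boolean parameter mirrors the intent of the "metrics.WithProcessStartTime(false)" option.

```go
package main

import "fmt"

// metricFamily is a simplified stand-in for a Prometheus metric family.
type metricFamily struct {
	Name string
	Help string
}

// gather merges metric families from several sources and rejects a name
// that appears with two different help strings. This models the
// consistency check behind the 500 response; it is not the real library code.
func gather(sources ...[]metricFamily) (map[string]string, error) {
	seen := make(map[string]string)
	for _, src := range sources {
		for _, mf := range src {
			if help, ok := seen[mf.Name]; ok && help != mf.Help {
				return nil, fmt.Errorf("gathered metric family %s has help %q but should have %q",
					mf.Name, mf.Help, help)
			}
			seen[mf.Name] = mf.Help
		}
	}
	return seen, nil
}

// processMetrics models a collector that already exports the start time
// (assumption: this one uses the plain help string).
func processMetrics() []metricFamily {
	return []metricFamily{{
		Name: "process_start_time_seconds",
		Help: "Start time of the process since unix epoch in seconds.",
	}}
}

// managerMetrics models a second registration path. When withProcessStartTime
// is true it exports the same metric again, with an "[ALPHA]" help prefix.
func managerMetrics(withProcessStartTime bool) []metricFamily {
	if !withProcessStartTime {
		return nil
	}
	return []metricFamily{{
		Name: "process_start_time_seconds",
		Help: "[ALPHA] Start time of the process since unix epoch in seconds.",
	}}
}

func main() {
	_, err := gather(processMetrics(), managerMetrics(true))
	fmt.Println("start time registered twice:", err)

	_, err = gather(processMetrics(), managerMetrics(false))
	fmt.Println("start time registered once:", err)
}
```

In this model the first call returns an error of the same shape as the one served on /metrics, and the second returns nil because only one source exports process_start_time_seconds.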