Bug 1689021
Summary: | Prometheus adapter is reported as unreachable by apiservers and never recovers, causing wedged ns deletions | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | Monitoring | Assignee: | Sergiusz Urbaniak <surbania> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.1.0 | CC: | erooth, fbranczy, lserven, mifiedle, mloibl, ncredi, pweil, surbania |
Target Milestone: | --- | ||
Target Release: | 4.1.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-04 10:45:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: |
Description
Clayton Coleman
2019-03-14 23:47:28 UTC
It seems that we see similar errors in other components in the openshift stack, including Prometheus itself, but some of these components are able to recover from failure. We need to investigate if the Prometheus adapter is simply missing retry logic.

Regarding the error in the logs, it seems that this was fixed in newer versions of Kubernetes apimachinery. I'd say we should upgrade to v1.14 (as we only use the pod and node API this should be safe to do).

For what it's worth the log lines are from the list/watch and are suggesting the apiserver is closing connections unexpectedly, so it doesn't seem to me that these log lines have anything to do with the failure. However, we're going to go through all the components necessary to update everything to the latest apimachinery code.

Moving to assigned as we're working on updating the Kubernetes dependencies throughout the stack.

Note that GOAWAY isn't an actual error. It's just an informational message and isn't indicative of any error state.

tested with payload: 4.1.0-0.nightly-2019-04-23-223857, no such issue now

    # oc get apiservice v1beta1.metrics.k8s.io
    NAME                     SERVICE                                   AVAILABLE   AGE
    v1beta1.metrics.k8s.io   openshift-monitoring/prometheus-adapter   True        28h

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758