Bug 1941592
Summary: | HAProxyDown not Firing | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Apurva Nisal <anisal> |
Component: | Networking | Assignee: | Stephen Greene <sgreene> |
Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | alegrand, amcdermo, anpicker, aos-bugs, erooth, jechen, juzhao, kakkoyun, lcosic, mjoseph, pkrupa, sgreene, surbania |
Version: | 4.6 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
HAProxyDown alert message was vague
Consequence:
End users thought the HAProxyDown alert meant that the router pods were not available (instead of specifically just HAProxy)
Fix:
Make the HAProxyDown alert message more detailed
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:54:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Apurva Nisal
2021-03-22 12:56:44 UTC
The HAProxyDown alert fires when HAProxy is down, not when there are no OpenShift router pods running. We will fix the message so that it reports that "haproxy is down" to avoid confusion. ClusterOperatorDegraded and ClusterOperatorDown alerts should fire if no router pods are scheduled or running. For example:
https://github.com/openshift/cluster-version-operator/blob/master/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L73-L88

I will work on this bug during the 4.8 bug fix phase.

Attempted to verify in 4.8.0-0.nightly-2021-04-21-084059. Pull #597 is listed in the release status for this build, but the Prometheus rule definition still carries the old description: "HAProxy metrics are reporting that the router is down". I suspect pull #597 is not in this build and will wait for the next build to verify.

Verified https://github.com/openshift/cluster-ingress-operator/pull/597 in 4.8.0-0.nightly-2021-04-21-172405

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-21-172405   True        False         42m     Cluster version is 4.8.0-0.nightly-2021-04-21-172405

$ oc -n openshift-ingress-operator get PrometheusRule -oyaml
<--snip-->
    rules:
    - alert: HAProxyReloadFail
      annotations:
        message: HAProxy reloads are failing on {{ $labels.pod }}. Router is not respecting recently created or modified routes
      expr: template_router_reload_failure == 1
      for: 5m
      labels:
        severity: warning
    - alert: HAProxyDown
      annotations:
        message: HAProxy metrics are reporting that HAProxy is down on pod {{ $labels.namespace }} / {{ $labels.pod }}   <--verified https://github.com/openshift/cluster-ingress-operator/pull/597/
      expr: haproxy_up == 0
      for: 5m
      labels:
        severity: critical

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438
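
For anyone repeating the verification on another build, the message check could be scripted roughly as below. This is only a sketch: the rule data is copied by hand from the PrometheusRule output in this report, and the helper names (`find_rule`, `has_updated_message`) are hypothetical, not part of any OpenShift tooling. It assumes the rules have already been loaded into plain dictionaries.

```python
# Rules as shown in the verified build 4.8.0-0.nightly-2021-04-21-172405,
# copied by hand from the PrometheusRule output in this report.
RULES = [
    {
        "alert": "HAProxyReloadFail",
        "expr": "template_router_reload_failure == 1",
        "for": "5m",
        "severity": "warning",
        "message": "HAProxy reloads are failing on {{ $labels.pod }}. "
                   "Router is not respecting recently created or modified routes",
    },
    {
        "alert": "HAProxyDown",
        "expr": "haproxy_up == 0",
        "for": "5m",
        "severity": "critical",
        "message": "HAProxy metrics are reporting that HAProxy is down on pod "
                   "{{ $labels.namespace }} / {{ $labels.pod }}",
    },
]

# Old wording seen in 4.8.0-0.nightly-2021-04-21-084059 (before pull #597).
OLD_MESSAGE = "HAProxy metrics are reporting that the router is down"


def find_rule(rules, alert_name):
    """Return the first rule with the given alert name, or None."""
    for rule in rules:
        if rule["alert"] == alert_name:
            return rule
    return None


def has_updated_message(rule):
    """True if the message names HAProxy on a pod rather than 'the router'."""
    message = rule["message"]
    return OLD_MESSAGE not in message and "HAProxy is down on pod" in message


if __name__ == "__main__":
    rule = find_rule(RULES, "HAProxyDown")
    print("HAProxyDown message updated:", rule is not None and has_updated_message(rule))
```

The check only compares message text; it does not evaluate `haproxy_up == 0` against live metrics.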