1941592 – HAProxyDown not Firing

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Stephen Greene
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-03-22 12:56 UTC by Apurva Nisal
Modified:	2024-10-01 17:44 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The HAProxyDown alert message was vague. Consequence: End users thought the HAProxyDown alert meant that the router pods were not available (rather than specifically that HAProxy was down). Fix: The HAProxyDown alert message is now more detailed.
Clone Of:
Environment:
Last Closed:	2021-07-27 22:54:36 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 597	0	None	open	Bug 1941592: Alerts: Fix up HAProxyDown Alert Message	2021-04-12 13:49:11 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:54:57 UTC

Description Apurva Nisal 2021-03-22 12:56:44 UTC

Description of problem:

HAProxyDown is not firing when all router pods (-n openshift-ingress) are down, or when all nodes on which router pods are scheduled are down.

Version-Release number of selected component (if applicable):
RHOCP 4.6

Actual results:
HAProxyDown not Firing 

Expected results:
HAProxyDown should be Firing

Comment 2 Andrew McDermott 2021-03-23 18:04:44 UTC

The HAProxyDown alert fires when haproxy is down, not when there are no openshift router pods running.
We will fix the message so that it reports that "haproxy is down" to avoid confusion.

ClusterOperatorDegraded and ClusterOperatorDown alerts should fire if no router pods are scheduled or running.

For example:

https://github.com/openshift/cluster-version-operator/blob/master/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L73-L88
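
To illustrate the distinction above, here is a minimal Python sketch of how a filter expression such as `haproxy_up == 0` behaves. It is a toy model, not Prometheus itself: the `firing_pods` function, the pod names, and the sample values are all hypothetical, and it assumes that a running router pod reports `haproxy_up` as 0 when its HAProxy process is down.

```python
from typing import Dict, List

# Toy stand-in for an instant vector: one sample per router pod (hypothetical data).
Sample = Dict[str, object]


def firing_pods(samples: List[Sample]) -> List[str]:
    """Return the pods whose haproxy_up sample equals 0 (mimics `haproxy_up == 0`)."""
    return [str(s["pod"]) for s in samples if s["value"] == 0]


# Assumed case 1: HAProxy is down inside a running router pod, so the metric
# is still scraped and carries the value 0.
haproxy_down: List[Sample] = [
    {"pod": "router-default-a", "value": 0},
    {"pod": "router-default-b", "value": 1},
]

# Assumed case 2: no router pods are running, so no haproxy_up series exist
# and there is nothing for the comparison to match.
no_router_pods: List[Sample] = []

print(firing_pods(haproxy_down))    # ['router-default-a']
print(firing_pods(no_router_pods))  # []
```

In the second case the expression matches nothing, which is consistent with HAProxyDown staying silent when every router pod is gone and with the ClusterOperator alerts being the ones expected to fire.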

Comment 3 Stephen Greene 2021-03-31 19:31:05 UTC

I will work on this bug during the 4.8 bug fix phase.

Comment 5 jechen 2021-04-21 19:32:28 UTC

Attempted to verify in 4.8.0-0.nightly-2021-04-21-084059. Pull #597 is listed in the release status for this build, but the Prometheus rule definition still has the old description: "HAProxy metrics are reporting that the router is down". I suspect pull #597 is not in this build and will wait for the next build to verify.

Comment 6 jechen 2021-04-21 23:46:43 UTC

verified https://github.com/openshift/cluster-ingress-operator/pull/597 in 4.8.0-0.nightly-2021-04-21-172405

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-21-172405   True        False         42m     Cluster version is 4.8.0-0.nightly-2021-04-21-172405



$ oc -n openshift-ingress-operator get PrometheusRule -oyaml
<--snip-->
      rules:
      - alert: HAProxyReloadFail
        annotations:
          message: HAProxy reloads are failing on {{ $labels.pod }}. Router is not respecting recently created or modified routes
        expr: template_router_reload_failure == 1
        for: 5m
        labels:
          severity: warning
      - alert: HAProxyDown
        annotations:
          message: HAProxy metrics are reporting that HAProxy is down on pod {{ $labels.namespace }} / {{ $labels.pod }}    <--verified https://github.com/openshift/cluster-ingress-operator/pull/597/
        expr: haproxy_up == 0
        for: 5m
        labels:
          severity: critical

Comment 9 errata-xmlrpc 2021-07-27 22:54:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
