This is the alert portion backport of https://bugzilla.redhat.com/show_bug.cgi?id=1861455
Add router template reload failure alert. Also add basic HAProxy up alert.
The PR merge made into "4.5.0-0.nightly-2020-08-27-040633" payload. With the patch in place, it is noted that, the ingress CO goes into degraded state along with the alerts firing specifically indicating the router failure as per the set alert rules:
-----
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.5.0-0.nightly-2020-08-27-040633 True False 140m Error while reconciling 4.5.0-0.nightly-2020-08-27-040633: the cluster operator ingress is degraded
oc get co ingress
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
ingress 4.5.0-0.nightly-2020-08-27-040633 False True True 16m
$ oc -n openshift-ingress logs deployments/router-default --tail 5
Found 2 pods, using pod/router-default-945d7559f-fmccq
[ALERT] 243/061753 (633) : Fatal errors found in configuration.
E0831 06:17:58.879782 1 limiter.go:165] error reloading router: exit status 1
[ALERT] 243/061758 (636) : parsing [/var/lib/haproxy/conf/haproxy.config:319] : timer overflow in argument '999d' to 'timeout server' (maximum value is 2147483647 ms or ~24.8 days)
[ALERT] 243/061758 (636) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
[ALERT] 243/061758 (636) : Fatal errors found in configuration.
-----
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.5.8 bug fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3510