Created attachment 1721757: Prometheus graph for the metric on the default alert

Description of problem:
A known error during router template reloading triggers Prometheus alerts:

```
E1008 10:38:15.642166 1 limiter.go:165] error reloading router: wait: no child processes
```

I know this is a known router error. However, the only related Bugzilla I found is for 3.6. It states that the bug won't be fixed because it is a complex race condition that usually has no impact on the cluster. In the customer case I'm handling, though, a default alert fires because this error occurs too often with about 90 routes (see the attached screenshot).

I'm unsure how common this is, or whether there is any work to fix it (the 3.6 Bugzilla is quite old). More information is welcome. As a workaround, I've recommended that the customer silence the default alert and create a new one tuned to the error rate their cluster currently shows.

Version-Release number of selected component (if applicable):
SOURCE_GIT_TAG=4.0.0-143-ge3b9390
BUILD_VERSION=v4.5.0

How reproducible:
See the description and the attached screenshot.

Additional info:
- 3.6 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1442904
- Prometheus alert resource: https://github.com/openshift/cluster-ingress-operator/blob/8aa1ce2f3fc2384f7c1688b8cc16d599f9ac89ea/manifests/0000_90_ingress-operator_03_prometheusrules.yaml
We have fixed the issue in 4.6 as bug 1859134 and are in the process of backporting the fix to 4.5 as bug 1885688, so I am marking this bug as a duplicate of the latter.

*** This bug has been marked as a duplicate of bug 1885688 ***