Description of problem: During WAL replay, Prometheus returns HTTP 503 for every endpoint that is invoked. This causes constant issues and potentially endless restart loops. This has been reported upstream in https://github.com/prometheus-operator/prometheus-operator/issues/3391 and fixed in v0.46.0 of prometheus-operator.
4.8 uses prometheus-operator 0.45:

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-06-055252   True        False         148m    Cluster version is 4.8.0-0.nightly-2021-03-06-055252

# oc -n openshift-monitoring logs prometheus-operator-684947f46c-28gxl -c prometheus-operator
level=info ts=2021-03-07T23:43:09.721849275Z caller=main.go:233 msg="Starting Prometheus Operator" version="(version=0.45.0, branch=rhaos-4.8-rhel-8, revision=9d3e9a6)"
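For anyone scripting against Prometheus while it is in this state, the following is a minimal client-side sketch of how the 503 responses could be handled: it polls Prometheus's `/-/ready` endpoint and treats 503 (or an unreachable server) as "not ready yet" instead of as a hard failure. The helper names `fetch_status` and `wait_until_ready`, the retry counts, and the example URL are illustrative assumptions, not part of any Red Hat or Prometheus tooling. This does not address the pod restart loop itself; it only shows how a caller might wait out the replay.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx responses; the code is still available.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout (URLError is an OSError).
        return 0


def wait_until_ready(
    url: str,
    attempts: int = 60,
    delay: float = 5.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200.

    503 and "unreachable" are treated as "still starting" and retried.
    Any other status is unexpected and raises RuntimeError.
    Returns False if the endpoint is still not ready after all attempts.
    """
    for attempt in range(attempts):
        status = fetch(url)
        if status == 200:
            return True
        if status not in (0, 503):
            raise RuntimeError(f"unexpected HTTP status {status} from {url}")
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A call might look like `wait_until_ready("http://localhost:9090/-/ready")`, where the host and port are assumptions about how Prometheus is exposed (for example through a port-forward).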
Dear Red Hat,

Does Red Hat have a plan to backport this fix to older versions? Our customer hit the same issue on OCP 4.5. 4.8 is not yet GA, so they cannot upgrade to it now. How can we avoid this issue on older versions?

Currently only one Prometheus instance is running in the customer's environment, because the other one keeps restarting due to this issue. If the same issue occurs in both instances, cluster monitoring becomes completely unavailable. This is very critical.

Best Regards,
Masaki Hatada
The bug fix has been backported to 4.6.22 (bug 1935586) and 4.7.2 (bug 1935585).
Dear Simon,

Thank you for your update.

> The bug fix has been backported to 4.6.22 (bug 1935586) and 4.7.2 (bug 1935585).

Our customer is using OCP 4.5. Of course, we will upgrade their cluster in the future, but that will take time. Please let us know if there is a workaround for this issue.

Best Regards,
Masaki Hatada
Unfortunately we have no workaround for 4.5.
Clearing needinfo flag.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438