Description of problem:
Two seemingly unrelated issues on OSD today where metrics that should be in Prometheus were not showing up. We finally noticed that the prometheus-operator was logging errors. We didn't snag a must-gather while the issues were happening; we'll watch for a recurrence and grab one next time.

Version-Release number of selected component (if applicable):
4.5.16

How reproducible:
Unknown

Steps to Reproduce:
1. Unknown

Actual results:
Prometheus doesn't register new targets and doesn't update existing targets.

Expected results:
Prometheus registers new targets and updates existing targets.

Additional info:

E1112 20:40:04.648331 1 reflector.go:178] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:485: Failed to list *v1.ServiceMonitor: resourceVersion: Invalid value: "72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060": strconv.ParseUint: parsing "72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060/72124060": invalid syntax

E1112 20:40:37.584245 1 reflector.go:178] github.com/coreos/prometheus-operator/pkg/thanos/operator.go:322: Failed to list *v1.PrometheusRule: resourceVersion: Invalid value: "76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311": strconv.ParseUint: parsing "76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311/76179311": invalid syntax

E1112 20:40:47.349089 1 reflector.go:178] github.com/coreos/prometheus-operator/pkg/prometheus/operator.go:486: Failed to list *v1.PodMonitor: resourceVersion: Invalid value: "72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754": strconv.ParseUint: parsing "72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754/72127754": invalid syntax
On one cluster, an SRE operator was having problems registering a new ServiceMonitor. On another cluster, the FluentdNodeDown alert was firing with the message "Prometheus could not scrape fluentd for more than 10m." On the fluentd cluster, the Prometheus query `{job="fluentd"}` returned no results at all; on a healthy cluster with logging installed, the same query returns 1400+ time series. The workaround for now is to delete the prometheus-operator pod.
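For anyone repeating that check outside the console, here is a minimal sketch (not part of the original report) that runs the same instant query through the Prometheus Go client; it assumes a recent prometheus/client_golang and a Prometheus endpoint reachable at localhost:9090, e.g. via a port-forward.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Address is an assumption: point it at however you reach the
	// cluster's Prometheus (e.g. a port-forward to 9090).
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Instant query for all fluentd series; a healthy logging stack
	// returns 1400+ series, the affected cluster returned none.
	result, warnings, err := v1.NewAPI(client).Query(ctx, `{job="fluentd"}`, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		log.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```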
This appears to be the same issue as bug 1891815.
*** This bug has been marked as a duplicate of bug 1891815 ***