1872253 – Invalid service monitors block the update of the user workload monitoring prometheus

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Simon Pasquier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-25 09:42 UTC by Simon Pasquier
Modified:	2020-10-27 16:33 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:33:06 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift prometheus-operator pull 89	None	open	Bug 1872253: skip invalid service monitors	2020-08-31 16:07:19 UTC
Github	prometheus-operator prometheus-operator issues 3327	None	open	Add new metric when prometheus is stuck on "creating config failed"	2020-08-31 13:41:01 UTC
Github	prometheus-operator prometheus-operator issues 3329	None	open	Incorrect ServiceMonitor blocks prometheus deployment	2020-08-31 13:41:01 UTC
Github	prometheus-operator prometheus-operator pull 3445	None	open	pkg/prometheus: skip invalid service monitors	2020-08-31 13:41:01 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:33:08 UTC

Description Simon Pasquier 2020-08-25 09:42:00 UTC

Description of problem:
Whenever a service monitor references a key that doesn't exist in a secret or configmap, the Prometheus operator stops updating the Prometheus configuration. This isn't a big issue for the infra (platform) Prometheus, because we pretty much control what goes into it, but it's more problematic for user workload monitoring: a single bad service monitor can effectively DoS the service.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Enable user workload monitoring
2. Create a secret and a service monitor that references this secret with a key that doesn't exist in it:
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: demo
  namespace: default
type: Opaque
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo
  namespace: default
spec:
  endpoints:
  - port: web
    bearerTokenSecret:
      key: missing
      name: demo
  selector:
    matchLabels:
      app: demo

3. Create a valid service monitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo2
  namespace: default
spec:
  endpoints:
  - port: web
  selector:
    matchLabels:
      app: demo2

Actual results:
The second service monitor (demo2) isn't present in the Prometheus configuration.

Expected results:
The second service monitor (demo2) should be present in the Prometheus configuration; only the invalid one (demo) should be skipped.

Additional info:
https://github.com/prometheus-operator/prometheus-operator/issues/3327

Comment 6 errata-xmlrpc 2020-10-27 16:33:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
