Bug 2051470

Summary: prometheus: Add validations for relabel configs
Product: OpenShift Container Platform Reporter: German Parente <gparente>
Component: Monitoring Assignee: Jayapriya Pai <janantha>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact: Brian Burt <bburt>
Priority: medium    
Version: 4.9 CC: amuller, anpicker, aos-bugs, bburt, janantha, spasquie
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
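Before this update, prometheus-operator accepted invalid relabel configurations, which could prevent Prometheus from loading its configuration. With this update, prometheus-operator validates the relabel configurations it is given and skips any ServiceMonitor whose configuration is invalid.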
Story Points: ---
Clone Of:
Clones: 2060718 Environment:
Last Closed: 2022-08-10 10:47:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2060718    

Description German Parente 2022-02-07 10:48:34 UTC
Description of problem:

This bug tracks the downstream delivery of the following upstream feature:

https://github.com/prometheus-operator/prometheus-operator/pull/4429

It is fixed in the 0.54 release of prometheus-operator; this BZ tracks bringing that version downstream.


How reproducible:

Add a relabelings entry to a ServiceMonitor spec, then remove its targetLabel. For example, start from this:

spec:
  endpoints:
  - interval: 30s
    port: 8080-tcp
    scheme: http
    path: /actuator/prometheus
    relabelings:
    - action: replace
      regex: (.+)
      sourceLabels:
      - __meta_kubernetes_namespace
      targetLabel: namespace

Then remove targetLabel:

spec:
  endpoints:
  - interval: 30s
    port: 8080-tcp
    scheme: http
    path: /actuator/prometheus
    relabelings:
    - action: replace
      regex: (.+)
      sourceLabels:
      - __meta_kubernetes_namespace

Prometheus then logs this error:

level=error ts=2022-01-31T09:52:55.924Z caller=main.go:729 msg="Error reloading config" err="couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): parsing YAML file /etc/prometheus/config_out/prometheus.env.yaml: relabel configuration for replace action requires 'target_label' value"

As a result, Prometheus loads no new configuration at all because of a single invalid ServiceMonitor.
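The check that was missing can be sketched in Go, the language prometheus-operator is written in. This is a minimal, hypothetical sketch, not the operator's actual code: the RelabelConfig type and the validateRelabelConfig function are illustrative names, only the replace action is checked, and an empty action is assumed to default to replace, as it does in Prometheus. The error text mirrors the prometheus-operator log quoted in Comment 8.

```go
package main

import "fmt"

// RelabelConfig is a hypothetical, trimmed-down stand-in for one entry of a
// ServiceMonitor endpoint's relabelings; only the fields used here are modeled.
type RelabelConfig struct {
	Action       string
	SourceLabels []string
	Regex        string
	TargetLabel  string
}

// validateRelabelConfig rejects a replace action that has no target label.
// Assumption: an empty action means "replace" (the Prometheus default).
func validateRelabelConfig(rc RelabelConfig) error {
	action := rc.Action
	if action == "" {
		action = "replace"
	}
	if action == "replace" && rc.TargetLabel == "" {
		return fmt.Errorf("relabel configuration for %s action needs targetLabel value", action)
	}
	return nil
}

func main() {
	// The original endpoint from the reproducer above.
	valid := RelabelConfig{
		Action:       "replace",
		Regex:        "(.+)",
		SourceLabels: []string{"__meta_kubernetes_namespace"},
		TargetLabel:  "namespace",
	}
	// The same endpoint with targetLabel removed.
	invalid := valid
	invalid.TargetLabel = ""

	fmt.Println(validateRelabelConfig(valid))
	fmt.Println(validateRelabelConfig(invalid))
}
```

Running it prints <nil> for the original endpoint and the targetLabel error for the edited one. Catching this in the operator means the bad value never reaches the generated prometheus.env.yaml.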

Comment 1 Jayapriya Pai 2022-02-23 14:08:40 UTC
Relabel validation is included in 0.54, and that version is already updated downstream: https://github.com/openshift/prometheus-operator/pull/151
https://github.com/openshift/cluster-monitoring-operator/pull/1556 also brings this change to CMO. Once this PR is merged, we can close this bug.

Comment 2 Simon Pasquier 2022-02-23 14:37:37 UTC
For safety, we should probably bump to v0.54.1 once it's available.

Comment 3 Jayapriya Pai 2022-02-23 14:48:04 UTC
Sure. Once 0.54.1 is released, I can pull it downstream and update CMO.

Comment 8 Junqi Zhao 2022-03-03 10:30:18 UTC
Tested with 4.11.0-0.nightly-2022-03-03-061758 (prometheus-operator 0.54.1, Prometheus 2.32.1). There is no error in the prometheus-k8s pod. The invalid ServiceMonitor is skipped, so it is not loaded into the Prometheus configuration, and prometheus-operator logs the error "relabel configuration for replace action needs targetLabel value":

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep "serviceMonitor/openshift-console/console-test/0"
no result

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0 | grep "couldn't load configuration"
no result

# oc -n openshift-monitoring logs -c prometheus-operator $(oc -n openshift-monitoring get pod --no-headers | grep prometheus-operator | awk '{print $1}') | grep "relabel configuration for replace action needs targetLabel value"
level=warn ts=2022-03-03T10:10:33.784201269Z caller=operator.go:1837 component=prometheusoperator msg="skipping servicemonitor" error="relabel configuration for replace action needs targetLabel value" servicemonitor=openshift-console/console-test namespace=openshift-monitoring prometheus=k8s
level=warn ts=2022-03-03T10:10:34.020788319Z caller=operator.go:1837 component=prometheusoperator msg="skipping servicemonitor" error="relabel configuration for replace action needs targetLabel value" servicemonitor=openshift-console/console-test namespace=openshift-monitoring prometheus=k8s
...

Comment 13 errata-xmlrpc 2022-08-10 10:47:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069