Bug 1959278 - Should remove prometheus servicemonitor from openshift-user-workload-monitoring
Summary: Should remove prometheus servicemonitor from openshift-user-workload-monitoring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Brad Ison
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-11 07:45 UTC by Junqi Zhao
Modified: 2021-07-27 23:08 UTC (History)
7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:07:53 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1166 0 None open WIP: Bug 1959278: Remove obsolete user-workload ServiceMonitor 2021-05-18 10:49:17 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:08:11 UTC

Description Junqi Zhao 2021-05-11 07:45:44 UTC
Description of problem:
This bug was found while verifying bug 1952744.
Enabled user workload monitoring, then upgraded from 4.7.10 to 4.8.0-0.nightly-2021-05-10-225140.
In the 4.7.10 cluster there is a prometheus ServiceMonitor under openshift-user-workload-monitoring. Since 4.8 that ServiceMonitor has been renamed to prometheus-user-workload, so the old prometheus ServiceMonitor should be deleted, but it still exists after the upgrade to 4.8.

NOTE: this has no functional effect in an OpenShift cluster, but in an OSD cluster it produces the "Error on ingesting samples with different value but same timestamp" warnings mentioned in bug 1952744 (both ServiceMonitors select the same service, so the same targets are scraped twice). The stale prometheus ServiceMonitor should therefore be removed from the openshift-user-workload-monitoring project.

level=warn ts=2021-04-23T02:51:03.446Z caller=scrape.go:1375 component="scrape manager" scrape_pool=openshift-user-workload-monitoring/prometheus-user-workload/0 target=https://10.130.6.24:9091/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
***********************************
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.10    True        False         4m23s   Cluster version is 4.7.10

# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                  AGE
prometheus            9m2s
prometheus-operator   9m18s
thanos-sidecar        9m2s
***********************************

After upgrading to 4.8.0-0.nightly-2021-05-10-225140:
*************************************************
# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                       AGE
prometheus                 71m
prometheus-operator        71m
prometheus-user-workload   29m
thanos-ruler               23m
thanos-sidecar             71m

# oc -n openshift-user-workload-monitoring get servicemonitor prometheus -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-05-11T06:27:59Z"
  generation: 1
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: openshift-user-workload-monitoring
  resourceVersion: "29261"
  uid: 2e00db1d-e711-4d7c-bbae-9bb02edd18cd
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    interval: 30s
    port: metrics
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      cert: {}
      serverName: prometheus-user-workload.openshift-user-workload-monitoring.svc
  namespaceSelector: {}
  selector:
    matchLabels:
      prometheus: user-workload

# oc -n openshift-user-workload-monitoring get servicemonitor prometheus-user-workload  -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-05-11T07:10:02Z"
  generation: 1
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.26.0
  name: prometheus-user-workload
  namespace: openshift-user-workload-monitoring
  resourceVersion: "47597"
  uid: 86f2236a-ec57-47a9-84e2-e85370e11e63
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    interval: 30s
    port: metrics
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      cert: {}
      serverName: prometheus-user-workload.openshift-user-workload-monitoring.svc
  namespaceSelector: {}
  selector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: openshift-monitoring
      prometheus: user-workload
*************************************************
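The stale object can be picked out mechanically by diffing the post-upgrade listing against the set of ServiceMonitors 4.8 is expected to manage. The managed set below is an assumption read off the listing above, not taken from operator code:

```shell
# ServiceMonitors the 4.8 operator is assumed to manage (based on the
# post-upgrade listing in this report, not on operator source).
printf '%s\n' prometheus-operator prometheus-user-workload \
    thanos-ruler thanos-sidecar | sort > managed.txt

# Names actually present after the upgrade from 4.7.10.
printf '%s\n' prometheus prometheus-operator prometheus-user-workload \
    thanos-ruler thanos-sidecar | sort > existing.txt

# Anything present but not managed is a leftover that should be deleted.
comm -23 existing.txt managed.txt   # -> prometheus
```

Only the old prometheus ServiceMonitor falls outside the managed set, which matches the YAML dumps above.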

Version-Release number of selected component (if applicable):
upgrade from 4.7.10 to 4.8.0-0.nightly-2021-05-10-225140

How reproducible:
always

Steps to Reproduce:
1. Enable user workload monitoring, then upgrade from 4.7.10 to 4.8.0-0.nightly-2021-05-10-225140:
oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-10-225140 --allow-explicit-upgrade=true --force

Actual results:


Expected results:


Additional info:
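On clusters that have already upgraded, a possible manual workaround until the operator-side fix lands would be to delete the leftover object by hand. This assumes cluster-admin access and is a sketch, not an officially documented step:

```shell
# Remove the obsolete ServiceMonitor left over from 4.7; its renamed
# replacement, prometheus-user-workload, covers the same endpoints.
oc -n openshift-user-workload-monitoring delete servicemonitor prometheus
```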

Comment 2 Brad Ison 2021-05-20 07:39:33 UTC
Thanks for pointing this out. This should now be removed in the latest builds of master.

Comment 5 Junqi Zhao 2021-05-24 07:44:40 UTC
Upgraded from 4.7.10 to 4.8.0-0.nightly-2021-05-21-233425; the prometheus ServiceMonitor is no longer present.
4.7.10
# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                       AGE
prometheus-operator        139m
prometheus-user-workload   74m
thanos-ruler               68m
thanos-sidecar             139m

After upgrading to 4.8.0-0.nightly-2021-05-21-233425:
# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                       AGE
prometheus-operator        139m
prometheus-user-workload   74m
thanos-ruler               68m
thanos-sidecar             139m

Comment 8 errata-xmlrpc 2021-07-27 23:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

