Bug 1959278

Summary: Should remove prometheus servicemonitor from openshift-user-workload-monitoring

Product: OpenShift Container Platform
Component: Monitoring
Version: 4.8
Target Release: 4.8.0
Reporter: Junqi Zhao <juzhao>
Assignee: Brad Ison <brad.ison>
QA Contact: Junqi Zhao <juzhao>
CC: alegrand, anpicker, aos-bugs, erooth, kakkoyun, lcosic, pkrupa
Status: CLOSED ERRATA
Severity: low
Priority: low
Keywords: Regression
Type: Bug
Doc Type: No Doc Update
Last Closed: 2021-07-27 23:07:53 UTC

Description Junqi Zhao 2021-05-11 07:45:44 UTC
Description of problem:
This bug was found while verifying bug 1952744. User workload monitoring was enabled and the cluster was upgraded from 4.7.10 to 4.8.0-0.nightly-2021-05-10-225140. In a 4.7.10 cluster there is a prometheus servicemonitor under openshift-user-workload-monitoring; since 4.8 this servicemonitor has been renamed to prometheus-user-workload, so the old prometheus servicemonitor should be deleted during the upgrade, but it still exists after upgrading to 4.8.

NOTE: this has no functional effect in an OpenShift cluster, but in an OSD cluster it produces the "Error on ingesting samples with different value but same timestamp" warnings mentioned in bug 1952744, so we should remove the prometheus servicemonitor from the openshift-user-workload-monitoring project:

level=warn ts=2021-04-23T02:51:03.446Z caller=scrape.go:1375 component="scrape manager" scrape_pool=openshift-user-workload-monitoring/prometheus-user-workload/0 target=https://10.130.6.24:9091/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
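
A manual workaround on an affected cluster is to delete the stale object by hand (a cleanup sketch only, not the operator-side fix, which should remove it automatically during the upgrade):

# oc -n openshift-user-workload-monitoring delete servicemonitor prometheus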
***********************************
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.10    True        False         4m23s   Cluster version is 4.7.10

# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                  AGE
prometheus            9m2s
prometheus-operator   9m18s
thanos-sidecar        9m2s
***********************************

after upgrade to 4.8.0-0.nightly-2021-05-10-225140
*************************************************
# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                       AGE
prometheus                 71m
prometheus-operator        71m
prometheus-user-workload   29m
thanos-ruler               23m
thanos-sidecar             71m

# oc -n openshift-user-workload-monitoring get servicemonitor prometheus -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-05-11T06:27:59Z"
  generation: 1
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: openshift-user-workload-monitoring
  resourceVersion: "29261"
  uid: 2e00db1d-e711-4d7c-bbae-9bb02edd18cd
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    interval: 30s
    port: metrics
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      cert: {}
      serverName: prometheus-user-workload.openshift-user-workload-monitoring.svc
  namespaceSelector: {}
  selector:
    matchLabels:
      prometheus: user-workload

# oc -n openshift-user-workload-monitoring get servicemonitor prometheus-user-workload  -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2021-05-11T07:10:02Z"
  generation: 1
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.26.0
  name: prometheus-user-workload
  namespace: openshift-user-workload-monitoring
  resourceVersion: "47597"
  uid: 86f2236a-ec57-47a9-84e2-e85370e11e63
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    bearerTokenSecret:
      key: ""
    interval: 30s
    port: metrics
    scheme: https
    tlsConfig:
      ca: {}
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      cert: {}
      serverName: prometheus-user-workload.openshift-user-workload-monitoring.svc
  namespaceSelector: {}
  selector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: openshift-monitoring
      prometheus: user-workload
*************************************************
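
Note the overlap: the old servicemonitor's selector (prometheus: user-workload) is a strict subset of the new one's, so both objects match the same service and the same endpoints get scraped twice, which is what produces the duplicate-sample warnings quoted above. Assuming the prometheus-user-workload service actually carries the labels used in these selectors (its labels are not shown here), the overlap can be confirmed with:

# oc -n openshift-user-workload-monitoring get svc -l prometheus=user-workload --show-labels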

Version-Release number of selected component (if applicable):
upgrade from 4.7.10 to 4.8.0-0.nightly-2021-05-10-225140

How reproducible:
always

Steps to Reproduce:
1. Enable user workload monitoring on a 4.7.10 cluster, then upgrade to 4.8.0-0.nightly-2021-05-10-225140:
oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-05-10-225140 --allow-explicit-upgrade=true --force
2. After the upgrade, list the servicemonitors in the openshift-user-workload-monitoring namespace:
oc -n openshift-user-workload-monitoring get servicemonitor

Actual results:
The stale prometheus servicemonitor still exists in openshift-user-workload-monitoring after the upgrade to 4.8.

Expected results:
The prometheus servicemonitor is deleted during the upgrade; only prometheus-operator, prometheus-user-workload, thanos-ruler and thanos-sidecar should remain.

Additional info:

Comment 2 Brad Ison 2021-05-20 07:39:33 UTC
Thanks for pointing this out. This should now be removed in the latest builds of master.
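
A quick way to verify this on a build containing the fix (the expected NotFound output below is a sketch, assuming the standard kubectl/oc error format):

# oc -n openshift-user-workload-monitoring get servicemonitor prometheus
Error from server (NotFound): servicemonitors.monitoring.coreos.com "prometheus" not found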

Comment 5 Junqi Zhao 2021-05-24 07:44:40 UTC
Upgraded from 4.7.10 to 4.8.0-0.nightly-2021-05-21-233425; there is no prometheus servicemonitor now.

after upgrade to 4.8.0-0.nightly-2021-05-21-233425:
# oc -n openshift-user-workload-monitoring get servicemonitor
NAME                       AGE
prometheus-operator        139m
prometheus-user-workload   74m
thanos-ruler               68m
thanos-sidecar             139m

Comment 8 errata-xmlrpc 2021-07-27 23:07:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438