Bug 1846207 - Syncing "openshift-monitoring/cluster-monitoring-config" failed
Summary: Syncing "openshift-monitoring/cluster-monitoring-config" failed
Keywords:
Status: CLOSED DUPLICATE of bug 1833427
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-11 05:14 UTC by Brendan Shirren
Modified: 2023-10-06 20:34 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-11 06:27:31 UTC
Target Upstream Version:
Embargoed:



Description Brendan Shirren 2020-06-11 05:14:15 UTC
Description of problem:

After upgrading OCP from 4.3.18 to 4.4.5, the Prometheus monitoring PVCs specified in the "cluster-monitoring-config" configmap are no longer mounted to the Prometheus pods; newly created PVCs are mounted instead. The same happens for Alertmanager.

Version-Release number of selected component (if applicable):

4.3+

How reproducible: unknown.


Steps to Reproduce:
1. Specify PVCs for the monitoring components in the "cluster-monitoring-config" configmap, as per the documentation [1]
2. Upgrade OCP from 4.3 to 4.4
3. Confirm that the "cluster-monitoring-config" configmap still references the existing PVCs, but that new PVCs with different names are created and used by the monitoring pods (see the verification commands below)
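
The following commands (illustrative only, not taken from the original report) can be used to verify step 3 by listing the claims that exist and showing which claims the pods actually mount:

$ oc -n openshift-monitoring get pvc
$ oc -n openshift-monitoring get pod prometheus-k8s-0 -o yaml | grep claimName
$ oc -n openshift-monitoring get pod alertmanager-main-0 -o yaml | grep claimName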


Actual results:

New PVCs with the default prefix "prometheus-k8s-db" are created for Prometheus, and new PVCs with the default prefix "alertmanager-main-db" are created for Alertmanager.


Expected results:

The existing PVCs specified in the "cluster-monitoring-config" configmap are reused.


Additional info:

Logs from "cluster-monitoring-operator" pod:

2020-06-01T17:00:59.149753868Z E0601 17:00:59.149691       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:00:59.149753868Z E0601 17:00:59.149730       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager: expected 3 replicas, updated 2 and available 2
2020-06-01T17:06:08.475033961Z E0601 17:06:08.474934       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:06:08.475033961Z E0601 17:06:08.474978       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating node-exporter failed: reconciling node-exporter DaemonSet failed: updating DaemonSet object failed: waiting for DaemonSetRollout of node-exporter: daemonset node-exporter is not ready. status: (desired: 9, updated: 9, ready: 8, unavailable: 1)
2020-06-01T17:18:46.45138148Z E0601 17:18:46.451299       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:18:46.45147981Z E0601 17:18:46.451356       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager: expected 3 replicas, updated 2 and available 2
2020-06-01T17:24:12.494796194Z E0601 17:24:12.494724       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:24:12.494796194Z E0601 17:24:12.494768       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager: expected 3 replicas, updated 2 and available 2
2020-06-01T17:29:38.786566149Z E0601 17:29:38.786515       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:29:38.786626468Z E0601 17:29:38.786554       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus: expected 2 replicas, updated 1 and available 1
2020-06-01T17:35:04.802625555Z E0601 17:35:04.802556       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:35:04.802686897Z E0601 17:35:04.802610       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager: expected 3 replicas, updated 2 and available 2
2020-06-01T17:40:31.00283567Z E0601 17:40:31.002760       1 operator.go:273] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2020-06-01T17:40:31.00283567Z E0601 17:40:31.002802       1 operator.go:274] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager: expected 3 replicas, updated 2 and available 2
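
The errors above indicate that the Alertmanager and Prometheus StatefulSet rollouts never completed (for example, "expected 3 replicas, updated 2 and available 2"). Commands such as the following (illustrative only, not taken from the case) can help identify which replica is stuck and why:

$ oc -n openshift-monitoring get pods | grep -E 'alertmanager|prometheus-k8s'
$ oc -n openshift-monitoring describe statefulset alertmanager-main
$ oc -n openshift-monitoring describe statefulset prometheus-k8s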



[1] https://docs.openshift.com/container-platform/4.3/monitoring/cluster_monitoring/configuring-the-monitoring-stack.html#maintenance-and-support_configuring-monitoring

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1820229
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1807430

Comment 1 Brendan Shirren 2020-06-11 05:20:27 UTC
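(The listing below appears to be the output of "oc -n openshift-monitoring get pvc"; the command itself was not included in the original comment.)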
NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
alertmanager-main-db-alertmanager-main-0       Bound    pvc-6c7e2adc-891d-45c5-8075-f7cf486937ac   64Mi       RWO            iscsi-targetd-vg-targetd   3d2h
alertmanager-main-db-alertmanager-main-1       Bound    pvc-02601d42-3f90-435c-a0a3-60f3cdf4195f   64Mi       RWO            iscsi-targetd-vg-targetd   3d2h
alertmanager-main-db-alertmanager-main-2       Bound    pvc-78bfd302-0298-4240-b570-7ffe667851c9   64Mi       RWO            iscsi-targetd-vg-targetd   3d2h
atlas-alertmanager-claim-alertmanager-main-0   Bound    pvc-ec98337e-7837-11ea-9631-588a5aca6333   64Mi       RWO            iscsi-targetd-vg-targetd   58d
atlas-alertmanager-claim-alertmanager-main-1   Bound    pvc-ecb1e2af-7837-11ea-9631-588a5aca6333   64Mi       RWO            iscsi-targetd-vg-targetd   58d
atlas-alertmanager-claim-alertmanager-main-2   Bound    pvc-ecd6f57d-7837-11ea-9631-588a5aca6333   64Mi       RWO            iscsi-targetd-vg-targetd   58d
atlas-prometheus-claim-prometheus-k8s-0        Bound    pvc-7b21da77-ce69-453d-8ce5-22401f16d0b6   10Gi       RWO            iscsi-targetd-vg-targetd   8d
atlas-prometheus-claim-prometheus-k8s-1        Bound    pvc-ee78b1f0-6ceb-4e6e-904d-399466cebfc1   10Gi       RWO            iscsi-targetd-vg-targetd   8d
prometheus-k8s-db-prometheus-k8s-0             Bound    pvc-95acc401-e6e1-4696-8473-300510e3b342   10Gi       RWO            iscsi-targetd-vg-targetd   3d2h
prometheus-k8s-db-prometheus-k8s-1             Bound    pvc-a6afedd8-7d07-486f-9a24-60a89976674c   10Gi       RWO            iscsi-targetd-vg-targetd   3d2h



$ oc get cm cluster-monitoring-config -n openshift-monitoring -o yaml
apiVersion: v1
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        metadata:
          name: atlas-prometheus-claim
        spec:
          storageClassName: iscsi-targetd-vg-targetd
          volumeMode: Filesystem
          resources:
            requests:
              storage: 10Gi
      retention: 3d
    alertmanagerMain:
      volumeClaimTemplate:
        metadata:
          name: atlas-alertmanager-claim
        spec:
          storageClassName: iscsi-targetd-vg-targetd
          volumeMode: Filesystem
          resources:
            requests:
              storage: 64Mi
      retention: 6d
    techPreviewUserWorkload:
      enabled: true
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
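
(For reference: a StatefulSet names each claim as <volumeClaimTemplate name>-<pod name>, so this configuration should produce atlas-prometheus-claim-prometheus-k8s-0/1 and atlas-alertmanager-claim-alertmanager-main-0/1/2, i.e. the older PVCs in the listing above. The prometheus-k8s-db-* and alertmanager-main-db-* claims created around the time of the upgrade use the operator's default names instead.)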

Comment 2 Junqi Zhao 2020-06-11 06:27:31 UTC

*** This bug has been marked as a duplicate of bug 1833427 ***

