Description of problem:
Log in to a 4.11.0-0.nightly-2022-05-20-213928 HyperShift cluster with the guest cluster kubeconfig. The default cluster-monitoring-config configmap exists under the openshift-monitoring project (see bug 2089191). Update the configmap to attach PVs: at first the PVs are created and attached to the prometheus pods, but after a while cluster-monitoring-config is reverted to the default setting, which restarts the prometheus pods with no PVs attached.

update the configmap to attach PVs
# oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain: null
    enableUserWorkload: null
    grafana: null
    http: null
    k8sPrometheusAdapter: null
    kubeStateMetrics: null
    openshiftStateMetrics: null
    prometheusK8s:
      retention: 3h
      volumeClaimTemplate:
        metadata:
          name: prometheus
        spec:
          volumeMode: Filesystem
          resources:
            requests:
              storage: 10Gi
    prometheusOperator:
      logLevel: ""
      nodeSelector:
        kubernetes.io/os: linux
      tolerations: null
    telemeterClient: null
    thanosQuerier: null
kind: ConfigMap
metadata:
  creationTimestamp: "2022-05-23T03:29:46Z"
  labels:
    hypershift.io/managed: "true"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "1516"
  uid: 02a7ae1e-4f03-4b85-bda5-871fa187cd10

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                               STORAGECLASS   REASON   AGE
pvc-6c8c04dc-a009-4d46-a495-0b29290c6c4a   10Gi       RWO            Delete           Bound    openshift-monitoring/prometheus-prometheus-k8s-1   gp2                     53m
pvc-bac05a66-dcd0-4e24-aed4-e02fbabc5c8e   10Gi       RWO            Delete           Bound    openshift-monitoring/prometheus-prometheus-k8s-0   gp2                     53m

# oc -n openshift-monitoring get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-prometheus-k8s-0   Bound    pvc-bac05a66-dcd0-4e24-aed4-e02fbabc5c8e   10Gi       RWO            gp2            53m
prometheus-prometheus-k8s-1   Bound    pvc-6c8c04dc-a009-4d46-a495-0b29290c6c4a   10Gi       RWO            gp2            53m

# oc -n openshift-monitoring get event | grep prometheus-k8s
...
64m   Normal   WaitForFirstConsumer    persistentvolumeclaim/prometheus-prometheus-k8s-0   waiting for first consumer to be created before binding
64m   Normal   ProvisioningSucceeded   persistentvolumeclaim/prometheus-prometheus-k8s-0   Successfully provisioned volume pvc-bac05a66-dcd0-4e24-aed4-e02fbabc5c8e using kubernetes.io/aws-ebs
64m   Normal   WaitForFirstConsumer    persistentvolumeclaim/prometheus-prometheus-k8s-1   waiting for first consumer to be created before binding
64m   Normal   ProvisioningSucceeded   persistentvolumeclaim/prometheus-prometheus-k8s-1   Successfully provisioned volume pvc-6c8c04dc-a009-4d46-a495-0b29290c6c4a using kubernetes.io/aws-ebs

after a while, the configmap is reverted
# oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain: null
    enableUserWorkload: null
    grafana: null
    http: null
    k8sPrometheusAdapter: null
    kubeStateMetrics: null
    openshiftStateMetrics: null
    prometheusK8s: null
    prometheusOperator:
      logLevel: ""
      nodeSelector:
        kubernetes.io/os: linux
      tolerations: null
    telemeterClient: null
    thanosQuerier: null
kind: ConfigMap
metadata:
  creationTimestamp: "2022-05-23T03:29:46Z"
  labels:
    hypershift.io/managed: "true"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "60293"
  uid: 02a7ae1e-4f03-4b85-bda5-871fa187cd10

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-20-213928 HyperShift cluster, logged in with the guest cluster kubeconfig

How reproducible:
always

Steps to Reproduce:
1. update the configmap to attach PVs (a sketch of one way to apply the update is under Additional info)
2.
3.
Actual results:
after a while, the configmap is reverted

Expected results:
should not revert

Additional info:
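For step 1, a minimal sketch of one way to make the update (the manifest values mirror the config.yaml shown in the description; the file name cluster-monitoring-config.yaml is only illustrative, and oc edit on the existing configmap works just as well):

# cat cluster-monitoring-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 3h
      volumeClaimTemplate:
        metadata:
          name: prometheus
        spec:
          volumeMode: Filesystem
          resources:
            requests:
              storage: 10Gi
# oc -n openshift-monitoring apply -f cluster-monitoring-config.yaml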
I think this is a bug for the HyperShift folks: it is not a problem with CMO but with the HyperShift controller, since CMO does not reset values under any condition unless the ConfigMap is removed. I suspect something with https://github.com/openshift/hypershift/blob/9fba0b6ed55808f86b1f9d5d13d2837cf5107b5e/control-plane-operator/hostedclusterconfigoperator/controllers/resources/monitoring/config.go#L20
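A quick way to confirm that a controller is resetting the ConfigMap (a sketch, not something captured in this bug) is to watch its resourceVersion after making the change; each revert should show up as a new revision with prometheusK8s back to null:

# oc -n openshift-monitoring get cm cluster-monitoring-config -w \
    -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion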
This is definitely a bug in our reconciliation code. However, instead of fixing the reconciliation code, we should remove any reconciliation of the config. @jmarcal, if CMO can default the prometheus operator deployment node selector to not include master when running inside a cluster with a hosted control plane, then we can leave the config entirely managed by the user, as is the case with standalone OCP.
Just to remove the needinfo and make things more traceable: in CMO PR https://github.com/openshift/cluster-monitoring-operator/pull/1679 we changed the default prometheus operator deployment node selector to not include master when running inside a cluster with a hosted control plane.
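A quick way to check the new default on a hosted-control-plane guest cluster (a sketch; the exact output depends on the cluster, but after the PR the selector should no longer include the master node role):

# oc -n openshift-monitoring get deployment prometheus-operator \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'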
The fix is in 4.11.0-0.nightly-2022-06-15-161625, and the configmap can be reloaded based on changes.
(In reply to Junqi Zhao from comment #8)
> The fix is in 4.11.0-0.nightly-2022-06-15-161625, and the configmap can be
> reloaded based on changes.

Please ignore; this was pasted to this bug by mistake.
Tested a 4.11.0-0.nightly-2022-06-15-222801 HyperShift cluster with the guest cluster kubeconfig; the default configmap cluster-monitoring-config is removed:

# oc -n openshift-monitoring get cm cluster-monitoring-config
Error from server (NotFound): configmaps "cluster-monitoring-config" not found

Followed the steps in Comment 0; we can configure monitoring now:

# oc -n openshift-monitoring get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-prometheus-k8s-0   Bound    pvc-303fc231-2d44-4810-a49d-b7a510743d7e   10Gi       RWO            gp2            9m57s

# oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml | grep persistentVolumeClaim -A1
    persistentVolumeClaim:
      claimName: prometheus-prometheus-k8s-0
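As an extra check during verification (a sketch, not part of the formal test; the wait interval is arbitrary), the ConfigMap can be re-checked after some time to confirm the custom settings are no longer reverted:

# oc -n openshift-monitoring get cm cluster-monitoring-config -o jsonpath='{.metadata.resourceVersion}{"\n"}'
# sleep 1800
# oc -n openshift-monitoring get cm cluster-monitoring-config -o jsonpath='{.data.config\.yaml}'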
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069