Description of problem
======================

There are events in the openshift-monitoring namespace complaining about a missing alertmanager-trusted-ca-bundle-XXXXXX config map, but when one checks, the configmap seems to exist.

Version-Release number of selected component
============================================

cluster channel: stable-4.2
cluster version: 4.2.0-0.nightly-2019-08-26-235330
cluster image: registry.svc.ci.openshift.org/ocp/release@sha256:4b1f127d3d13e63ec0210568bc5aada642d1a97e3dfebd4b534257657011acce

namespace openshift-cluster-storage-operator
image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:135c4846c99f3da1f3e3e9c17ad37135efdd9d1bc3fa61231f1c41e10b8c2172
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:135c4846c99f3da1f3e3e9c17ad37135efdd9d1bc3fa61231f1c41e10b8c2172

namespace openshift-storage
image quay.io/cephcsi/cephcsi:canary
 * quay.io/cephcsi/cephcsi@sha256:65bda97c05d01dd6bcb76c93e61cf0f0972b7e130406692143a6c18c7e9c00fa
image quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
 * quay.io/k8scsi/csi-node-driver-registrar@sha256:13daf82fb99e951a4bff8ae5fc7c17c3a8fe7130be6400990d8f6076c32d4599
image quay.io/k8scsi/csi-attacher:v1.2.0
 * quay.io/k8scsi/csi-attacher@sha256:26fccd7a99d973845df1193b46ebdcc6ab8dc5f6e6be319750c471fce1742d13
image quay.io/k8scsi/csi-provisioner:v1.3.0
 * quay.io/k8scsi/csi-provisioner@sha256:e615e92233248e72f046dd4f5fac40e75dd49f78805801953a7dfccf4eb09148
image quay.io/k8scsi/csi-snapshotter:v1.2.0
 * quay.io/k8scsi/csi-snapshotter@sha256:6f12a57ef46c340c475489cac8d63c2431033961deaf40414208edebee50b640
image docker.io/ceph/ceph:v14.2.2-20190722
 * docker.io/ceph/ceph@sha256:567fe78d90a63ead11deadc2cbf5a912e42bfcc6ef4b1d6154f4b4fea4019052
image docker.io/rook/ceph:master
 * docker.io/rook/ceph@sha256:16feb1c77281e9eee66cdd3ee78e1b7642283e0f3537322873bf2cf5744b7517

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install an OCP/OCS cluster (I did this via red-hat-storage/ocs-ci, using upstream OCS images and with monitoring enabled, ocs-ci commit b304d0a)
2. List events in the openshift-monitoring namespace

Actual results
==============

There are events reporting a mount failure for a configmap which supposedly doesn't exist:

```
$ oc get events -n openshift-monitoring
LAST SEEN   TYPE      REASON        OBJECT                    MESSAGE
77m         Warning   FailedMount   pod/alertmanager-main-0   MountVolume.SetUp failed for volume "configmap-alertmanager-trusted-ca-bundle-cquddrmb6dfoh" : configmaps "alertmanager-trusted-ca-bundle-cquddrmb6dfoh" not found
4m49s       Warning   FailedMount   pod/alertmanager-main-1   MountVolume.SetUp failed for volume "configmap-alertmanager-trusted-ca-bundle-cquddrmb6dfoh" : configmaps "alertmanager-trusted-ca-bundle-cquddrmb6dfoh" not found
84m         Warning   FailedMount   pod/alertmanager-main-2   MountVolume.SetUp failed for volume "configmap-alertmanager-trusted-ca-bundle-cquddrmb6dfoh" : configmaps "alertmanager-trusted-ca-bundle-cquddrmb6dfoh" not found
```

But when I query for the configmap that was reported as not found, I see it without any problems:

```
$ oc get configmap/alertmanager-trusted-ca-bundle-cquddrmb6dfoh -n openshift-monitoring
NAME                                           DATA   AGE
alertmanager-trusted-ca-bundle-cquddrmb6dfoh   1      7s
```

Expected results
================

There is no such event, or the event reports the problem in a more specific way that does not conflict with the first observation (the configmap exists).
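A minimal diagnostic sketch (not part of the original observations) that may help triage: compare the FailedMount event timestamps with the configmap's creationTimestamp, to check whether the mounts failed before the configmap existed, which would explain why the configmap looks fine when queried later. The resource names below are taken from the output above.

```
# Hedged diagnostic sketch: did the failed mounts predate the configmap?
CM=alertmanager-trusted-ca-bundle-cquddrmb6dfoh

# When was the configmap actually created?
oc -n openshift-monitoring get configmap "$CM" \
    -o jsonpath='{.metadata.creationTimestamp}{"\n"}'

# When did the FailedMount events first and last occur, and for which pods?
oc -n openshift-monitoring get events --field-selector reason=FailedMount \
    -o custom-columns=FIRST:.firstTimestamp,LAST:.lastTimestamp,POD:.involvedObject.name
```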
Additional info
===============

All pods in the openshift-monitoring namespace seem to be running:

```
$ oc get pods -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3     Running   0          8h
alertmanager-main-1                            3/3     Running   0          8h
alertmanager-main-2                            3/3     Running   0          8h
cluster-monitoring-operator-775b45bc8b-sx4h7   1/1     Running   0          8h
grafana-867bfddd4d-bsj2g                       2/2     Running   0          8h
kube-state-metrics-7f4cdccd7c-4vlt2            3/3     Running   0          8h
node-exporter-hbvxl                            2/2     Running   0          8h
node-exporter-hs2q8                            2/2     Running   0          8h
node-exporter-jf6wd                            2/2     Running   0          8h
node-exporter-kxp88                            2/2     Running   0          8h
node-exporter-n89lb                            2/2     Running   0          8h
node-exporter-p9bh7                            2/2     Running   0          8h
openshift-state-metrics-6d66db6574-l8s7j       3/3     Running   0          8h
prometheus-adapter-bf745d6cd-25jlm             1/1     Running   0          8h
prometheus-adapter-bf745d6cd-jjx8x             1/1     Running   0          8h
prometheus-k8s-0                               6/6     Running   1          8h
prometheus-k8s-1                               6/6     Running   1          8h
prometheus-operator-6d5b8887d6-xtxgm           1/1     Running   0          8h
telemeter-client-678957d86-6wjk4               3/3     Running   0          8h
```

There is no obvious error with monitoring at first sight: the watchdog alert is firing and there are 115 not-firing alerts. That said, I haven't tested particular monitoring features in more detail.

One can open the alertmanager-main web interface (as listed e.g. in `oc get routes -n openshift-monitoring`), but it asks me for kubeadmin credentials again, even when I select the option to log in with OpenShift.
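To double-check that alerting itself works despite the login oddity, one option (not from the original report, and assuming the alertmanager container ships amtool and serves plain HTTP on localhost:9093 inside the pod) is to query active alerts directly from inside a pod, bypassing the oauth-proxy route:

```
# Hedged sketch: list active alerts (the Watchdog alert should show up)
# by exec'ing into an alertmanager pod instead of going through the route.
oc -n openshift-monitoring exec alertmanager-main-0 -c alertmanager -- \
    amtool alert query --alertmanager.url=http://localhost:9093
```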
With the 4.2.0-0.nightly-2019-08-28-083236 build, I noticed a similar problem with openshift-storage, and since this bug was triaged and fixed in cluster-monitoring-operator, I reported a new bug for openshift-storage: BZ 1746536.

Would it make sense to check whether this is the same kind of issue in a different operator, or whether there is something deeper in OCP to improve? Of course, it's also possible that the two bugs are not related at all.
Let the cluster run for a few hours and checked events; there are no "missing alertmanager-trusted-ca-bundle-XXXXXX config map" events:

```
$ oc -n openshift-monitoring get event
```

payload: 4.2.0-0.nightly-2019-08-29-170426
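For what it's worth, a quick way to narrow the listing to anything suspicious (a hedged sketch, not part of the original comment) is to filter on warning events only:

```
# Hedged sketch: show only warning events, so any lingering FailedMount
# messages stand out from normal scheduling/pulling events.
oc -n openshift-monitoring get event --field-selector type=Warning
```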
Just to confirm: this now only happens, at least for me, when you delete the configmap, correct?
(In reply to Lili Cosic from comment #7)
> Just to confirm: this now only happens, at least for me, when you delete
> the configmap, correct?

I did not delete the configmap. There is no such issue in my fresh environment now, but some secrets are reported as missing:

```
# oc -n openshift-monitoring get event | grep "not found"
93m   Warning   FailedMount   pod/grafana-787654dccf-ccprz   MountVolume.SetUp failed for volume "secret-grafana-tls" : secrets "grafana-tls" not found
95m   Warning   FailedMount   pod/node-exporter-4l7xt        MountVolume.SetUp failed for volume "node-exporter-tls" : secrets "node-exporter-tls" not found
95m   Warning   FailedMount   pod/node-exporter-r454t        MountVolume.SetUp failed for volume "node-exporter-tls" : secrets "node-exporter-tls" not found
95m   Warning   FailedMount   pod/node-exporter-wzkq4        MountVolume.SetUp failed for volume "node-exporter-tls" : secrets "node-exporter-tls" not found

# oc -n openshift-monitoring get secrets | grep -e grafana-tls -e node-exporter-tls
grafana-tls         kubernetes.io/tls   2   93m
node-exporter-tls   kubernetes.io/tls   2   95m
```
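The event and secret ages above are nearly identical (93m/95m), which suggests the mounts were attempted right as the pods started, possibly before the serving-cert secrets had been generated. A hedged sketch (not part of the original comment) to confirm that ordering, using the names from the output above:

```
# Hedged sketch: compare secret creation times with the first occurrence of
# the FailedMount events that reference them.
for s in grafana-tls node-exporter-tls; do
  echo "secret $s created: $(oc -n openshift-monitoring get secret "$s" \
      -o jsonpath='{.metadata.creationTimestamp}')"
done

oc -n openshift-monitoring get event --field-selector reason=FailedMount \
    -o custom-columns=FIRST:.firstTimestamp,OBJ:.involvedObject.name,MSG:.message
```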
(In reply to Lili Cosic from comment #7)
> Just to confirm: this now only happens, at least for me, when you delete
> the configmap, correct?

I will watch different clusters; if there is no such issue, I will close this bug.
Closing this bug. With the 4.2.0-0.nightly-2019-09-04-142146 build, there are no "missing alertmanager-trusted-ca-bundle-XXXXXX config map" events.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922