Description of problem:
- The alertmanager-main-2 and prometheus-k8s-1 pods are not able to start and are stuck in Terminating state.
- The other pods from the statefulsets are running normally, without any error.
- Both problematic pods are scheduled on the same infra node.
- Other (non-monitoring) pods are running normally.

alertmanager-main-2   0/5   Terminating   0   1s
prometheus-k8s-1      0/7   Terminating   0   2s

After checking the kubelet and CRI-O logs, I see that the containers are constantly being brought down and up in an endless loop without the status being cleared. Scaling the statefulsets down and up does not help. Restarting the kubelet does not solve the issue.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.5

How reproducible:
n/a

Additional info:
Will provide data in an additional comment.
The prometheus-operator is erroring with the following message:

"""
2020-10-16T15:33:42.713855516+00:00 stderr F E1016 15:33:42.713828       1 operator.go:996] Sync "openshift-monitoring/k8s" failed: failed to create new ConfigMap 'prometheus-k8s-rulefiles-0': configmaps "prometheus-k8s-rulefiles-0" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
"""

Then, it repeatedly posts:

"""
2020-10-17T18:22:33.440002053Z E1017 18:22:33.439961       1 operator.go:996] Sync "openshift-monitoring/k8s" failed: failed to create new ConfigMap 'prometheus-k8s-rulefiles-0': configmaps is forbidden: User "system:serviceaccount:openshift-monitoring:prometheus-operator" cannot create resource "configmaps" in API group "" in the namespace "openshift-monitoring"
2020-10-17T18:22:46.314927734Z E1017 18:22:46.314847       1 operator.go:996] Sync "openshift-monitoring/k8s" failed: failed to create ConfigMap 'prometheus-k8s-rulefiles-0': configmaps "prometheus-k8s-rulefiles-0" already exists
"""
This looks like a monitoring operator RBAC permissions issue. Moving over...
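For anyone hitting similar errors, a minimal sketch of how the two permissions in the logs could be checked against a live cluster. This is an illustrative diagnostic, not part of the reported bug; it assumes you are logged in with cluster-admin, and that the ConfigMap's ownerReference points at the Prometheus custom resource (the usual owner for prometheus-k8s-rulefiles-0). The "cannot set blockOwnerDeletion" message comes from the OwnerReferencesPermissionEnforcement admission plugin, which requires update on the owner's finalizers subresource.

```shell
# Can the operator's service account create ConfigMaps in the namespace?
# (This is the permission denied in the second set of log lines.)
oc auth can-i create configmaps \
  --as=system:serviceaccount:openshift-monitoring:prometheus-operator \
  -n openshift-monitoring

# Can it set finalizers on the owning Prometheus resource?
# (This is what blockOwnerDeletion on the ownerReference requires.)
oc auth can-i update prometheuses/finalizers \
  --as=system:serviceaccount:openshift-monitoring:prometheus-operator \
  -n openshift-monitoring

# If either prints "no", inspect the RoleBindings granting the SA access:
oc get rolebindings -n openshift-monitoring \
  -o custom-columns=NAME:.metadata.name,ROLE:.roleRef.name
```

Each `oc auth can-i` call prints "yes" or "no"; a "no" on either check would match the errors quoted above.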
After discussing offline with @surbania, we concluded that it might be related to bug 1863011 (bug 1887354 for 4.5.z), so I am closing this one as a duplicate since the resolution is already in progress.

*** This bug has been marked as a duplicate of bug 1887354 ***