Description of problem:
Getting the message "Prometheus could not scrape fluentd for more than 10m."

Version-Release number of selected component (if applicable):
4.7.34

How reproducible:
Unconfirmed

Additional info:
The customer set the label openshift.io/cluster-monitoring: "true", but the error is still not clearing.

The prometheus pods log these errors repeatedly:

2021-10-31T03:05:06.385693354Z level=error ts=2021-10-31T03:05:06.385Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
2021-10-31T03:05:08.607296440Z level=error ts=2021-10-31T03:05:08.607Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
2021-10-31T03:05:31.197590776Z level=error ts=2021-10-31T03:05:31.197Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""

We found a similar bug from an older version:
https://bugzilla.redhat.com/show_bug.cgi?id=1774907

Using the diagnostic steps from that bug:

# token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
# oc auth can-i list endpoints -n openshift-logging --token $token
# oc auth can-i list endpoints -n openshift-logging --token $token
# oc auth can-i list endpoints -n openshift-logging --token $token
# oc auth can-i list endpoints -n openshift-logging --token $token
# oc auth can-i list endpoints -n openshift-logging --token $token
# oc auth can-i list endpoints -n openshift-logging --token $token

These all return "no". I suspect something has failed to set the proper rolebindings for prometheus-k8s. Are there roles that should be added? Can they be added manually?
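The log lines above name three different resources (pods, services, endpoints), so it may help to check each of them and not only endpoints. As a rough sketch, the following Python pulls the denied verb, resource, and namespace out of the "forbidden" messages and prints one `oc auth can-i` check per distinct combination, in the same form used in this report. The function names and the regular expression are illustrative assumptions, not part of any OpenShift tooling.

```python
import re

# Matches the "forbidden" detail in the Prometheus log lines quoted above.
# The optional backslashes allow for the escaped quotes (\") in the raw log.
FORBIDDEN = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) '
    r'resource \\?"(?P<resource>[^"\\]+)\\?" in API group \\?"(?P<group>[^"\\]*)\\?" '
    r'in the namespace \\?"(?P<namespace>[^"\\]+)\\?"'
)


def denied_access(log_text):
    """Return distinct (verb, resource, namespace) triples in first-seen order."""
    seen = []
    for match in FORBIDDEN.finditer(log_text):
        triple = (match["verb"], match["resource"], match["namespace"])
        if triple not in seen:
            seen.append(triple)
    return seen


def can_i_commands(log_text):
    """Build one `oc auth can-i` check per denied triple.

    The commands assume $token was set as in the diagnostic steps above.
    """
    return [
        f"oc auth can-i {verb} {resource} -n {namespace} --token $token"
        for verb, resource, namespace in denied_access(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 denied.py < prometheus.log
    for command in can_i_commands(sys.stdin.read()):
        print(command)
```

Feeding the prometheus pod log through this should yield one check each for pods, services, and endpoints in openshift-logging, assuming the messages keep the format shown in this report.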
Other cluster operators (e.g. cluster-etcd-operator) define an explicit role[1] and role binding[2] for the `prometheus-k8s` service account. You may need to do the same. But I'm wondering why the cluster-logging operator did not already do this! [1] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_90_etcd-operator_01_prometheusrole.yaml [2] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_90_etcd-operator_02_prometheusrolebinding.yaml
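For illustration only, here is a minimal sketch of what such a role and role binding could look like for the openshift-logging namespace, generated as JSON from Python. The object name `prometheus-k8s` for the Role and RoleBinding, and the get/list/watch rule set on services, endpoints, and pods, are assumptions derived from the three resources denied in the logs; they are not copied from any operator manifest. If the operator is meant to manage these objects, a manual copy should be treated as a temporary workaround at most.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def prometheus_rbac(namespace,
                    sa_name="prometheus-k8s",
                    sa_namespace="openshift-monitoring",
                    name="prometheus-k8s"):
    """Return a v1 List holding a Role and a RoleBinding for Prometheus.

    `name` is a hypothetical object name; the rules are an assumption based
    on the pods/services/endpoints "forbidden" errors in this report.
    """
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # core API group
            "resources": ["services", "endpoints", "pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {"apiGroup": RBAC_API, "kind": "Role", "name": name},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": sa_name,
            "namespace": sa_namespace,
        }],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    # Review the output before applying it, e.g. by piping to `oc apply -f -`.
    print(json.dumps(prometheus_rbac("openshift-logging"), indent=2))
```

After applying something like this, the `oc auth can-i list ... -n openshift-logging --token $token` checks from the description should show whether the service account gained access.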
It seems cluster-logging-operator has the necessary role[1] binding[2] to the `prometheus-k8s` service account. [1] https://github.com/openshift/cluster-logging-operator/blob/release-4.7/manifests/4.7/0100_clusterroles.yaml [2] https://github.com/openshift/cluster-logging-operator/blob/release-4.7/manifests/4.7/0110_clusterrolebindings.yaml
The OpenShift Logging product does not have any 4.7 release. The last release bound to OCP is 4.6.z. Anything after that has a 5.x version (e.g. 5.0, 5.1, 5.2). If your image bundle registry is showing anything like 4.7, it is a registry issue. Please use 5.x.
Hi, my apologies, the OpenShift Logging version is: cluster-logging.5.2.2-21. It slipped my mind that the products are tracking different release cycles now. Can we re-open this against 5.2.2?
Actually, looks like I set the version 4.7 when the component was set to Monitoring... Should we move this to JIRA as this is Logging 5.2? I know new bugs should be sent there but not sure what the protocol is if it's possibly related to other components like monitoring.
Apologies for the comment spam; I'll close this and move to JIRA.