Summary: | error logs in kube-state-metrics container | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao>
Component: | Monitoring | Assignee: | Lili Cosic <lcosic>
Status: | CLOSED DUPLICATE | QA Contact: | Junqi Zhao <juzhao>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 4.5 | CC: | alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: | --- | Keywords: | Regression
Target Release: | 4.6.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-08-25 07:36:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Description
Junqi Zhao
2020-05-08 02:09:32 UTC
I don't see this in the nightly 4.5 cluster that I launched. But from your logs it seems this happens more than half an hour after startup, so we seem to provision it correctly and it collects the resource metrics, but something that happens after roughly 30 minutes triggers this.
Note the times of startup and of the first failure to watch the ReplicaSet.
> I0507 23:33:26.663478 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
> E0507 23:59:59.381991 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.ReplicaSet: unknown (get replicasets.apps)
The only thing that would trigger this would be if the RBAC changed or if something is wrong with the kube-apiserver. Can you make sure everything else is okay in the cluster?
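A quick way to spot-check the RBAC side is to impersonate the service account named in the errors; a minimal sketch, using the service account and resource from the logs above (the ClusterRole/ClusterRoleBinding names are an assumption based on the default monitoring manifests, not taken from this report):

# Impersonate the kube-state-metrics service account and check list/watch on one of the failing resources
oc auth can-i list replicasets.apps --as=system:serviceaccount:openshift-monitoring:kube-state-metrics
oc auth can-i watch replicasets.apps --as=system:serviceaccount:openshift-monitoring:kube-state-metrics
# Inspect the cluster-scoped RBAC objects that are expected to grant those permissions
oc get clusterrole kube-state-metrics -o yaml
oc get clusterrolebinding kube-state-metrics -o yaml

If the can-i checks return "yes" while the errors keep appearing, the denials were most likely transient (for example, a momentary authorization failure on the kube-apiserver side) rather than a permanent RBAC change.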
(In reply to Lili Cosic from comment #1)
> The only thing that would trigger this would be if the RBAC changed or if
> something is wrong with the kube-apiserver. Can you make sure everything else
> is okay in the cluster?

Yes, the kube-apiserver is normal. I checked the same 4.5.0-0.nightly-2020-05-10-180138 payload on Azure and AWS clusters: did not see such an error on Azure, but found it on AWS; will keep an eye on it.

Met the same error on AWS with the 4.5.0-0.nightly-2020-05-11-202959 payload:
# oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.5.0-0.nightly-2020-05-11-202959 True False False 3h10m
cloud-credential 4.5.0-0.nightly-2020-05-11-202959 True False False 3h39m
cluster-autoscaler 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
config-operator 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
console 4.5.0-0.nightly-2020-05-11-202959 True False False 3h13m
csi-snapshot-controller 4.5.0-0.nightly-2020-05-11-202959 True False False 3h16m
dns 4.5.0-0.nightly-2020-05-11-202959 True False False 3h26m
etcd 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
image-registry 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
ingress 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
insights 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
kube-apiserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
kube-controller-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h26m
kube-scheduler 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
kube-storage-version-migrator 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
machine-api 4.5.0-0.nightly-2020-05-11-202959 True False False 3h21m
machine-approver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
machine-config 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
marketplace 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
monitoring 4.5.0-0.nightly-2020-05-11-202959 True False False 3h16m
network 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
node-tuning 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
openshift-apiserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
openshift-controller-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
openshift-samples 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
operator-lifecycle-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
operator-lifecycle-manager-catalog 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
operator-lifecycle-manager-packageserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
service-ca 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
storage 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
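For reference, oc get co only shows the operator-level summary. A minimal sketch of looking a bit closer at the kube-apiserver itself (not part of the original report):

# Detailed conditions reported by the kube-apiserver cluster operator
oc get clusteroperator kube-apiserver -o yaml
# The kube-apiserver pods on the control-plane nodes
oc -n openshift-kube-apiserver get pods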
# for n in $(oc -n openshift-monitoring get pods -o wide | grep kube-state-metrics | awk '{print $1}'); do echo ">>> $n <<<";kubectl -n openshift-monitoring logs $n -c kube-state-metrics; done
>>> kube-state-metrics-d987997f7-fbt8k <<<
I0511 23:33:43.583879 1 main.go:86] Using default collectors
I0511 23:33:43.583976 1 main.go:98] Using all namespace
I0511 23:33:43.583995 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0511 23:33:43.584013 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0511 23:33:43.587009 1 main.go:186] Testing communication with server
I0511 23:33:43.617029 1 main.go:191] Running with Kubernetes cluster version: v1.18+. git version: v1.18.2. git tree state: clean. commit: d6084de. platform: linux/amd64
I0511 23:33:43.617048 1 main.go:193] Communication with server successful
I0511 23:33:43.617153 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0511 23:33:43.617294 1 metrics_handler.go:96] Autosharding disabled
I0511 23:33:43.617772 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
I0511 23:33:43.618985 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
E0512 00:25:56.001616 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
E0512 00:25:57.002967 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0512 01:15:57.022025 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.LimitRange: unknown (get limitranges)
E0512 01:15:58.023401 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.LimitRange: limitranges is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "limitranges" in API group "" at the cluster scope
E0512 01:56:34.016965 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.PersistentVolume: unknown (get persistentvolumes)
E0512 01:56:35.018438 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E0512 02:26:34.003897 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Node: unknown (get nodes)
E0512 02:26:35.015488 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "nodes" in API group "" at the cluster scope
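For reference, the kube_node_info result below appears to come from the in-cluster monitoring stack. A minimal sketch of one way to reproduce that query, assuming the default thanos-querier route in openshift-monitoring and a logged-in user allowed to query cluster metrics (none of this is taken from the original report):

# Token of the currently logged-in user and the Thanos Querier route host
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
# Query kube_node_info; one series per node is expected
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query?query=kube_node_info"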
kube_node_info (Element / Value):

kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-138-153.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2a/i-058edb6bc050fbfbd",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-139-20.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2a/i-09981b73b2848c928",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-153-31.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2b/i-034895358e3eab84b",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-159-37.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2b/i-0ebc056456b7e07b4",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-165-248.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2c/i-036da13469f9076be",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-171-126.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2c/i-0e8d1262b6265dbc4",service="kube-state-metrics"} 1

Thanks for confirming!
The main thing I would fix with this issue is:
> E0729 01:56:36.256657 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1beta1.PodDisruptionBudget: unknown (get poddisruptionbudgets.policy)
> E0729 01:56:37.258390 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
> E0729 07:14:01.518147 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v2beta1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" at the cluster scope
> E0729 07:29:39.531828 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Pod: unknown (get pods)
> E0729 07:29:40.533110 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "pods" in API group "" at the cluster scope
> E0729 07:59:39.533244 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Deployment: unknown (get deployments.apps)
> E0729 07:59:40.535187 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "deployments" in API group "apps" at the cluster scope
The rest will be fixed upstream. It seems related to the Prometheus Operator issue we have.
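Whether the denials quoted above also show up on the API-server side can be spot-checked in the audit logs. A minimal sketch, assuming the default audit configuration and that the relevant events have not been rotated away yet; the grep pattern is only illustrative:

# List the kube-apiserver audit log files on the control-plane nodes
oc adm node-logs --role=master --path=kube-apiserver/
# Search one node's audit log for denied requests from the kube-state-metrics service account
oc adm node-logs <master-node-name> --path=kube-apiserver/audit.log \
  | grep 'system:serviceaccount:openshift-monitoring:kube-state-metrics' \
  | grep -i forbidden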
Removing NEEDINFO as all necessary information seems to be provided.

Closing out as the underlying issue is the same as commented in https://bugzilla.redhat.com/show_bug.cgi?id=1856189#c35.

*** This bug has been marked as a duplicate of bug 1856189 ***