Summary: | error logs in kube-state-metrics container | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao>
Component: | Monitoring | Assignee: | Lili Cosic <lcosic>
Status: | CLOSED DUPLICATE | QA Contact: | Junqi Zhao <juzhao>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 4.5 | CC: | alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: | --- | Keywords: | Regression
Target Release: | 4.6.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-08-25 07:36:31 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Description
Junqi Zhao
2020-05-08 02:09:32 UTC
I don't see this in the nightly 4.5 cluster that I launched. But from your logs it seems this happens more than half an hour after startup, so we seem to provision it correctly and it collects the resource metrics, but something that happens after roughly 30 minutes triggers this.
Note the times of startup and of the first failure to watch the ReplicaSet.
> I0507 23:33:26.663478 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
> E0507 23:59:59.381991 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.ReplicaSet: unknown (get replicasets.apps)
The only thing that would trigger this would be if the RBAC changed or if something is wrong with the kube-apiserver. Can you make sure everything else is okay in the cluster?
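A quick way to spot-check the RBAC side is to impersonate the service account named in the errors; a minimal sketch, using the service account and resource from the logs above (the ClusterRole/ClusterRoleBinding names are an assumption based on the default monitoring manifests, not taken from this report):

# Impersonate the kube-state-metrics service account and check list/watch on one of the failing resources
oc auth can-i list replicasets.apps --as=system:serviceaccount:openshift-monitoring:kube-state-metrics
oc auth can-i watch replicasets.apps --as=system:serviceaccount:openshift-monitoring:kube-state-metrics
# Inspect the cluster-scoped RBAC objects that are expected to grant those permissions
oc get clusterrole kube-state-metrics -o yaml
oc get clusterrolebinding kube-state-metrics -o yaml

If the can-i checks return "yes" while the errors keep appearing, the denials were most likely transient (for example, a momentary authorization failure on the kube-apiserver side) rather than a permanent RBAC change.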
(In reply to Lili Cosic from comment #1)
> The only thing that would trigger this would be if the RBAC changed or if
> something is wrong with the kube-apiserver. Can you make sure everything else
> is okay in the cluster?

Yes, the kube-apiserver is normal. I checked the same 4.5.0-0.nightly-2020-05-10-180138 payload on Azure and AWS clusters: did not see such an error on Azure, but found it on AWS; will keep an eye on it.

Met the same error on AWS with the 4.5.0-0.nightly-2020-05-11-202959 payload:
# oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.5.0-0.nightly-2020-05-11-202959 True False False 3h10m
cloud-credential 4.5.0-0.nightly-2020-05-11-202959 True False False 3h39m
cluster-autoscaler 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
config-operator 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
console 4.5.0-0.nightly-2020-05-11-202959 True False False 3h13m
csi-snapshot-controller 4.5.0-0.nightly-2020-05-11-202959 True False False 3h16m
dns 4.5.0-0.nightly-2020-05-11-202959 True False False 3h26m
etcd 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
image-registry 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
ingress 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
insights 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
kube-apiserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
kube-controller-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h26m
kube-scheduler 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
kube-storage-version-migrator 4.5.0-0.nightly-2020-05-11-202959 True False False 3h17m
machine-api 4.5.0-0.nightly-2020-05-11-202959 True False False 3h21m
machine-approver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
machine-config 4.5.0-0.nightly-2020-05-11-202959 True False False 3h27m
marketplace 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
monitoring 4.5.0-0.nightly-2020-05-11-202959 True False False 3h16m
network 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
node-tuning 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
openshift-apiserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
openshift-controller-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h24m
openshift-samples 4.5.0-0.nightly-2020-05-11-202959 True False False 3h23m
operator-lifecycle-manager 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
operator-lifecycle-manager-catalog 4.5.0-0.nightly-2020-05-11-202959 True False False 3h28m
operator-lifecycle-manager-packageserver 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
service-ca 4.5.0-0.nightly-2020-05-11-202959 True False False 3h29m
storage 4.5.0-0.nightly-2020-05-11-202959 True False False 3h25m
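For reference, oc get co only shows the operator-level summary. A minimal sketch of looking a bit closer at the kube-apiserver itself (not part of the original report):

# Detailed conditions reported by the kube-apiserver cluster operator
oc get clusteroperator kube-apiserver -o yaml
# The kube-apiserver pods on the control-plane nodes
oc -n openshift-kube-apiserver get pods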
# for n in $(oc -n openshift-monitoring get pods -o wide | grep kube-state-metrics | awk '{print $1}'); do echo ">>> $n <<<";kubectl -n openshift-monitoring logs $n -c kube-state-metrics; done
>>> kube-state-metrics-d987997f7-fbt8k <<<
I0511 23:33:43.583879 1 main.go:86] Using default collectors
I0511 23:33:43.583976 1 main.go:98] Using all namespace
I0511 23:33:43.583995 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0511 23:33:43.584013 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0511 23:33:43.587009 1 main.go:186] Testing communication with server
I0511 23:33:43.617029 1 main.go:191] Running with Kubernetes cluster version: v1.18+. git version: v1.18.2. git tree state: clean. commit: d6084de. platform: linux/amd64
I0511 23:33:43.617048 1 main.go:193] Communication with server successful
I0511 23:33:43.617153 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0511 23:33:43.617294 1 metrics_handler.go:96] Autosharding disabled
I0511 23:33:43.617772 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
I0511 23:33:43.618985 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
E0512 00:25:56.001616 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
E0512 00:25:57.002967 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
E0512 01:15:57.022025 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.LimitRange: unknown (get limitranges)
E0512 01:15:58.023401 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.LimitRange: limitranges is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "limitranges" in API group "" at the cluster scope
E0512 01:56:34.016965 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.PersistentVolume: unknown (get persistentvolumes)
E0512 01:56:35.018438 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "persistentvolumes" in API group "" at the cluster scope
E0512 02:26:34.003897 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Node: unknown (get nodes)
E0512 02:26:35.015488 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "nodes" in API group "" at the cluster scope
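For reference, the kube_node_info result below appears to come from the in-cluster monitoring stack. A minimal sketch of one way to reproduce that query, assuming the default thanos-querier route in openshift-monitoring and a logged-in user allowed to query cluster metrics (none of this is taken from the original report):

# Token of the currently logged-in user and the Thanos Querier route host
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
# Query kube_node_info; one series per node is expected
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query?query=kube_node_info"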
kube_node_info (Element / Value):

kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-138-153.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2a/i-058edb6bc050fbfbd",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-139-20.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2a/i-09981b73b2848c928",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-153-31.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2b/i-034895358e3eab84b",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-159-37.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2b/i-0ebc056456b7e07b4",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-165-248.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2c/i-036da13469f9076be",service="kube-state-metrics"} 1
kube_node_info{container_runtime_version="cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8",endpoint="https-main",instance="10.131.0.15:8443",job="kube-state-metrics",kernel_version="4.18.0-147.8.1.el8_1.x86_64",kubelet_version="v1.18.2",kubeproxy_version="v1.18.2",namespace="openshift-monitoring",node="ip-10-0-171-126.us-east-2.compute.internal",os_image="Red Hat Enterprise Linux CoreOS 45.81.202005131629-0 (Ootpa)",pod="kube-state-metrics-d987997f7-tjvw9",provider_id="aws:///us-east-2c/i-0e8d1262b6265dbc4",service="kube-state-metrics"} 1

Thanks for confirming!
The main thing I would fix with this issue is:
> E0729 01:56:36.256657 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1beta1.PodDisruptionBudget: unknown (get poddisruptionbudgets.policy)
> E0729 01:56:37.258390 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
> E0729 07:14:01.518147 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v2beta1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" at the cluster scope
> E0729 07:29:39.531828 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Pod: unknown (get pods)
> E0729 07:29:40.533110 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "pods" in API group "" at the cluster scope
> E0729 07:59:39.533244 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Deployment: unknown (get deployments.apps)
> E0729 07:59:40.535187 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "deployments" in API group "apps" at the cluster scope
The rest will be fixed upstream. It seems related to the Prometheus Operator issue we have.
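Whether the denials quoted above also show up on the API-server side can be spot-checked in the audit logs. A minimal sketch, assuming the default audit configuration and that the relevant events have not been rotated away yet; the grep pattern is only illustrative:

# List the kube-apiserver audit log files on the control-plane nodes
oc adm node-logs --role=master --path=kube-apiserver/
# Search one node's audit log for denied requests from the kube-state-metrics service account
oc adm node-logs <master-node-name> --path=kube-apiserver/audit.log \
  | grep 'system:serviceaccount:openshift-monitoring:kube-state-metrics' \
  | grep -i forbidden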
Removing NEEDINFO as all necessary information seems to be provided.

Closing out as the underlying issue is the same as commented in https://bugzilla.redhat.com/show_bug.cgi?id=1856189#c35.

*** This bug has been marked as a duplicate of bug 1856189 ***