Description of problem:
Failed to watch errors in kube-state-metrics container logs

# oc -n openshift-monitoring logs kube-state-metrics-7d8f88b5f7-2w8sc -c kube-state-metrics
I0804 02:14:25.340948 1 main.go:86] Using default collectors
I0804 02:14:25.341072 1 main.go:98] Using all namespace
I0804 02:14:25.341093 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0804 02:14:25.341111 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0804 02:14:25.343113 1 main.go:186] Testing communication with server
I0804 02:14:25.355312 1 main.go:191] Running with Kubernetes cluster version: v4.6+. git version: v4.6.0-202008030720.p0-dirty. git tree state: dirty. commit: 64529ef6458777ac400f4c1bf78b1dabea082fa4. platform: linux/amd64
I0804 02:14:25.355340 1 main.go:193] Communication with server successful
I0804 02:14:25.355504 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0804 02:14:25.355755 1 metrics_handler.go:96] Autosharding disabled
I0804 02:14:25.356897 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
I0804 02:14:25.357258 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
E0804 02:27:43.432795 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Deployment: unknown (get deployments.apps)
E0804 02:50:42.390754 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Job: unknown (get jobs.batch)
E0804 02:50:43.395802 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "jobs" in API group "batch" at the cluster scope
E0804 02:57:02.406234 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.StorageClass: unknown (get storageclasses.storage.k8s.io)
E0804 02:57:03.412278 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.StorageClass: unknown (get storageclasses.storage.k8s.io)

# oc explain Deployment
KIND: Deployment
VERSION: apps/v1
...
************************
# oc explain StorageClass
KIND: StorageClass
VERSION: storage.k8s.io/v1
...
***********
# oc explain jobs
KIND: Job
VERSION: batch/v1
...

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-03-143208

# /usr/bin/kube-state-metrics --version
version.Version{GitCommit:"f113959", BuildDate:"2020-08-02T15:17:20Z", Release:"v1.9.7", GoVersion:"go1.14.4", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:
always

Steps to Reproduce:
1. see the description
2.
3.

Actual results:

Expected results:

Additional info:
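Since one of the errors in the description is an explicit "forbidden" on listing jobs.batch, it may help to ask the API server directly whether the service account holds the list and watch permissions at the time of checking. The following is only a sketch: the function name check_ksm_rbac and the OC override variable are hypothetical, and it assumes the user running it is allowed to impersonate the service account (for example, a cluster admin).

```shell
#!/bin/sh
# Sketch only: check_ksm_rbac and the OC variable are illustrative names,
# not part of oc or kube-state-metrics.
# OC can be overridden (e.g. OC=echo) to preview the commands without a cluster.
OC="${OC:-oc}"
SA="system:serviceaccount:openshift-monitoring:kube-state-metrics"

check_ksm_rbac() {
    # Resources taken from the "Failed to watch" / "Failed to list" errors above.
    for resource in deployments.apps jobs.batch storageclasses.storage.k8s.io; do
        for verb in list watch; do
            printf '%s %s: ' "$verb" "$resource"
            # "oc auth can-i" answers yes or no; --as impersonates the service account.
            # A "no" answer exits non-zero, so keep going regardless.
            "$OC" auth can-i "$verb" "$resource" --as="$SA" --all-namespaces || true
        done
    done
}
```

Calling check_ksm_rbac prints one line per verb and resource. If every answer is "yes" while the pod still logs the errors, that would point toward a timing problem rather than a missing role binding.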
@junqi, is the error persistent, and is there any degradation in functionality? Or are the above log lines just temporary?
I just rechecked on my cluster (openshift-install-linux-4.6.0-0.nightly-2020-08-05-013608.tar.gz). While not pretty, it is just a few log lines at the start of the pods, and it seems to be a race with API objects. Lowering severity to low and reassigning to the API server team to assess whether there are currently chicken-and-egg problems in 4.6 with API objects being available at early startup stages.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
tested with 4.6.0-0.nightly-2020-09-05-015624, no such error now

# oc -n openshift-monitoring logs $(oc -n openshift-monitoring get po | grep kube-state-metrics | awk '{print $1}') -c kube-state-metrics
I0906 23:36:29.112878 1 main.go:86] Using default collectors
I0906 23:36:29.112978 1 main.go:98] Using all namespace
I0906 23:36:29.112998 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0906 23:36:29.113011 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0906 23:36:29.115697 1 main.go:186] Testing communication with server
I0906 23:36:29.128256 1 main.go:191] Running with Kubernetes cluster version: v1.19+. git version: v1.19.0-rc.2+068702d. git tree state: clean. commit: 068702de7d48739e835ea41b7ca959b5252de432. platform: linux/amd64
I0906 23:36:29.128271 1 main.go:193] Communication with server successful
I0906 23:36:29.128411 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0906 23:36:29.128449 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
I0906 23:36:29.129188 1 metrics_handler.go:96] Autosharding disabled
I0906 23:36:29.130246 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
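For longer logs, reading through the output by eye is error-prone, so the verification above could also be done by counting lines with the error severity prefix. This is a minimal sketch, assuming the log format shown in this report (severity letter, month and day, then a timestamp); count_klog_errors is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# count_klog_errors: hypothetical helper. Reads log text on stdin and prints the
# number of lines that start with the error severity prefix, for example
# "E0804 02:27:43.432795 ...". Info (I) and warning (W) lines are not counted.
count_klog_errors() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
    grep -cE '^E[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ ' || true
}
```

It can be fed from the same command used above, for example by piping the output of the oc logs command for the kube-state-metrics container into count_klog_errors. A result of 0 matches the "no such error now" observation.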
The LifecycleStale keyword was removed because the bug moved to QE and the bug got commented on recently. The bug assignee was notified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196