Description of problem:
upgrade from 4.5.3 to 4.6.0-0.nightly-2020-07-23-055513, monitoring is degraded and it blocks the upgrade

# oc get co/monitoring -oyaml
...
  - lastTransitionTime: "2020-07-23T10:54:23Z"
    message: 'Failed to rollout the stack. Error: running task Updating kube-state-metrics failed: reconciling kube-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/kube-state-metrics: current generation 90, observed generation 89'
    reason: UpdatingkubeStateMetricsFailed
    status: "True"
    type: Degraded
...

# oc -n openshift-monitoring logs kube-state-metrics-6c47655b4f-wwggg -c kube-state-metrics
I0723 10:30:45.351577 1 main.go:86] Using default collectors
I0723 10:30:45.351672 1 main.go:98] Using all namespace
I0723 10:30:45.351693 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0723 10:30:45.351705 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0723 10:30:45.353590 1 main.go:186] Testing communication with server
I0723 10:30:45.362049 1 main.go:191] Running with Kubernetes cluster version: v4.6+. git version: v4.6.0-202007202229.p0-dirty. git tree state: dirty. commit: 1192855e475bf73fe14ebff65694f8a2a717466f. platform: linux/amd64
I0723 10:30:45.362070 1 main.go:193] Communication with server successful
I0723 10:30:45.362224 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0723 10:30:45.362376 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
I0723 10:30:45.362475 1 metrics_handler.go:96] Autosharding disabled
I0723 10:30:45.363465 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
E0723 10:30:51.844638 1 reflector.go:368] k8s.io/kube-state-metrics/internal/store/builder.go:346: expected type *v1.MutatingWebhookConfiguration, but watch event object had type *v1beta1.MutatingWebhookConfiguration
E0723 10:30:51.844681 1 reflector.go:368] k8s.io/kube-state-metrics/internal/store/builder.go:346: expected type *v1.MutatingWebhookConfiguration, but watch event object had type *v1beta1.MutatingWebhookConfiguration
E0723 10:36:29.348863 1 reflector.go:368] k8s.io/kube-state-metrics/internal/store/builder.go:346: expected type *v1.MutatingWebhookConfiguration, but watch event object had type *v1beta1.MutatingWebhookConfiguration
E0723 10:36:31.393157 1 reflector.go:368] k8s.io/kube-state-metrics/internal/store/builder.go:346: expected type *v1.MutatingWebhookConfiguration, but watch event object had type *v1beta1.MutatingWebhookConfiguration
E0723 10:36:31.393179 1 reflector.go:368] k8s.io/kube-state-metrics/internal/store/builder.go:346: expected type *v1.MutatingWebhookConfiguration, but watch event object had type *v1beta1.MutatingWebhookConfiguration

# oc explain MutatingWebhookConfiguration
KIND:     MutatingWebhookConfiguration
VERSION:  admissionregistration.k8s.io/v1

DESCRIPTION:
     MutatingWebhookConfiguration describes the configuration of and admission webhook that accept or reject and may change the object.

FIELDS:
   apiVersion <string>
     APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

   kind <string>
     Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

   metadata <Object>
     Standard object metadata; More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata.

   webhooks <[]Object>
     Webhooks is a list of webhooks and the affected resources and operations.

Version-Release number of selected component (if applicable):
upgrade from 4.5.3 to 4.6.0-0.nightly-2020-07-23-055513

How reproducible:
first time to view such error

Steps to Reproduce:
1. see the description
2.
3.

Actual results:

Expected results:

Additional info:
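For readers unfamiliar with the "current generation 90, observed generation 89" part of the Degraded message, here is a minimal Go sketch of the kind of comparison that yields such a message. It assumes the rollout wait compares the Deployment's metadata.generation with status.observedGeneration; the operator's actual code is not shown in this report, and deploymentState and checkRollout are hypothetical names used only for illustration.

```go
package main

import "fmt"

// deploymentState is a hypothetical stand-in for the two Deployment fields
// involved: metadata.generation and status.observedGeneration.
type deploymentState struct {
	Generation         int64
	ObservedGeneration int64
}

// checkRollout returns an error while the Deployment controller has not yet
// observed the latest spec generation. This is an assumed shape of the check,
// not the operator's real implementation.
func checkRollout(d deploymentState) error {
	if d.Generation > d.ObservedGeneration {
		return fmt.Errorf("current generation %d, observed generation %d",
			d.Generation, d.ObservedGeneration)
	}
	return nil
}

func main() {
	// Values taken from the Degraded condition in the description.
	fmt.Println(checkRollout(deploymentState{Generation: 90, ObservedGeneration: 89}))
	// Once the controller catches up, the check passes.
	fmt.Println(checkRollout(deploymentState{Generation: 90, ObservedGeneration: 90}))
}
```

Under this assumption, the condition only clears once the observed generation reaches the current one.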
Yes, we are well aware of this issue. It was fixed upstream in https://github.com/kubernetes/kube-state-metrics/releases/tag/v1.9.7. We need to either bump to the latest kube-state-metrics or bump to 2.0; this work is currently tracked in https://issues.redhat.com/browse/MON-1162
To be verified in the next sprint.
The issue is fixed with 4.6.0-0.nightly-2020-08-03-143208. The "Failed to watch" errors are tracked in bug 1865742.

# oc -n openshift-monitoring logs kube-state-metrics-7d8f88b5f7-2w8sc -c kube-state-metrics
I0804 02:14:25.340948 1 main.go:86] Using default collectors
I0804 02:14:25.341072 1 main.go:98] Using all namespace
I0804 02:14:25.341093 1 main.go:139] metric white-blacklisting: blacklisting the following items: kube_secret_labels
W0804 02:14:25.341111 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0804 02:14:25.343113 1 main.go:186] Testing communication with server
I0804 02:14:25.355312 1 main.go:191] Running with Kubernetes cluster version: v4.6+. git version: v4.6.0-202008030720.p0-dirty. git tree state: dirty. commit: 64529ef6458777ac400f4c1bf78b1dabea082fa4. platform: linux/amd64
I0804 02:14:25.355340 1 main.go:193] Communication with server successful
I0804 02:14:25.355504 1 main.go:227] Starting metrics server: 127.0.0.1:8081
I0804 02:14:25.355755 1 metrics_handler.go:96] Autosharding disabled
I0804 02:14:25.356897 1 builder.go:156] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
I0804 02:14:25.357258 1 main.go:202] Starting kube-state-metrics self metrics server: 127.0.0.1:8082
E0804 02:27:43.432795 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Deployment: unknown (get deployments.apps)
E0804 02:50:42.390754 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.Job: unknown (get jobs.batch)
E0804 02:50:43.395802 1 reflector.go:153] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:openshift-monitoring:kube-state-metrics" cannot list resource "jobs" in API group "batch" at the cluster scope
E0804 02:57:02.406234 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.StorageClass: unknown (get storageclasses.storage.k8s.io)
E0804 02:57:03.412278 1 reflector.go:307] k8s.io/kube-state-metrics/internal/store/builder.go:346: Failed to watch *v1.StorageClass: unknown (get storageclasses.storage.k8s.io)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196