1973567 – Autoscaler log report error “Failed to watch *v1.CSIDriver”

Summary: Autoscaler log report error “Failed to watch *v1.CSIDriver”

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Michael McCune
QA Contact:	sunzhaohua
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1995595

Reported:	2021-06-18 07:51 UTC by sunzhaohua
Modified:	2021-10-18 17:35 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The cluster autoscaler did not have permission to read csidrivers.storage.k8s.io or csistoragecapacities.storage.k8s.io resources. Consequence: The cluster autoscaler would report errors in its logs stating that its service account does not have access to interact with these resources. Fix: The Role for the cluster autoscaler has been updated to include the new resources. Result: The cluster autoscaler no longer creates error messages in its logs about interacting with these resources.
Clone Of:
Clones:	1995595
Environment:
Last Closed:	2021-10-18 17:35:35 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-autoscaler-operator pull 210	None	open	Bug 1973567: add csidrivers to the cluster-autoscaler cluster role	2021-06-22 22:28:02 UTC
Github	openshift cluster-autoscaler-operator pull 212	None	open	Bug 1973567: add csistoragecapacities to cluster-autoscaler cluster role	2021-06-25 14:00:56 UTC
Red Hat Product Errata	RHSA-2021:3759	None	None	None	2021-10-18 17:35:37 UTC

Description sunzhaohua 2021-06-18 07:51:27 UTC

Description of problem:
The autoscaler log reports this error: “E0618 07:23:15.255288       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope”


Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-06-14-145150

How reproducible:
always

Steps to Reproduce:
1. Create a clusterautoscaler
2. Create a machineautoscaler 
3. Add workload
4. Check autoscaler logs
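
Step 4 can be narrowed to just the permission errors. The following is a minimal sketch, not part of the original report: `forbidden_resources` is a hypothetical helper name, and it assumes the log lines follow the reflector error format quoted in this bug.

```shell
# Hypothetical helper: read autoscaler log lines on stdin and print each
# distinct resource (as resource.group) that the service account was
# forbidden to list.
forbidden_resources() {
  grep -E 'reflector\.go.*is forbidden' |
    sed -E 's/.*cannot list resource "([^"]+)" in API group "([^"]*)".*/\1.\2/' |
    sort -u
}
```

It could be fed from the autoscaler pod, for example `oc logs -n openshift-machine-api cluster-autoscaler-default-75c55cf9d7-kwtt8 | forbidden_resources`. The namespace here is inferred from the service account name in the error message; the pod name will differ on each cluster.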

Actual results:
Autoscaler logs always output this error message:

oc logs -f cluster-autoscaler-default-75c55cf9d7-kwtt8
I0618 06:25:05.288288       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-69c8z is unschedulable
I0618 06:25:05.289389       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-jlrxc is unschedulable
I0618 06:25:05.290411       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:05.294384       1 scale_up.go:453] No expansion options
I0618 06:25:05.297314       1 scale_down.go:917] No candidates for scale down
E0618 06:25:07.144282       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
I0618 06:25:15.454335       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:15.454660       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-svp5v is unschedulable


Expected results:
Autoscaler logs don’t contain such error messages.

Additional info:

Comment 1 Michael McCune 2021-06-21 17:55:38 UTC

i think we just need to update the role for the machine-api service account. i am starting to investigate.
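
Whether the service account has the access the reflector needs can be checked directly with `oc auth can-i` and impersonation. This is a hedged sketch rather than the procedure used in this bug: `check_autoscaler_access` is a hypothetical helper name, the two resources are the ones named in the error messages in this report, and checking `list` and `watch` is an assumption based on what an informer does. The caller needs permission to impersonate the service account.

```shell
# Hypothetical helper: ask the API server whether the cluster-autoscaler
# service account may list and watch the CSI resources named in this bug.
check_autoscaler_access() {
  sa="system:serviceaccount:openshift-machine-api:cluster-autoscaler"
  for resource in csidrivers.storage.k8s.io csistoragecapacities.storage.k8s.io; do
    for verb in list watch; do
      printf '%s %s: ' "$verb" "$resource"
      # can-i prints "yes" or "no" and exits non-zero on "no";
      # "|| true" keeps the loop going so every combination is reported.
      oc auth can-i "$verb" "$resource" --as="$sa" || true
    done
  done
}
```

Running `check_autoscaler_access` on an affected cluster would be expected to print "no" for the resources missing from the role, and "yes" once the role includes them.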

Comment 3 sunzhaohua 2021-06-25 01:03:24 UTC

Failed to verify

 oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-24-082405   True        False         61m     Cluster version is 4.9.0-0.nightly-2021-06-24-082405

E0624 16:04:42.548972       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
I0624 16:04:46.391301       1 static_autoscaler.go:319] 2 unregistered nodes present

Comment 6 sunzhaohua 2021-06-28 08:58:52 UTC

verified
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-27-223612   True        False         3m43s   Cluster version is 4.9.0-0.nightly-2021-06-27-223612

I0628 08:53:12.609381       1 scale_down.go:917] No candidates for scale down
I0628 08:53:22.628438       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:22.628875       1 scale_down.go:917] No candidates for scale down
W0628 08:53:32.712601       1 clusterstate.go:432] AcceptableRanges have not been populated yet. Skip checking
I0628 08:53:33.288638       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:33.691161       1 pre_filtering_processor.go:66] Skipping ip-10-0-242-215.us-east-2.compute.internal - node group min size reached
I0628 08:53:34.088723       1 scale_down.go:917] No candidates for scale down
I0628 08:53:44.909621       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-2k59x is unschedulable

Comment 8 Michael McCune 2021-08-19 13:25:18 UTC

@ancollin i have created https://bugzilla.redhat.com/show_bug.cgi?id=1995595 and am working on the backports

Comment 11 errata-xmlrpc 2021-10-18 17:35:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759