1995595 – Autoscaler log report error “Failed to watch *v1.CSIDriver”

Summary: Autoscaler log report error “Failed to watch *v1.CSIDriver”

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	4.8.z
Assignee:	Michael McCune
QA Contact:	sunzhaohua
Docs Contact:
URL:
Whiteboard:
Depends On:	1973567
Blocks:

Reported:	2021-08-19 13:17 UTC by Michael McCune
Modified:	2021-09-21 08:01 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The cluster autoscaler did not have permission to read csidrivers.storage.k8s.io or csistoragecapacities.storage.k8s.io resources. Consequence: The cluster autoscaler would report errors in its logs stating that its service account does not have access to interact with these resources. Fix: The Role for the cluster autoscaler has been updated to include the new resources. Result: The cluster autoscaler no longer creates error messages in its logs about interacting with these resources.
Clone Of:	1973567
Environment:
Last Closed:	2021-09-21 08:01:31 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-autoscaler-operator pull 220	0	None	None	None	2021-08-19 13:36:22 UTC
Red Hat Product Errata	RHBA-2021:3511	0	None	None	None	2021-09-21 08:01:45 UTC

Description Michael McCune 2021-08-19 13:17:31 UTC

+++ This bug was initially created as a clone of Bug #1973567 +++

Description of problem:
The autoscaler log reports the error: "E0618 07:23:15.255288       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope"


Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-06-14-145150

How reproducible:
always

Steps to Reproduce:
1. Create a clusterautoscaler
2. Create a machineautoscaler 
3. Add workload
4. Check autoscaler logs

Actual results:
Autoscaler logs always output this error message:

oc logs -f cluster-autoscaler-default-75c55cf9d7-kwtt8
I0618 06:25:05.288288       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-69c8z is unschedulable
I0618 06:25:05.289389       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-jlrxc is unschedulable
I0618 06:25:05.290411       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:05.294384       1 scale_up.go:453] No expansion options
I0618 06:25:05.297314       1 scale_down.go:917] No candidates for scale down
E0618 06:25:07.144282       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
I0618 06:25:15.454335       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:15.454660       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-svp5v is unschedulable
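
When triaging logs like the excerpt above, it can help to pull the service account, verb, resource, and API group out of each "forbidden" line so the missing permissions can be listed at a glance. The following is a minimal sketch, not part of the autoscaler or of any OpenShift tooling: the function name `parse_forbidden` and the regular expression are assumptions written against the message format shown in this report.

```python
import re

# Assumed pattern, based only on the message format quoted in this bug report.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) '
    r'resource "(?P<resource>[^"]+)" in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(line):
    """Return user, verb, resource, and group from a forbidden log line.

    Returns None when the line does not contain a forbidden message.
    (Hypothetical helper, for illustration only.)
    """
    match = FORBIDDEN_RE.search(line)
    return match.groupdict() if match else None


# The error line from the log excerpt in this report.
sample = (
    'E0618 06:25:07.144282       1 reflector.go:138] '
    'k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: '
    'failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: '
    'User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" '
    'cannot list resource "csidrivers" in API group "storage.k8s.io" '
    'at the cluster scope'
)

if __name__ == "__main__":
    print(parse_forbidden(sample))
```

Running the same helper over a full log and collecting the distinct (group, resource, verb) triples should surface every resource the service account is missing, assuming the lines keep the format shown here.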


Expected results:
Autoscaler logs do not contain such error messages.

Additional info:

--- Additional comment from Michael McCune on 2021-06-21 17:55:38 UTC ---

I think we just need to update the role for the machine-api service account. I am starting to investigate.
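
For illustration, the kind of role update discussed here can be modeled as an RBAC rule covering the two resources named in this bug's Doc Text. This is a sketch under stated assumptions, not the actual change, which is in the linked cluster-autoscaler-operator pull request and is not reproduced here. The dictionary mirrors the Kubernetes PolicyRule field names (`apiGroups`, `resources`, `verbs`); the verb list and the helper `allows` are assumptions, and the check is simplified (it ignores `resourceNames`, subresources, and non-resource URLs).

```python
# Hypothetical rule, modeled on the Kubernetes RBAC PolicyRule fields.
# The verbs are an assumption: informers list and watch the resources.
CSI_RULE = {
    "apiGroups": ["storage.k8s.io"],
    "resources": ["csidrivers", "csistoragecapacities"],
    "verbs": ["get", "list", "watch"],
}


def _matches(allowed, value):
    """True if value is listed explicitly or covered by the "*" wildcard."""
    return "*" in allowed or value in allowed


def allows(rules, group, resource, verb):
    """Simplified check: does any rule grant verb on group/resource?"""
    return any(
        _matches(rule["apiGroups"], group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


if __name__ == "__main__":
    # Without the rule, the list request from the error message is denied.
    print(allows([], "storage.k8s.io", "csidrivers", "list"))
    # With the rule, the same request is permitted.
    print(allows([CSI_RULE], "storage.k8s.io", "csidrivers", "list"))
```

On a live cluster, `oc auth can-i` with the `--as` flag can confirm whether the service account actually holds a given permission.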

--- Additional comment from OpenShift Automated Release Tooling on 2021-06-23 15:59:52 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from sunzhaohua on 2021-06-25 01:03:24 UTC ---

Failed to verify

 oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-24-082405   True        False         61m     Cluster version is 4.9.0-0.nightly-2021-06-24-082405

E0624 16:04:42.548972       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
I0624 16:04:46.391301       1 static_autoscaler.go:319] 2 unregistered nodes present

--- Additional comment from OpenShift Automated Release Tooling on 2021-06-27 18:50:52 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from sunzhaohua on 2021-06-28 08:58:52 UTC ---

verified
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-27-223612   True        False         3m43s   Cluster version is 4.9.0-0.nightly-2021-06-27-223612

I0628 08:53:12.609381       1 scale_down.go:917] No candidates for scale down
I0628 08:53:22.628438       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:22.628875       1 scale_down.go:917] No candidates for scale down
W0628 08:53:32.712601       1 clusterstate.go:432] AcceptableRanges have not been populated yet. Skip checking
I0628 08:53:33.288638       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:33.691161       1 pre_filtering_processor.go:66] Skipping ip-10-0-242-215.us-east-2.compute.internal - node group min size reached
I0628 08:53:34.088723       1 scale_down.go:917] No candidates for scale down
I0628 08:53:44.909621       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-2k59x is unschedulable

--- Additional comment from Andrew Collins on 2021-08-18 21:31:16 UTC ---

I see this in 4.8.2. This needs a backport.

Comment 3 sunzhaohua 2021-09-13 06:05:15 UTC

verified

clusterversion: 4.8.0-0.nightly-2021-09-11-042202

I0913 06:00:27.083028       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-xrll2 is unschedulable
I0913 06:00:27.083034       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-4chmv is unschedulable
I0913 06:00:27.902044       1 scale_up.go:468] Best option to resize: MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c
I0913 06:00:27.902070       1 scale_up.go:472] Estimated 19 nodes needed in MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c
I0913 06:00:27.902084       1 scale_up.go:477] Capping size to max cluster total size (8)
I0913 06:00:28.082939       1 scale_up.go:586] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c 1->3 (max: 3)}]
I0913 06:00:28.082995       1 scale_up.go:675] Scale-up: setting group MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c size to 3
I0913 06:00:39.539427       1 static_autoscaler.go:319] 2 unregistered nodes present
I0913 06:00:40.101374       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-nsjjg is unschedulable

Comment 6 errata-xmlrpc 2021-09-21 08:01:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3511
