Bug 1995595

Summary: Autoscaler log report error “Failed to watch *v1.CSIDriver”
Product: OpenShift Container Platform
Component: Cloud Compute
Sub Component: Cluster Autoscaler
Version: 4.8
Target Milestone: ---
Target Release: 4.8.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: low
Whiteboard:
Reporter: Michael McCune <mimccune>
Assignee: Michael McCune <mimccune>
QA Contact: sunzhaohua <zhsun>
Docs Contact:
CC: ancollin, aos-bugs, mimccune, zhsun
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster autoscaler did not have permission to read csidrivers.storage.k8s.io or csistoragecapacities.storage.k8s.io resources. Consequence: The cluster autoscaler would report errors in its logs stating that its service account does not have access to interact with these resources. Fix: The Role for the cluster autoscaler has been updated to include the new resources. Result: The cluster autoscaler no longer creates error messages in its logs about interacting with these resources.
Story Points: ---
Clone Of: 1973567
Environment:
Last Closed: 2021-09-21 08:01:31 UTC
Type: ---
Bug Depends On: 1973567    
Bug Blocks:    

Description Michael McCune 2021-08-19 13:17:31 UTC
+++ This bug was initially created as a clone of Bug #1973567 +++

Description of problem:
The autoscaler logs report the following error:

E0618 07:23:15.255288       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
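
For illustration, one way to confirm the missing permission is to impersonate the autoscaler service account and ask the API server directly. This is only a sketch, run from an account that is allowed to impersonate service accounts; the resource and service account names are taken from the error above:

 oc auth can-i list csidrivers.storage.k8s.io --as=system:serviceaccount:openshift-machine-api:cluster-autoscaler
 oc auth can-i watch csidrivers.storage.k8s.io --as=system:serviceaccount:openshift-machine-api:cluster-autoscaler

On an affected cluster both checks should answer "no"; once the role is fixed they should answer "yes".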


Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-06-14-145150

How reproducible:
always

Steps to Reproduce:
1. Create a clusterautoscaler
2. Create a machineautoscaler 
3. Add a workload that forces a scale-up (see the sketch after these steps)
4. Check autoscaler logs
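
A minimal workload sketch for step 3. The namespace and deployment name match the pods seen in the logs below, but the image and resource requests are illustrative; any deployment whose pods cannot be scheduled on the existing workers will do, and the autoscaler deployment name assumes the default ClusterAutoscaler:

 oc -n openshift-machine-api create deployment scale-up --image=registry.access.redhat.com/ubi8/ubi -- sleep infinity
 oc -n openshift-machine-api set resources deployment scale-up --requests=cpu=2,memory=4Gi
 oc -n openshift-machine-api scale deployment scale-up --replicas=20
 oc -n openshift-machine-api logs deploy/cluster-autoscaler-default -f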

Actual results:
The autoscaler logs always contain this error message:

oc logs -f cluster-autoscaler-default-75c55cf9d7-kwtt8
I0618 06:25:05.288288       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-69c8z is unschedulable
I0618 06:25:05.289389       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-jlrxc is unschedulable
I0618 06:25:05.290411       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:05.294384       1 scale_up.go:453] No expansion options
I0618 06:25:05.297314       1 scale_down.go:917] No candidates for scale down
E0618 06:25:07.144282       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
I0618 06:25:15.454335       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-shjtv is unschedulable
I0618 06:25:15.454660       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-svp5v is unschedulable


Expected results:
The autoscaler logs do not contain such error messages.

Additional info:

--- Additional comment from Michael McCune on 2021-06-21 17:55:38 UTC ---

I think we just need to update the role for the machine-api service account. I am starting to investigate.
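
For illustration, what the role update amounts to: the autoscaler's informers need list/watch on the storage resources, so the role bound to the cluster-autoscaler service account has to gain rules for them. A sketch of the expected state after the fix, assuming the ClusterRole is named cluster-autoscaler (the role is owned and reconciled by the cluster-autoscaler-operator, so the fix belongs in the operator rather than in a hand edit of the role):

 oc get clusterrole cluster-autoscaler -o yaml

should then include a rule roughly like:

 - apiGroups:
   - storage.k8s.io
   resources:
   - csidrivers
   - csistoragecapacities
   verbs:
   - list
   - watch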

--- Additional comment from OpenShift Automated Release Tooling on 2021-06-23 15:59:52 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from sunzhaohua on 2021-06-25 01:03:24 UTC ---

Failed to verify

 oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-24-082405   True        False         61m     Cluster version is 4.9.0-0.nightly-2021-06-24-082405

E0624 16:04:42.548972       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-machine-api:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
I0624 16:04:46.391301       1 static_autoscaler.go:319] 2 unregistered nodes present
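
The same kind of check as for csidrivers applies here, this time for the namespaced csistoragecapacities resource (a sketch, run from an account with impersonation rights):

 oc auth can-i list csistoragecapacities.storage.k8s.io -A --as=system:serviceaccount:openshift-machine-api:cluster-autoscaler

This should answer "no" on the build above and "yes" once the role also covers csistoragecapacities.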

--- Additional comment from OpenShift Automated Release Tooling on 2021-06-27 18:50:52 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from sunzhaohua on 2021-06-28 08:58:52 UTC ---

Verified.
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-27-223612   True        False         3m43s   Cluster version is 4.9.0-0.nightly-2021-06-27-223612

I0628 08:53:12.609381       1 scale_down.go:917] No candidates for scale down
I0628 08:53:22.628438       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:22.628875       1 scale_down.go:917] No candidates for scale down
W0628 08:53:32.712601       1 clusterstate.go:432] AcceptableRanges have not been populated yet. Skip checking
I0628 08:53:33.288638       1 static_autoscaler.go:401] No unschedulable pods
I0628 08:53:33.691161       1 pre_filtering_processor.go:66] Skipping ip-10-0-242-215.us-east-2.compute.internal - node group min size reached
I0628 08:53:34.088723       1 scale_down.go:917] No candidates for scale down
I0628 08:53:44.909621       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-2k59x is unschedulable

--- Additional comment from Andrew Collins on 2021-08-18 21:31:16 UTC ---

I see this in 4.8.2. A backport is needed.

Comment 3 sunzhaohua 2021-09-13 06:05:15 UTC
Verified.

clusterversion: 4.8.0-0.nightly-2021-09-11-042202

I0913 06:00:27.083028       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-xrll2 is unschedulable
I0913 06:00:27.083034       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-4chmv is unschedulable
I0913 06:00:27.902044       1 scale_up.go:468] Best option to resize: MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c
I0913 06:00:27.902070       1 scale_up.go:472] Estimated 19 nodes needed in MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c
I0913 06:00:27.902084       1 scale_up.go:477] Capping size to max cluster total size (8)
I0913 06:00:28.082939       1 scale_up.go:586] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c 1->3 (max: 3)}]
I0913 06:00:28.082995       1 scale_up.go:675] Scale-up: setting group MachineSet/openshift-machine-api/zhsun481-ccmf2-worker-us-east-2c size to 3
I0913 06:00:39.539427       1 static_autoscaler.go:319] 2 unregistered nodes present
I0913 06:00:40.101374       1 klogx.go:86] Pod openshift-machine-api/scale-up-6cc4bdd5db-nsjjg is unschedulable
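
A quick way to double-check during verification that the RBAC errors are gone (a sketch, assuming the default ClusterAutoscaler deployment name cluster-autoscaler-default seen earlier in this bug):

 oc -n openshift-machine-api logs deploy/cluster-autoscaler-default | grep -i forbidden

This should print nothing on a fixed cluster.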

Comment 6 errata-xmlrpc 2021-09-21 08:01:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3511