Description of problem:

During installation of the CSI driver I see the following error in the driver node controller:

E0811 20:37:12.688424 1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"

All node controllers are in the CrashLoopBackOff state:

❯ oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS             RESTARTS        AGE
azure-disk-csi-driver-controller-7d75f479df-fphmv   11/11   Running            0               155m
azure-disk-csi-driver-controller-7d75f479df-pztj8   11/11   Running            0               155m
azure-disk-csi-driver-node-6hhhq                    1/3     CrashLoopBackOff   36 (83s ago)    155m
azure-disk-csi-driver-node-7g4w7                    1/3     CrashLoopBackOff   36 (112s ago)   130m
azure-disk-csi-driver-node-8ldxs                    1/3     CrashLoopBackOff   36 (87s ago)    155m
azure-disk-csi-driver-node-rrnzt                    1/3     CrashLoopBackOff   36 (88s ago)    155m
azure-disk-csi-driver-node-zwh7c                    1/3     CrashLoopBackOff   36 (77s ago)    130m
azure-disk-csi-driver-operator-6f748d8b45-fvs9p     1/1     Running            0               75m

First, it seems like we can't allow the controller to read secrets in the kube-system namespace. Second, there is no such secret in that namespace:

❯ oc get secret -n kube-system azure-cloud-provider
Error from server (NotFound): secrets "azure-cloud-provider" not found
This is the CSI driver trying to access the secret. First of all, the CSI driver is not supposed to access the API server, so we might want to work with upstream to avoid that. For the time being, we can check whether the credentials can be passed to the driver in a different manner.
I installed OCP on Azure Stack Hub. I got the CSI driver working out of the box, with no configuration on my side. I don't see the CSI driver crashlooping:

NAME                                                READY   STATUS    RESTARTS         AGE
azure-disk-csi-driver-controller-5d95db56b6-24wk7   11/11   Running   15 (102m ago)    127m
azure-disk-csi-driver-controller-5d95db56b6-k57cx   11/11   Running   6 (102m ago)     127m
azure-disk-csi-driver-node-lxq8x                    3/3     Running   0                127m
azure-disk-csi-driver-node-r29tl                    3/3     Running   0                10m
azure-disk-csi-driver-node-rw7dx                    3/3     Running   0                125m
azure-disk-csi-driver-node-tdbkl                    3/3     Running   0                105m
azure-disk-csi-driver-node-wqrhp                    3/3     Running   0                105m
azure-disk-csi-driver-node-xtb6h                    3/3     Running   0                127m
azure-disk-csi-driver-operator-5d7d65d5b8-dszvq     1/1     Running   0                128m

Server Version: 4.9.0-0.ci-2021-08-17-062049

Node driver logs show:

I0817 14:08:47.572449 1 azure.go:62] reading cloud config from secret
E0817 14:08:51.527786 1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets/azure-cloud-provider": dial tcp 172.30.0.1:443: connect: no route to host
I0817 14:08:51.527878 1 azure.go:65] InitializeCloudFromSecret failed with error: InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets/azure-cloud-provider": dial tcp 172.30.0.1:443: connect: no route to host
I0817 14:08:51.527919 1 azure.go:70] could not read cloud config from secret
I0817 14:08:51.527955 1 azure.go:73] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I0817 14:08:51.528167 1 azure.go:92] read cloud config from file: /etc/kubernetes/cloud.conf successfully
I0817 14:08:51.586542 1 azure_auth.go:119] azure: using client_id+client_secret to retrieve access token

The driver is fine when it cannot get the secret from the API server, at least for now.
It still uses /etc/kubernetes/cloud.conf, mounted from the host via a hostPath volume.
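For anyone following along, the order the log lines above suggest (try the secret first, then fall back to the file named by AZURE_CREDENTIAL_FILE) can be sketched roughly as below. This is only an illustration in Python, not the driver's actual Go code. The function names, the placeholder file content, and the behaviour when AZURE_CREDENTIAL_FILE is unset are all assumptions made for the example.

```python
import os
import tempfile


def load_cloud_config(read_secret, env=None):
    """Return (source, config_text): try the secret first, then the file.

    read_secret is a hypothetical callable standing in for the API server
    lookup of kube-system/azure-cloud-provider.
    """
    env = os.environ if env is None else env
    try:
        return "secret", read_secret()
    except Exception as err:  # e.g. forbidden, not found, no route to host
        print(f"could not read cloud config from secret: {err}")

    path = env.get("AZURE_CREDENTIAL_FILE")
    if not path:
        # Assumption for this sketch only: fail when no file is configured.
        raise RuntimeError("no secret and AZURE_CREDENTIAL_FILE is not set")
    with open(path, encoding="utf-8") as f:
        return "file", f.read()


def failing_secret_reader():
    # Hypothetical stand-in for the secret lookup that failed in the logs.
    raise PermissionError('secrets "azure-cloud-provider" is forbidden')


if __name__ == "__main__":
    # Write a placeholder config file and point the env mapping at it.
    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as tmp:
        tmp.write('{"placeholder": "cloud config"}')
    env = {"AZURE_CREDENTIAL_FILE": tmp.name}
    print(load_cloud_config(failing_secret_reader, env))
    os.unlink(tmp.name)
```

If this reading is right, a failed secret lookup is harmless only while the file fallback still holds usable credentials.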
Now that https://github.com/openshift/installer/pull/5138 is merged, nodes no longer have Azure credentials. The Azure Disk CSI driver operator should create its own CredentialsRequest.
This bug breaks storage on Azure Stack Hub.
Verified as passing on 4.9.0-0.nightly-2021-09-07-201519:

$ oc -n openshift-cluster-csi-drivers get pod
NAME                                                READY   STATUS    RESTARTS       AGE
azure-disk-csi-driver-controller-578586f546-r727k   11/11   Running   0              56m
azure-disk-csi-driver-controller-578586f546-tnclh   11/11   Running   3 (46m ago)    56m
azure-disk-csi-driver-node-485w5                    3/3     Running   0              56m
azure-disk-csi-driver-node-52jdw                    3/3     Running   0              39m
azure-disk-csi-driver-node-9nvvn                    3/3     Running   0              39m
azure-disk-csi-driver-node-d2rdh                    3/3     Running   0              39m
azure-disk-csi-driver-node-ffbwg                    3/3     Running   0              51m
azure-disk-csi-driver-node-qhx7r                    3/3     Running   0              56m
azure-disk-csi-driver-operator-6c77674cc7-cjvzb     1/1     Running   0              56m

Creating a pod with a PVC works.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759