Bug 1992875

Summary: [Azure CSI] Driver Node controller can't get config from the secret of Azure Stack Hub
Product: OpenShift Container Platform
Reporter: Mike Fedosin <mfedosin>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aos-bugs, fbertina, hongyli, jsafrane
Version: 4.9
Keywords: Reopened
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-10-18 17:45:58 UTC
Type: Bug

Description Mike Fedosin 2021-08-11 21:52:12 UTC
Description of problem:

During installation of the CSI driver, I see the following error in the driver node controller:

E0811 20:37:12.688424       1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"


All node controllers are in the CrashLoopBackOff state:
❯ oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS             RESTARTS        AGE
azure-disk-csi-driver-controller-7d75f479df-fphmv   11/11   Running            0               155m
azure-disk-csi-driver-controller-7d75f479df-pztj8   11/11   Running            0               155m
azure-disk-csi-driver-node-6hhhq                    1/3     CrashLoopBackOff   36 (83s ago)    155m
azure-disk-csi-driver-node-7g4w7                    1/3     CrashLoopBackOff   36 (112s ago)   130m
azure-disk-csi-driver-node-8ldxs                    1/3     CrashLoopBackOff   36 (87s ago)    155m
azure-disk-csi-driver-node-rrnzt                    1/3     CrashLoopBackOff   36 (88s ago)    155m
azure-disk-csi-driver-node-zwh7c                    1/3     CrashLoopBackOff   36 (77s ago)    130m
azure-disk-csi-driver-operator-6f748d8b45-fvs9p     1/1     Running            0               75m

First, it seems we can't allow the controller to read secrets in the kube-system namespace.

Second, there is no such secret in that namespace:
❯ oc get secret -n kube-system azure-cloud-provider
Error from server (NotFound): secrets "azure-cloud-provider" not found

Comment 1 Fabio Bertinatto 2021-08-17 14:11:40 UTC
This is the CSI driver trying to access the secret. First of all, the CSI driver is not supposed to access the API server, so we might want to work with upstream to avoid that.

For the time being, we can check if we can pass the credentials to the driver in a different manner.

Comment 3 Jan Safranek 2021-08-17 15:57:27 UTC
I installed OCP on Azure Stack Hub. I got the CSI driver working out of the box, with no configuration on my side. I don't see the CSI driver crashlooping:

NAME                                                READY   STATUS    RESTARTS        AGE
azure-disk-csi-driver-controller-5d95db56b6-24wk7   11/11   Running   15 (102m ago)   127m
azure-disk-csi-driver-controller-5d95db56b6-k57cx   11/11   Running   6 (102m ago)    127m
azure-disk-csi-driver-node-lxq8x                    3/3     Running   0               127m
azure-disk-csi-driver-node-r29tl                    3/3     Running   0               10m
azure-disk-csi-driver-node-rw7dx                    3/3     Running   0               125m
azure-disk-csi-driver-node-tdbkl                    3/3     Running   0               105m
azure-disk-csi-driver-node-wqrhp                    3/3     Running   0               105m
azure-disk-csi-driver-node-xtb6h                    3/3     Running   0               127m
azure-disk-csi-driver-operator-5d7d65d5b8-dszvq     1/1     Running   0               128m

Server Version: 4.9.0-0.ci-2021-08-17-062049


Node driver logs show:

I0817 14:08:47.572449       1 azure.go:62] reading cloud config from secret
E0817 14:08:51.527786       1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets/azure-cloud-provider": dial tcp 172.30.0.1:443: connect: no route to host
I0817 14:08:51.527878       1 azure.go:65] InitializeCloudFromSecret failed with error: InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: Get "https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets/azure-cloud-provider": dial tcp 172.30.0.1:443: connect: no route to host
I0817 14:08:51.527919       1 azure.go:70] could not read cloud config from secret
I0817 14:08:51.527955       1 azure.go:73] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I0817 14:08:51.528167       1 azure.go:92] read cloud config from file: /etc/kubernetes/cloud.conf successfully
I0817 14:08:51.586542       1 azure_auth.go:119] azure: using client_id+client_secret to retrieve access token


At least for now, the driver works even when it cannot get the secret from the API server. It falls back to /etc/kubernetes/cloud.conf, projected from the host via a HostPath volume.
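
The lookup order visible in the log above (secret first, then the file named by AZURE_CREDENTIAL_FILE) can be sketched as follows. This is a simplified illustration in Python, not the driver's actual Go code: `load_cloud_config`, `read_secret`, and `read_file` are hypothetical names, and raising an error when the environment variable is unset is an assumption made only to keep the sketch short.

```python
from typing import Callable, Mapping, Tuple


def load_cloud_config(
    read_secret: Callable[[], str],
    env: Mapping[str, str],
    read_file: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, config), trying the secret first and then the file.

    read_secret and read_file are hypothetical stand-ins for the API-server
    lookup and the local file read.
    """
    try:
        return "secret", read_secret()
    except Exception as err:  # e.g. forbidden, not found, no route to host
        print(f"could not read cloud config from secret: {err}")

    # Fall back to the file named by AZURE_CREDENTIAL_FILE, as in the log.
    path = env.get("AZURE_CREDENTIAL_FILE")
    if not path:
        # Assumption for this sketch only: treat a missing variable as fatal.
        raise RuntimeError("no secret and AZURE_CREDENTIAL_FILE is not set")
    return f"file:{path}", read_file(path)


if __name__ == "__main__":
    def unreachable_api() -> str:
        raise ConnectionError("dial tcp 172.30.0.1:443: connect: no route to host")

    source, _ = load_cloud_config(
        unreachable_api,
        {"AZURE_CREDENTIAL_FILE": "/etc/kubernetes/cloud.conf"},
        lambda p: "<contents of " + p + ">",
    )
    print(source)
```

In this model a failed secret lookup is logged and skipped, so the pod stays healthy only as long as the file on the host carries usable credentials.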

Comment 4 Jan Safranek 2021-08-30 12:10:15 UTC
Now that https://github.com/openshift/installer/pull/5138 is merged, nodes no longer have Azure credentials. The Azure Disk CSI driver operator should create its own CredentialsRequest.
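
For readers unfamiliar with CredentialsRequest objects, here is a rough sketch of what such a request might look like, built as a Python dictionary and printed as JSON. It is not the manifest the operator ships: the object name, the secret name, and the "Contributor" role are illustrative assumptions, and `build_credentials_request` is a hypothetical helper.

```python
import json


def build_credentials_request(name: str, secret_name: str, secret_namespace: str) -> dict:
    """Build a CredentialsRequest asking for Azure credentials.

    All argument values and the role binding are illustrative assumptions.
    """
    return {
        "apiVersion": "cloudcredential.openshift.io/v1",
        "kind": "CredentialsRequest",
        "metadata": {
            "name": name,
            "namespace": "openshift-cloud-credential-operator",
        },
        "spec": {
            # Where the resulting credentials secret should be written.
            "secretRef": {"name": secret_name, "namespace": secret_namespace},
            "providerSpec": {
                "apiVersion": "cloudcredential.openshift.io/v1",
                "kind": "AzureProviderSpec",
                # Assumed role; the real request may ask for something else.
                "roleBindings": [{"role": "Contributor"}],
            },
        },
    }


if __name__ == "__main__":
    request = build_credentials_request(
        "example-azure-disk-csi",          # hypothetical request name
        "example-azure-disk-credentials",  # hypothetical secret name
        "openshift-cluster-csi-drivers",   # driver namespace from this report
    )
    print(json.dumps(request, indent=2))
```

The point of the design is that the driver pods would read credentials from a secret in their own namespace instead of relying on a file on the node or on access to kube-system.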

Comment 5 Jan Safranek 2021-08-31 12:03:21 UTC
This bug breaks storage on Azure Stack Hub.

Comment 10 Wei Duan 2021-09-08 06:14:46 UTC
Verified on 4.9.0-0.nightly-2021-09-07-201519

$ oc -n openshift-cluster-csi-drivers get pod
NAME                                                READY   STATUS    RESTARTS      AGE
azure-disk-csi-driver-controller-578586f546-r727k   11/11   Running   0             56m
azure-disk-csi-driver-controller-578586f546-tnclh   11/11   Running   3 (46m ago)   56m
azure-disk-csi-driver-node-485w5                    3/3     Running   0             56m
azure-disk-csi-driver-node-52jdw                    3/3     Running   0             39m
azure-disk-csi-driver-node-9nvvn                    3/3     Running   0             39m
azure-disk-csi-driver-node-d2rdh                    3/3     Running   0             39m
azure-disk-csi-driver-node-ffbwg                    3/3     Running   0             51m
azure-disk-csi-driver-node-qhx7r                    3/3     Running   0             56m
azure-disk-csi-driver-operator-6c77674cc7-cjvzb     1/1     Running   0             56m

Creating a pod with a PVC works.

Comment 12 errata-xmlrpc 2021-10-18 17:45:58 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759