If an Azure Stack Hub environment uses an internal certificate authority, as our WWT environment does, the Azure CSI drivers fail to authenticate to the ARM endpoint. The same failure occurs if the cluster is installed behind a proxy that requires an internal CA. We plan to create an enhancement for handling non-proxy cluster trust bundles, so as a temporary workaround we can use the trust bundle provided by the proxy method.

How reproducible:

Steps to Reproduce:
1. Provide the internal CA in additionalTrustBundle in the install config.
2. Create manifests, then edit manifests/cluster-proxy-01-config.yaml so that .spec.trustedCA.name = user-ca-bundle.
3. Run the install.

Actual results:

$ oc logs azure-disk-csi-driver-node-22jhv -c csi-driver -n openshift-cluster-csi-drivers
I1206 19:12:25.809535       1 main.go:101] set up prometheus server on [::]:29604
I1206 19:12:25.809818       1 azuredisk.go:189] DRIVER INFORMATION:
-------------------
Build Date: "2021-09-06T17:23:39Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.5.0
Git Commit: ade737312a66074a55c8a216af3c1bfac23337fb
Go Version: go1.16.6
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
I1206 19:12:25.812566       1 azure.go:62] reading cloud config from secret
E1206 19:12:25.818991       1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I1206 19:12:25.819012       1 azure.go:65] InitializeCloudFromSecret failed with error: InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I1206 19:12:25.819018       1 azure.go:70] could not read cloud config from secret
I1206 19:12:25.819023       1 azure.go:73] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I1206 19:12:25.819042       1 azure.go:92] read cloud config from file: /etc/kubernetes/cloud.conf successfully
F1206 19:12:25.846564       1 azuredisk.go:192] failed to get Azure Cloud Provider, error: Get "https://management.mtcazs.wwtatc.com/metadata/endpoints?api-version=1.0": x509: certificate signed by unknown authority
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00013a001, 0xc000274200, 0xd8, 0x1f5)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1021 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2bca440, 0xc000000003, 0x0, 0x0, 0xc0002791f0, 0x23ac9ef, 0xc, 0xc0, 0x0)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:970 +0x191
k8s.io/klog/v2.(*loggingT).printf(0x2bca440, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1d619c5, 0x2d, 0xc00059ba40, 0x1, ...)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:751 +0x191
k8s.io/klog/v2.Fatalf(...)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1509
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).Run(0xc000196000, 0x7ffebbdd433d, 0x14, 0x0, 0x0, 0x1f80001)
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/azuredisk.go:192 +0x366
main.handle()
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:87 +0x130
main.main()
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:69 +0xae

Additional info:

Configuring a custom PKI: https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#nw-proxy-configure-object_configuring-a-custom-pki
Recent CCMO implementation: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/136
Slack conversation about unifying the non-proxy CA: https://coreos.slack.com/archives/CBZHF4DHC/p1638477526269300
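The steps to reproduce above can be sketched as a small script. This is a minimal illustration, not the exact installer flow: the cluster directory name and CA contents are placeholders, and the Proxy manifest written here is a minimal stand-in for the file that `openshift-install create manifests` would generate.

```shell
#!/bin/sh
set -e

# Placeholder cluster asset directory (hypothetical name).
CLUSTER_DIR=demo-cluster
mkdir -p "$CLUSTER_DIR/manifests"

# Step 1: the install config carries the internal CA in additionalTrustBundle.
# This fragment shows only the relevant field; the certificate body is a placeholder.
cat > "$CLUSTER_DIR/install-config-snippet.yaml" <<'EOF'
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ...internal CA certificate here...
  -----END CERTIFICATE-----
EOF

# Step 2: after `openshift-install create manifests`, point the cluster
# proxy's trustedCA at the user-ca-bundle config map. A stand-in for the
# generated manifest is written first so the edit can be demonstrated.
cat > "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml" <<'EOF'
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: ""
status: {}
EOF

sed -i 's/name: ""/name: user-ca-bundle/' \
  "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml"

# Confirm the edit took effect.
grep 'name: user-ca-bundle' "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml"

# Step 3 would then be: openshift-install create cluster --dir "$CLUSTER_DIR"
```

With the proxy's trustedCA set to user-ca-bundle, the trust bundle is distributed via the proxy mechanism even though no proxy is configured, which is the workaround this BZ is about.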
Please reach out to Casey Carson to get access to our WWT environment.
I am setting the priority/severity to urgent, as this is blocking installs when a user requires an internal CA. My original comment mentions a workaround, but to be clear, this BZ is about enabling that workaround: the workaround is currently broken.
Tested with 4.10.0-0.nightly-2022-01-11-065245, and the CSI driver starts successfully. The cluster has some issues with some other operators, but this one looks fixed.

[m@fedora ASH-IPI]$ oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS   AGE
azure-disk-csi-driver-controller-6f7cbbcc84-hr57s   11/11   Running   0          28m
azure-disk-csi-driver-controller-6f7cbbcc84-k54kp   11/11   Running   0          28m
azure-disk-csi-driver-node-4n6jc                    3/3     Running   0          28m
azure-disk-csi-driver-node-kb9jx                    3/3     Running   0          28m
azure-disk-csi-driver-node-pkjxq                    3/3     Running   0          28m
azure-disk-csi-driver-operator-84546b8dc9-f7hld     1/1     Running   0          28m

[m@fedora ASH-IPI]$ oc logs azure-disk-csi-driver-node-4n6jc -c csi-driver -n openshift-cluster-csi-drivers
I0111 15:52:18.710511       1 main.go:112] set up prometheus server on [::]:29604
I0111 15:52:18.710852       1 azuredisk.go:142] DRIVER INFORMATION:
-------------------
Build Date: "2021-12-16T19:29:19Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.9.0
Git Commit: c0142b0408f0f25e9d0ceffe1b2706a9e72d312c
Go Version: go1.17.2
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
I0111 15:52:18.710871       1 azuredisk.go:145] driver userAgent: disk.csi.azure.com/v1.9.0 gc/go1.17.2 (amd64-linux)
I0111 15:52:18.711763       1 azure_disk_utils.go:129] reading cloud config from secret kube-system/azure-cloud-provider
W0111 15:52:18.719149       1 azure_disk_utils.go:136] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I0111 15:52:18.719172       1 azure_disk_utils.go:141] could not read cloud config from secret kube-system/azure-cloud-provider
I0111 15:52:18.719178       1 azure_disk_utils.go:144] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I0111 15:52:18.719194       1 azure_disk_utils.go:159] read cloud config from file: /etc/kubernetes/cloud.conf successfully
I0111 15:52:18.752343       1 azure_auth.go:119] azure: using client_id+client_secret to retrieve access token
I0111 15:52:18.752466       1 azure.go:692] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
I0111 15:52:18.752612       1 azure_diskclient.go:67] Azure DisksClient using API version: 2019-03-01
I0111 15:52:18.752660       1 azure.go:909] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0111 15:52:18.890176       1 mount_linux.go:202] Cannot run systemd-run, assuming non-systemd OS
I0111 15:52:18.890199       1 driver.go:81] Enabling controller service capability: CREATE_DELETE_VOLUME
I0111 15:52:18.890207       1 driver.go:81] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0111 15:52:18.890210       1 driver.go:81] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0111 15:52:18.890213       1 driver.go:81] Enabling controller service capability: LIST_SNAPSHOTS
I0111 15:52:18.890216       1 driver.go:81] Enabling controller service capability: CLONE_VOLUME
I0111 15:52:18.890219       1 driver.go:81] Enabling controller service capability: EXPAND_VOLUME
I0111 15:52:18.890222       1 driver.go:81] Enabling controller service capability: LIST_VOLUMES
I0111 15:52:18.890224       1 driver.go:81] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0111 15:52:18.890227       1 driver.go:81] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890231       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_WRITER
I0111 15:52:18.890234       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0111 15:52:18.890237       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0111 15:52:18.890240       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890243       1 driver.go:91] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0111 15:52:18.890246       1 driver.go:91] Enabling node service capability: EXPAND_VOLUME
I0111 15:52:18.890249       1 driver.go:91] Enabling node service capability: GET_VOLUME_STATS
I0111 15:52:18.890252       1 driver.go:91] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890553       1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0111 15:52:20.811653       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0111 15:52:20.811678       1 utils.go:96] GRPC request: {}
I0111 15:52:20.812855       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.9.0"}
I0111 15:52:21.786724       1 utils.go:95] GRPC call: /csi.v1.Node/NodeGetInfo
I0111 15:52:21.786747       1 utils.go:96] GRPC request: {}
I0111 15:52:22.359338       1 utils.go:102] GRPC response: {"accessible_topology":{"segments":{"topology.disk.csi.azure.com/zone":""}},"max_volumes_per_node":32,"node_id":"ipi410gahagan-8smcg-master-1"}
I0111 15:52:24.218730       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0111 15:52:24.218754       1 utils.go:96] GRPC request: {}
I0111 15:52:24.218810       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.9.0"}

[m@fedora ASH-IPI]$ ./openshift-install version
./openshift-install 4.10.0-0.nightly-2022-01-11-065245
built from commit 28cfc831cee01eb503a2340b4d5365fd281bf867
release image registry.ci.openshift.org/ocp/release@sha256:d9759e7c8ec5e2555419d84ff36aff2a4c8f9367236c18e722a3fe4d7c4f6dee
release architecture amd64
@Mike Gahagan Many thanks for the verification.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056