If an Azure Stack Hub environment uses an internal certificate authority, as our WWT environment does, the Azure CSI drivers fail to authenticate to the ARM endpoint. The same failure occurs if the cluster is installed behind a proxy that requires an internal CA. We plan to create an enhancement for handling non-proxy cluster trust bundles, so as a temporary workaround we can use the trust bundle provided by the proxy method.

How reproducible:

Steps to Reproduce:
1. Provide the internal CA in additionalTrustBundle in the install config.
2. Create manifests, then edit manifests/cluster-proxy-01-config.yaml so that .spec.trustedCA.name = user-ca-bundle.
3. Run the install.

Actual results:

$ oc logs azure-disk-csi-driver-node-22jhv -c csi-driver -n openshift-cluster-csi-drivers
I1206 19:12:25.809535       1 main.go:101] set up prometheus server on [::]:29604
I1206 19:12:25.809818       1 azuredisk.go:189] DRIVER INFORMATION:
-------------------
Build Date: "2021-09-06T17:23:39Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.5.0
Git Commit: ade737312a66074a55c8a216af3c1bfac23337fb
Go Version: go1.16.6
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
I1206 19:12:25.812566       1 azure.go:62] reading cloud config from secret
E1206 19:12:25.818991       1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I1206 19:12:25.819012       1 azure.go:65] InitializeCloudFromSecret failed with error: InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I1206 19:12:25.819018       1 azure.go:70] could not read cloud config from secret
I1206 19:12:25.819023       1 azure.go:73] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I1206 19:12:25.819042       1 azure.go:92] read cloud config from file: /etc/kubernetes/cloud.conf successfully
F1206 19:12:25.846564       1 azuredisk.go:192] failed to get Azure Cloud Provider, error: Get "https://management.mtcazs.wwtatc.com/metadata/endpoints?api-version=1.0": x509: certificate signed by unknown authority
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00013a001, 0xc000274200, 0xd8, 0x1f5)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1021 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2bca440, 0xc000000003, 0x0, 0x0, 0xc0002791f0, 0x23ac9ef, 0xc, 0xc0, 0x0)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:970 +0x191
k8s.io/klog/v2.(*loggingT).printf(0x2bca440, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1d619c5, 0x2d, 0xc00059ba40, 0x1, ...)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:751 +0x191
k8s.io/klog/v2.Fatalf(...)
        /go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1509
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).Run(0xc000196000, 0x7ffebbdd433d, 0x14, 0x0, 0x0, 0x1f80001)
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/azuredisk.go:192 +0x366
main.handle()
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:87 +0x130
main.main()
        /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:69 +0xae

Additional info:

Configuring a custom PKI: https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#nw-proxy-configure-object_configuring-a-custom-pki
Recent CCMO implementation: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/136
Slack conversation about unifying the non-proxy CA: https://coreos.slack.com/archives/CBZHF4DHC/p1638477526269300
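The steps to reproduce above can be sketched as a small script. This is a minimal illustration, not the exact installer flow: the cluster directory name and CA contents are placeholders, and the Proxy manifest written here is a minimal stand-in for the file that `openshift-install create manifests` would generate.

```shell
#!/bin/sh
set -e

# Placeholder cluster asset directory (hypothetical name).
CLUSTER_DIR=demo-cluster
mkdir -p "$CLUSTER_DIR/manifests"

# Step 1: the install config carries the internal CA in additionalTrustBundle.
# This fragment shows only the relevant field; the certificate body is a placeholder.
cat > "$CLUSTER_DIR/install-config-snippet.yaml" <<'EOF'
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ...internal CA certificate here...
  -----END CERTIFICATE-----
EOF

# Step 2: after `openshift-install create manifests`, point the cluster
# proxy's trustedCA at the user-ca-bundle config map. A stand-in for the
# generated manifest is written first so the edit can be demonstrated.
cat > "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml" <<'EOF'
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: ""
status: {}
EOF

sed -i 's/name: ""/name: user-ca-bundle/' \
  "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml"

# Confirm the edit took effect.
grep 'name: user-ca-bundle' "$CLUSTER_DIR/manifests/cluster-proxy-01-config.yaml"

# Step 3 would then be: openshift-install create cluster --dir "$CLUSTER_DIR"
```

With the proxy's trustedCA set to user-ca-bundle, the trust bundle is distributed via the proxy mechanism even though no proxy is configured, which is the workaround this BZ is about.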
Please reach out to Casey Carson to get access to our WWT environment.
I am setting the priority/severity to urgent, as this is blocking installs when a user requires an internal CA. My original comment mentions a workaround, but to be clear, this BZ is about enabling that workaround: the workaround is currently broken.
Tested with 4.10.0-0.nightly-2022-01-11-065245, and the CSI driver starts successfully. The cluster has some issues with some other operators, but this one looks fixed.

[m@fedora ASH-IPI]$ oc get pods -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS   AGE
azure-disk-csi-driver-controller-6f7cbbcc84-hr57s   11/11   Running   0          28m
azure-disk-csi-driver-controller-6f7cbbcc84-k54kp   11/11   Running   0          28m
azure-disk-csi-driver-node-4n6jc                    3/3     Running   0          28m
azure-disk-csi-driver-node-kb9jx                    3/3     Running   0          28m
azure-disk-csi-driver-node-pkjxq                    3/3     Running   0          28m
azure-disk-csi-driver-operator-84546b8dc9-f7hld     1/1     Running   0          28m

[m@fedora ASH-IPI]$ oc logs azure-disk-csi-driver-node-4n6jc -c csi-driver -n openshift-cluster-csi-drivers
I0111 15:52:18.710511       1 main.go:112] set up prometheus server on [::]:29604
I0111 15:52:18.710852       1 azuredisk.go:142] DRIVER INFORMATION:
-------------------
Build Date: "2021-12-16T19:29:19Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.9.0
Git Commit: c0142b0408f0f25e9d0ceffe1b2706a9e72d312c
Go Version: go1.17.2
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
I0111 15:52:18.710871       1 azuredisk.go:145] driver userAgent: disk.csi.azure.com/v1.9.0 gc/go1.17.2 (amd64-linux)
I0111 15:52:18.711763       1 azure_disk_utils.go:129] reading cloud config from secret kube-system/azure-cloud-provider
W0111 15:52:18.719149       1 azure_disk_utils.go:136] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I0111 15:52:18.719172       1 azure_disk_utils.go:141] could not read cloud config from secret kube-system/azure-cloud-provider
I0111 15:52:18.719178       1 azure_disk_utils.go:144] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I0111 15:52:18.719194       1 azure_disk_utils.go:159] read cloud config from file: /etc/kubernetes/cloud.conf successfully
I0111 15:52:18.752343       1 azure_auth.go:119] azure: using client_id+client_secret to retrieve access token
I0111 15:52:18.752466       1 azure.go:692] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
I0111 15:52:18.752612       1 azure_diskclient.go:67] Azure DisksClient using API version: 2019-03-01
I0111 15:52:18.752660       1 azure.go:909] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0111 15:52:18.890176       1 mount_linux.go:202] Cannot run systemd-run, assuming non-systemd OS
I0111 15:52:18.890199       1 driver.go:81] Enabling controller service capability: CREATE_DELETE_VOLUME
I0111 15:52:18.890207       1 driver.go:81] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0111 15:52:18.890210       1 driver.go:81] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0111 15:52:18.890213       1 driver.go:81] Enabling controller service capability: LIST_SNAPSHOTS
I0111 15:52:18.890216       1 driver.go:81] Enabling controller service capability: CLONE_VOLUME
I0111 15:52:18.890219       1 driver.go:81] Enabling controller service capability: EXPAND_VOLUME
I0111 15:52:18.890222       1 driver.go:81] Enabling controller service capability: LIST_VOLUMES
I0111 15:52:18.890224       1 driver.go:81] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0111 15:52:18.890227       1 driver.go:81] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890231       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_WRITER
I0111 15:52:18.890234       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0111 15:52:18.890237       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0111 15:52:18.890240       1 driver.go:100] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890243       1 driver.go:91] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0111 15:52:18.890246       1 driver.go:91] Enabling node service capability: EXPAND_VOLUME
I0111 15:52:18.890249       1 driver.go:91] Enabling node service capability: GET_VOLUME_STATS
I0111 15:52:18.890252       1 driver.go:91] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0111 15:52:18.890553       1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0111 15:52:20.811653       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0111 15:52:20.811678       1 utils.go:96] GRPC request: {}
I0111 15:52:20.812855       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.9.0"}
I0111 15:52:21.786724       1 utils.go:95] GRPC call: /csi.v1.Node/NodeGetInfo
I0111 15:52:21.786747       1 utils.go:96] GRPC request: {}
I0111 15:52:22.359338       1 utils.go:102] GRPC response: {"accessible_topology":{"segments":{"topology.disk.csi.azure.com/zone":""}},"max_volumes_per_node":32,"node_id":"ipi410gahagan-8smcg-master-1"}
I0111 15:52:24.218730       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0111 15:52:24.218754       1 utils.go:96] GRPC request: {}
I0111 15:52:24.218810       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.9.0"}

[m@fedora ASH-IPI]$ ./openshift-install version
./openshift-install 4.10.0-0.nightly-2022-01-11-065245
built from commit 28cfc831cee01eb503a2340b4d5365fd281bf867
release image registry.ci.openshift.org/ocp/release@sha256:d9759e7c8ec5e2555419d84ff36aff2a4c8f9367236c18e722a3fe4d7c4f6dee
release architecture amd64
@Mike Gahagan Many thanks for the verification.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056