Description of problem:
When a cluster is configured to use a cluster-wide proxy [0] that uses a private CA certificate stored in the cluster's trusted CA bundle store [1], the gcp-pd-csi-driver does not use the trusted CA bundle when communicating with *googleapis.com addresses.

This results in the gcp-pd-csi-driver-controller entering a CrashLoopBackOff state because it does not trust the CA used by the proxy:

E1119 05:04:43.542047 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority

The gcp-pd-csi-driver should use a CA bundle that has the user-ca-bundle injected into it so that communication through the proxy succeeds. [2]

[0] https://docs.openshift.com/container-platform/4.8/networking/enable-cluster-wide-proxy.html
[1] https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html
[2] https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki

Version-Release number of selected component (if applicable):
4.8

How reproducible:

Steps to Reproduce:
1. Configure a cluster to use a cluster-wide proxy that uses a privately-signed CA for communications.
2. After MCO applies the change to the cluster, the gcp-pd-csi-driver-controller pod enters a crash-looping state.

Expected results:
The pod should have no issues communicating via a cluster-wide proxy.

Node Log (of failed PODs):
From gcp-pd-csi-driver-controller-86b8f6c6d-5hvs2:

I1119 05:15:24.483448 1 main.go:71] Driver vendor version v4.8.0-202108312109.p0.git.0b61889.assembly.stream-0-ge68012f-dirty
I1119 05:15:24.483555 1 gce.go:83] Using GCE provider config <nil>
I1119 05:15:24.483683 1 gce.go:134] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json
I1119 05:15:24.483694 1 gce.go:138] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.cancelCtx)(0xc00042e7c0), conf:(*jwt.Config)(0xc000248a00)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
E1119 05:15:24.529075 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:29.538027 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:34.538449 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:39.538740 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:44.538544 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:49.537687 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:54.537770 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:54.545507 1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
F1119 05:15:54.545531 1 main.go:87] Failed to get cloud provider: timed out waiting for the condition
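
For readers unfamiliar with the certificate injection mechanism referenced in [2], the following is a minimal sketch of how an operator typically requests the merged trust bundle and mounts it into a pod. It is not taken from the gcp-pd-csi-driver-operator source. The ConfigMap name and the volume name are hypothetical; the label, the `ca-bundle.crt` key, and the mount path follow the OpenShift custom PKI documentation; the `csi-driver` container name is the one used elsewhere in this report.

```yaml
# Empty ConfigMap carrying the injection label. The Cluster Network Operator
# is documented to populate the ca-bundle.crt key with the merged trust bundle
# (system CAs plus the user-ca-bundle referenced by the proxy's trustedCA).
apiVersion: v1
kind: ConfigMap
metadata:
  name: gcp-pd-csi-driver-trusted-ca-bundle   # hypothetical name
  namespace: openshift-cluster-csi-drivers
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
---
# Illustrative fragment of the controller pod spec (not a complete resource).
# Mounting the bundle at the system trust path lets the driver's TLS client
# pick it up without code changes.
spec:
  containers:
  - name: csi-driver
    volumeMounts:
    - name: trusted-ca-bundle                 # hypothetical volume name
      mountPath: /etc/pki/ca-trust/extracted/pem
      readOnly: true
  volumes:
  - name: trusted-ca-bundle
    configMap:
      name: gcp-pd-csi-driver-trusted-ca-bundle
      items:
      - key: ca-bundle.crt
        path: tls-ca-bundle.pem
```

With a mount like this in place, the proxy's CA would be part of the bundle the driver consults when it posts to https://oauth2.googleapis.com/token, which is the call failing in the log above.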
@Wei, would it be possible to verify this fix before merging it? I've tested it myself (with the environment that you helped me to get), but I'd like someone else to take a look before merging it.
1. Reproduced on 4.10.0-0.nightly-2022-01-05-181126

Proxy configuration:

$ oc get proxy cluster -o yaml
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3129
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3129
  noProxy: test.no-proxy.com,.apps.wduan-0106c.qe.gcp.devcluster.openshift.com
  trustedCA:
    name: user-ca-bundle

$ oc get pod -l app=gcp-pd-csi-driver-controller -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS             RESTARTS         AGE
gcp-pd-csi-driver-controller-5f949d8bb8-hbqk2   9/10    CrashLoopBackOff   76 (2m49s ago)   3h47m
gcp-pd-csi-driver-controller-5f949d8bb8-rsm2g   9/10    CrashLoopBackOff   66 (3m34s ago)   3h42m

$ oc -n openshift-cluster-csi-drivers logs gcp-pd-csi-driver-controller-5f949d8bb8-hbqk2 -c csi-driver
I0106 07:13:11.477353 1 main.go:73] Driver vendor version v4.10.0-202112171255.p0.g19e9a57.assembly.stream-0-ge32ee06-dirty
I0106 07:13:11.490654 1 gce.go:84] Using GCE provider config <nil>
I0106 07:13:11.491281 1 gce.go:135] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json
I0106 07:13:11.491693 1 gce.go:139] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.cancelCtx)(0xc00043e000), conf:(*jwt.Config)(0xc000442140)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
E0106 07:13:11.546518 1 gce.go:196] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": x509: certificate signed by unknown authority

2. Verified pass on 4.10.0-0.ci.test-2022-01-06-031931-ci-ln-p8isg1k-latest with pre-merged PR: openshift gcp-pd-csi-driver-operator pull 40

$ oc -n openshift-cluster-csi-drivers get pod
NAME                                            READY   STATUS    RESTARTS   AGE
gcp-pd-csi-driver-controller-798f8d89cd-f7mqj   10/10   Running   0          101m
gcp-pd-csi-driver-controller-798f8d89cd-tj7xv   10/10   Running   0          95m
gcp-pd-csi-driver-node-2p5dw                    3/3     Running   3          106m
gcp-pd-csi-driver-node-c8lr9                    3/3     Running   3          105m
gcp-pd-csi-driver-node-nkpmx                    3/3     Running   3          105m
gcp-pd-csi-driver-node-nqtbp                    3/3     Running   3          105m
gcp-pd-csi-driver-node-wzckk                    3/3     Running   3          106m
gcp-pd-csi-driver-operator-6f785b94b7-bwltc     1/1     Running   0          95m

Created a PVC with the CSI driver and a pod that uses it; the pod is running.
Updating Verified: Tested.
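
The PVC and pod check mentioned above could look roughly like the following. This is an illustrative sketch, not the manifests used in the verification: the resource names, the image, and the requested size are hypothetical, and `standard-csi` is assumed to be the storage class backed by the GCP PD CSI driver on the test cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc                      # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard-csi      # assumption: GCP PD CSI storage class
  resources:
    requests:
      storage: 1Gi                    # arbitrary size for the check
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod                      # hypothetical name
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi8/ubi-minimal   # any image with sleep works
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
```

If the controller can reach the Google APIs through the proxy, the claim should bind once the pod is scheduled and the pod should reach Running, matching the result reported above.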
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056