Bug 2024804 - gcp-pd-csi-driver does not use trusted-ca-bundle when cluster proxy configured
Summary: gcp-pd-csi-driver does not use trusted-ca-bundle when cluster proxy configured
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Fabio Bertinatto
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks: 2038191
 
Reported: 2021-11-19 05:21 UTC by Matt Bargenquast
Modified: 2022-03-10 16:30 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:29:41 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift gcp-pd-csi-driver-operator pull 40 | 0 | None | open | Bug 2024804: Add custom CA bundle support | 2022-01-04 13:24:15 UTC
Red Hat Product Errata RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:30:04 UTC

Description Matt Bargenquast 2021-11-19 05:21:18 UTC
Description of problem:

When a cluster is configured to use a cluster-wide proxy [0] that makes use of a private CA certificate stored in the cluster's trusted CA bundle store [1], the gcp-pd-csi-driver does not use the trusted CA bundle when communicating with *googleapis.com addresses.

This results in the gcp-pd-csi-driver-controller entering a CrashLoopBackOff state because it does not trust the CA used by the proxy:

E1119 05:04:43.542047       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority

The gcp-pd-csi-driver should make use of a CA bundle that has the user-ca-bundle injected into it so that communications occur successfully via a proxy. [2]

[0] https://docs.openshift.com/container-platform/4.8/networking/enable-cluster-wide-proxy.html
[1] https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html
[2] https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki
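The driver itself is written in Go, so the following Python sketch is only an illustration of the general behaviour requested above: a TLS client that trusts the system roots plus an extra CA bundle, so that a proxy certificate signed by a private CA verifies. The function name build_context and its extra_bundle parameter are hypothetical; the path at which an injected bundle would be mounted in the driver pod is not stated in this report and is left to the caller.

```python
import ssl
from typing import Optional


def build_context(extra_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a TLS context that trusts the system CAs plus an optional bundle.

    extra_bundle is a hypothetical path to a PEM file, for example a
    bundle that contains the private CA used by the cluster-wide proxy.
    """
    # Start from the platform's default trust store with verification on.
    ctx = ssl.create_default_context()
    if extra_bundle is not None:
        # Add the extra CA certificates on top of the system roots.
        ctx.load_verify_locations(cafile=extra_bundle)
    return ctx


if __name__ == "__main__":
    # Without an extra bundle only the system roots are trusted, which is
    # the situation that produces "certificate signed by unknown authority"
    # when the proxy presents a privately signed certificate.
    ctx = build_context()
    print(ctx.verify_mode, ctx.check_hostname)
```

A client built this way keeps certificate verification enabled; it only widens the set of trusted authorities, rather than skipping verification.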

Version-Release number of selected component (if applicable):
4.8

How reproducible:

Steps to Reproduce:
1. Configure a cluster to use a cluster-wide proxy that uses a privately-signed CA for communications.
2. After MCO applies the change to the cluster, the gcp-pd-csi-driver-controller pod will be in a CrashLoopBackOff state.

Expected results:

The pod should have no issues communicating via a cluster-wide proxy.

Node Log (of failed PODs):

From gcp-pd-csi-driver-controller-86b8f6c6d-5hvs2:

I1119 05:15:24.483448       1 main.go:71] Driver vendor version v4.8.0-202108312109.p0.git.0b61889.assembly.stream-0-ge68012f-dirty
I1119 05:15:24.483555       1 gce.go:83] Using GCE provider config <nil>
I1119 05:15:24.483683       1 gce.go:134] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json
I1119 05:15:24.483694       1 gce.go:138] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.cancelCtx)(0xc00042e7c0), conf:(*jwt.Config)(0xc000248a00)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
E1119 05:15:24.529075       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:29.538027       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:34.538449       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:39.538740       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:44.538544       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:49.537687       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:54.537770       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
E1119 05:15:54.545507       1 gce.go:195] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": proxyconnect tcp: x509: certificate signed by unknown authority
F1119 05:15:54.545531       1 main.go:87] Failed to get cloud provider: timed out waiting for the condition
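For triage, a log like the one above can be summarized mechanically. The sketch below is a minimal example, assuming the lines follow the klog layout seen in this report (severity letter, MMDD date, time, thread id, file:line, message); the names parse_line and summarize are hypothetical helpers, not part of any OpenShift tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the log lines in this report:
#   E1119 05:15:24.529075       1 gce.go:195] message
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)

X509_MARKER = "x509: certificate signed by unknown authority"


class Entry(NamedTuple):
    severity: str
    source: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None if it does not match the layout."""
    m = KLOG_LINE.match(line.strip())
    if m is None:
        return None
    return Entry(m["severity"], f'{m["file"]}:{m["line"]}', m["message"])


def summarize(lines: Iterable[str]) -> dict:
    """Count untrusted-CA errors and capture the last fatal message, if any."""
    x509_errors = 0
    fatal = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.severity == "E" and X509_MARKER in entry.message:
            x509_errors += 1
        elif entry.severity == "F":
            fatal = entry.message
    return {"x509_errors": x509_errors, "fatal": fatal}


if __name__ == "__main__":
    sample = [
        'E1119 05:15:24.529075       1 gce.go:195] error fetching initial token: '
        'oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": '
        'proxyconnect tcp: x509: certificate signed by unknown authority',
        "F1119 05:15:54.545531       1 main.go:87] Failed to get cloud provider: "
        "timed out waiting for the condition",
    ]
    print(summarize(sample))
```

A non-zero x509_errors count followed by a fatal message matches the failure pattern described in this report: repeated token fetch failures, then a timeout.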

Comment 1 Fabio Bertinatto 2022-01-04 13:30:12 UTC
@Wei, would it be possible to verify this fix before merging it?

I've tested it myself (with the environment that you helped me to get), but I'd like someone else to take a look before merging it.

Comment 3 Wei Duan 2022-01-06 08:13:52 UTC
1. Reproduced on 4.10.0-0.nightly-2022-01-05-181126
Configured the proxy as:
$ oc get proxy cluster -o yaml
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3129
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3129
  noProxy: test.no-proxy.com,.apps.wduan-0106c.qe.gcp.devcluster.openshift.com
  trustedCA:
    name: user-ca-bundle


$ oc  get pod -l app=gcp-pd-csi-driver-controller -n openshift-cluster-csi-drivers
NAME                                            READY   STATUS             RESTARTS         AGE
gcp-pd-csi-driver-controller-5f949d8bb8-hbqk2   9/10    CrashLoopBackOff   76 (2m49s ago)   3h47m
gcp-pd-csi-driver-controller-5f949d8bb8-rsm2g   9/10    CrashLoopBackOff   66 (3m34s ago)   3h42m

$ oc -n openshift-cluster-csi-drivers logs gcp-pd-csi-driver-controller-5f949d8bb8-hbqk2 -c csi-driver
I0106 07:13:11.477353       1 main.go:73] Driver vendor version v4.10.0-202112171255.p0.g19e9a57.assembly.stream-0-ge32ee06-dirty
I0106 07:13:11.490654       1 gce.go:84] Using GCE provider config <nil>
I0106 07:13:11.491281       1 gce.go:135] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json
I0106 07:13:11.491693       1 gce.go:139] Using DefaultTokenSource &oauth2.reuseTokenSource{new:jwt.jwtSource{ctx:(*context.cancelCtx)(0xc00043e000), conf:(*jwt.Config)(0xc000442140)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
E0106 07:13:11.546518       1 gce.go:196] error fetching initial token: oauth2: cannot fetch token: Post "https://oauth2.googleapis.com/token": x509: certificate signed by unknown authority

2. Verified as passing on 4.10.0-0.ci.test-2022-01-06-031931-ci-ln-p8isg1k-latest, built with the not-yet-merged PR: openshift gcp-pd-csi-driver-operator pull 40
$ oc -n openshift-cluster-csi-drivers get pod
NAME                                            READY   STATUS    RESTARTS   AGE
gcp-pd-csi-driver-controller-798f8d89cd-f7mqj   10/10   Running   0          101m
gcp-pd-csi-driver-controller-798f8d89cd-tj7xv   10/10   Running   0          95m
gcp-pd-csi-driver-node-2p5dw                    3/3     Running   3          106m
gcp-pd-csi-driver-node-c8lr9                    3/3     Running   3          105m
gcp-pd-csi-driver-node-nkpmx                    3/3     Running   3          105m
gcp-pd-csi-driver-node-nqtbp                    3/3     Running   3          105m
gcp-pd-csi-driver-node-wzckk                    3/3     Running   3          106m
gcp-pd-csi-driver-operator-6f785b94b7-bwltc     1/1     Running   0          95m

Created a PVC that uses the CSI driver and a pod that consumes it; the pod is running.

Updating to Verified: Tested.
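The READY and STATUS comparison made on the two pod listings in this comment could be automated along these lines. This is a minimal sketch that assumes the default column layout of `oc get pod` shown above (NAME, READY, STATUS first); unhealthy_pods is a hypothetical helper name, and it treats anything other than a fully ready Running pod as unhealthy.

```python
from typing import List, Tuple


def unhealthy_pods(table: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not fully Running.

    Only the first three columns are read, because RESTARTS may contain
    spaces, as in "76 (2m49s ago)".
    """
    bad = []
    rows = [row for row in table.splitlines() if row.strip()]
    for row in rows[1:]:  # skip the header row
        name, ready, status = row.split()[:3]
        have, want = ready.split("/")
        if status != "Running" or int(have) != int(want):
            bad.append((name, ready, status))
    return bad


if __name__ == "__main__":
    before_fix = """\
NAME                                            READY   STATUS             RESTARTS         AGE
gcp-pd-csi-driver-controller-5f949d8bb8-hbqk2   9/10    CrashLoopBackOff   76 (2m49s ago)   3h47m
gcp-pd-csi-driver-controller-5f949d8bb8-rsm2g   9/10    CrashLoopBackOff   66 (3m34s ago)   3h42m
"""
    for pod in unhealthy_pods(before_fix):
        print(pod)
```

An empty result for the openshift-cluster-csi-drivers namespace corresponds to the state shown in the second listing, where every container is ready.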

Comment 9 errata-xmlrpc 2022-03-10 16:29:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

