+++ This bug was initially created as a clone of Bug #1749882 +++

Example:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-serial-4.2/101

We see a variety of tests failing, from CSI, Local Volume, and Volume Metrics, such as:

[sig-storage] [Serial] Volume metrics should create metrics for total time taken in volume operations in P/V Controller [Suite:openshift/conformance/serial] [Suite:k8s]

[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: gce-localssd-scsi-fs] [Serial] [Testpattern: Pre-provisioned PV (ext3)] volumes should allow exec of files on the volume [Suite:openshift/conformance/serial] [Suite:k8s]

[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] [Testpattern: Dynamic PV (default fs)] provisioning should create and delete block persistent volumes [Suite:openshift/conformance/serial] [Suite:k8s]

In addition, this seems to be similar to the CRI-O hostNetwork issue, as the following is found in the build.log:

csi-gce-pd-node-rv4dv/gce-pd-driver: W0906 11:10:28.546739 1 gce.go:126] GOOGLE_APPLICATION_CREDENTIALS env var not set
csi-gce-pd-node-rv4dv/gce-pd-driver: I0906 11:10:28.546776 1 gce.go:128] Using DefaultTokenSource &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:29.570703 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:35.586678 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:40.578686 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:45.634635 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:50.626754 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:55.618659 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:11:00.610726 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:11:01.634650 1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: F0906 11:11:01.634690 1 main.go:62] Failed to get cloud provider: timed out waiting for the condition

Note: The CSI tests should have been disabled previously under https://github.com/openshift/origin/pull/23720 .
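For anyone triaging similar build.log output, the driver log above can be summarized with a small script. This is only a sketch: it assumes the lines follow the usual klog header layout (severity letter, mmdd, time with microseconds, thread id, file:line), and the function name `summarize` and the three-line sample are illustrative, not part of any CI tooling.

```python
import re
from datetime import datetime

# Assumed klog header: <sev><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file:line>] <message>
KLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)"
)


def summarize(lines):
    """Count entries per severity and measure seconds from first error to fatal."""
    counts = {}
    first_error = None
    fatal = None
    for line in lines:
        match = KLOG_RE.search(line)
        if match is None:
            continue  # skip anything that is not a klog-formatted line
        sev = match.group("sev")
        counts[sev] = counts.get(sev, 0) + 1
        stamp = datetime.strptime(match.group("time"), "%H:%M:%S.%f")
        if sev == "E" and first_error is None:
            first_error = stamp
        if sev == "F":
            fatal = stamp
    elapsed = None
    if first_error is not None and fatal is not None:
        elapsed = (fatal - first_error).total_seconds()
    return counts, elapsed


# Three lines copied from the log in this bug: the warning, the first error, the fatal.
sample = [
    "csi-gce-pd-node-rv4dv/gce-pd-driver: W0906 11:10:28.546739 1 gce.go:126] "
    "GOOGLE_APPLICATION_CREDENTIALS env var not set",
    "csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:29.570703 1 gce.go:135] "
    "error fetching initial token: Get "
    "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: "
    "dial tcp 169.254.169.254:80: connect: connection refused",
    "csi-gce-pd-node-rv4dv/gce-pd-driver: F0906 11:11:01.634690 1 main.go:62] "
    "Failed to get cloud provider: timed out waiting for the condition",
]

counts, elapsed = summarize(sample)
print(counts, elapsed)
```

On the log in this bug, the gap between the first token error and the fatal exit is about 32 seconds, so the driver retried against the metadata address for that long before giving up.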
We know PR 23720 was inadequate and are addressing more CSI tests in https://github.com/openshift/origin/pull/23760
There are a number of CSI tests still failing even after the majority were disabled. The fastest way to get this working may be to turn on hostNetwork for the entire serial GCP suite.

If you look here you can see the definition of the serial job and which template it currently uses:
https://github.com/openshift/release/blob/69ffbdb41b4efdb435c97b1512fb671fe74e2246/ci-operator/jobs/openshift/release/openshift-release-release-4.2-periodics.yaml#L1051

You wouldn't want to edit that template because it is used by all the test jobs. I'm assuming you'd create a new template like this one but add the hostNetwork configuration to the spec. Here's an example:
https://github.com/openshift/release/blob/69ffbdb41b4efdb435c97b1512fb671fe74e2246/ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml#L70

If that fails, another option would be to create a separate test suite for running the serial tests that require hostNetwork.
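To make the "new template with hostNetwork in the spec" idea concrete, here is a rough sketch of the change expressed as a transformation on the template's data. It assumes the template is an OpenShift Template whose `objects` list contains a Pod, and it sets the standard Kubernetes PodSpec field `hostNetwork`. The helper name `with_host_network`, the template name, and the trimmed stand-in template are all hypothetical; the real template has many more fields.

```python
import copy
import json


def with_host_network(template):
    """Return a copy of a Template dict with hostNetwork enabled on every Pod object."""
    patched = copy.deepcopy(template)  # leave the shared original untouched
    for obj in patched.get("objects", []):
        if obj.get("kind") == "Pod":
            obj.setdefault("spec", {})["hostNetwork"] = True
    return patched


# Hypothetical, heavily trimmed stand-in for the real e2e template.
template = {
    "kind": "Template",
    "apiVersion": "template.openshift.io/v1",
    "metadata": {"name": "example-e2e-serial-hostnetwork"},  # hypothetical name
    "objects": [
        {
            "kind": "Pod",
            "apiVersion": "v1",
            "metadata": {"name": "example-e2e"},  # hypothetical name
            "spec": {"restartPolicy": "Never", "containers": []},
        }
    ],
}

print(json.dumps(with_host_network(template), indent=2))
```

Working on a copy mirrors the point above: the existing template is shared by all the test jobs, so the hostNetwork variant should live in a separate template rather than modify the original.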
Eric tells me CSI is not supported on GCP. In that case please just use this bug to track the fix to disable the appropriate tests.
I believe the CSI failure above is #1750926. This bug should be used to look into any problems with local volumes and volume metrics on GCP. CSI tests should be disabled.
I think the offending tests have now been skipped. Have a look at recent results.
gce-localssd-scsi-fs tests are being skipped. No recent failures: https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-gcp-serial-4.2&show-stale-tests=
All the failed test cases are skipped in 4.2.0-0.nightly-2019-09-13-032440.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922
*** Bug 1749882 has been marked as a duplicate of this bug. ***