Bug 1750968 - [4.2] e2e-gcp-serial: [sig-storage] ... [Serial] tests are failing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Hemant Kumar
QA Contact: Qin Ping
URL:
Whiteboard:
Duplicates: 1749882
Depends On: 1749882
Blocks:
Reported: 2019-09-10 21:03 UTC by Brenton Leanhardt
Modified: 2019-11-26 15:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1749882
Environment:
Last Closed: 2019-10-16 06:40:52 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:41:04 UTC

Description Brenton Leanhardt 2019-09-10 21:03:55 UTC
+++ This bug was initially created as a clone of Bug #1749882 +++

Example:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-serial-4.2/101

We see a variety of tests failing, from CSI, Local Volume, and Volume Metrics, such as:

[sig-storage] [Serial] Volume metrics should create metrics for total time taken in volume operations in P/V Controller [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: gce-localssd-scsi-fs] [Serial] [Testpattern: Pre-provisioned PV (ext3)] volumes should allow exec of files on the volume [Suite:openshift/conformance/serial] [Suite:k8s] 
[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] [Testpattern: Dynamic PV (default fs)] provisioning should create and delete block persistent volumes [Suite:openshift/conformance/serial] [Suite:k8s] 

In addition, this seems to be similar to the CRI-O hostNetwork issue, as the following is found in the build.log:

csi-gce-pd-node-rv4dv/gce-pd-driver: W0906 11:10:28.546739       1 gce.go:126] GOOGLE_APPLICATION_CREDENTIALS env var not set
csi-gce-pd-node-rv4dv/gce-pd-driver: I0906 11:10:28.546776       1 gce.go:128] Using DefaultTokenSource &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:29.570703       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:35.586678       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:40.578686       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:45.634635       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:50.626754       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:10:55.618659       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:11:00.610726       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: E0906 11:11:01.634650       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-rv4dv/gce-pd-driver: F0906 11:11:01.634690       1 main.go:62] Failed to get cloud provider: timed out waiting for the condition
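The excerpt above can be summarized mechanically. The following is a minimal sketch, not part of the original report, that parses klog-style lines, counts the "error fetching initial token" entries, and measures how long the driver retried before the fatal exit. The function name summarize and the year parameter are illustrative assumptions; klog headers carry no year, so 2019 is taken from the dates on this bug.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line], message
KLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[\w.]+:\d+)\] (?P<msg>.*)"
)


def summarize(lines, year=2019):
    """Count token-fetch errors and measure the time from the first one to the fatal exit.

    `year` is an assumption: klog timestamps do not include it.
    """
    errors = []
    fatal = None
    for line in lines:
        m = KLOG_RE.search(line)
        if not m:
            continue  # not a klog-formatted line
        ts = datetime.strptime(
            f"{year}{m['date']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
        )
        if m["sev"] == "E" and "error fetching initial token" in m["msg"]:
            errors.append(ts)
        elif m["sev"] == "F":
            fatal = ts
    window = (fatal - errors[0]).total_seconds() if errors and fatal else None
    return {"token_errors": len(errors), "seconds_to_fatal": window}
```

Fed the lines quoted above, this would report eight token errors and roughly 32 seconds between the first refused connection and the fatal "Failed to get cloud provider" message.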

Note: The CSI tests should have been disabled previously under https://github.com/openshift/origin/pull/23720 .

Comment 1 Eric Paris 2019-09-10 21:06:03 UTC
We know PR 23720 was inadequate and are addressing more CSI tests in https://github.com/openshift/origin/pull/23760

Comment 2 Brenton Leanhardt 2019-09-10 21:06:51 UTC
A number of CSI tests are still failing even after the majority were disabled.  The fastest way to get this working may be to turn on hostNetwork for the entire serial gcp suite.  Here you can see the definition of the serial job and which template it currently uses:

https://github.com/openshift/release/blob/69ffbdb41b4efdb435c97b1512fb671fe74e2246/ci-operator/jobs/openshift/release/openshift-release-release-4.2-periodics.yaml#L1051

You wouldn't want to edit that template because it is used by all the test jobs.  I'm assuming you'd create a new template like this one but add in the hostNetwork configuration in the spec.  Here's an example:

https://github.com/openshift/release/blob/69ffbdb41b4efdb435c97b1512fb671fe74e2246/ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml#L70

If that fails, another option would be to create a separate test suite for running the serial tests that require hostNetwork.
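As a rough illustration of "add in the hostNetwork configuration in the spec", here is a minimal sketch that patches a pod manifest held as a Python dict. The helper name with_host_network is hypothetical and the sketch does not reproduce the actual CI template. hostNetwork and dnsPolicy are standard Kubernetes pod spec fields; setting dnsPolicy here is an optional assumption, not something this bug requires.

```python
import copy


def with_host_network(pod):
    """Return a copy of a pod manifest (a dict) with hostNetwork enabled.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    patched = copy.deepcopy(pod)
    spec = patched.setdefault("spec", {})
    # Run the pod in the node's network namespace.
    spec["hostNetwork"] = True
    # Assumption: keep cluster DNS resolution working unless the
    # manifest already chose a DNS policy.
    spec.setdefault("dnsPolicy", "ClusterFirstWithHostNet")
    return patched
```

Returning a patched copy rather than editing in place mirrors the advice above: the shared template stays untouched and only the new variant carries the hostNetwork setting.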

Comment 3 Brenton Leanhardt 2019-09-10 21:10:55 UTC
Eric tells me CSI is not supported on GCP.  In that case please just use this bug to track the fix to disable the appropriate tests.

Comment 4 Eric Paris 2019-09-10 21:16:24 UTC
I believe the CSI failure above is #1750926

This bug should be used to look into any problems with local volumes and volume metrics on GCP.

CSI tests should be disabled.

Comment 14 David Eads 2019-09-13 13:53:13 UTC
I think the offending tests have now been skipped. Have a look at recent results.

Comment 15 Fabio Bertinatto 2019-09-17 12:24:13 UTC
gce-localssd-scsi-fs tests are being skipped. No recent failures:

https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-gcp-serial-4.2&show-stale-tests=

Comment 17 Qin Ping 2019-09-18 06:37:51 UTC
All the failed test cases are skipped in 4.2.0-0.nightly-2019-09-13-032440.

Comment 18 errata-xmlrpc 2019-10-16 06:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 19 Christian Huffman 2019-11-26 15:28:37 UTC
*** Bug 1749882 has been marked as a duplicate of this bug. ***

