Bug 1745720

Summary: e2e-gcp-serial: [sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] tests are failing
Product: OpenShift Container Platform
Reporter: Abhinav Dahiya <adahiya>
Component: Storage
Assignee: Christian Huffman <chuffman>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: high
Priority: high
Version: 4.2.0
CC: aos-bugs, aos-storage-staff, bbennett, bleanhar, fbertina, jsafrane, piqin
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-16 06:37:32 UTC
Type: Bug
Bug Blocks: 1744046

Description Abhinav Dahiya 2019-08-26 17:53:00 UTC
Example failing job run:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-serial-4.2/47

```
csi-gce-pd-node-tn8wq/gce-pd-driver: W0826 01:21:01.269305       1 gce.go:126] GOOGLE_APPLICATION_CREDENTIALS env var not set
csi-gce-pd-node-tn8wq/gce-pd-driver: I0826 01:21:01.269351       1 gce.go:128] Using DefaultTokenSource &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:02.293328       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:08.309261       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:13.301353       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:18.357211       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:23.349291       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:28.342201       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:33.333275       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
csi-gce-pd-node-tn8wq/gce-pd-driver: E0826 01:21:34.357182       1 gce.go:135] error fetching initial token: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token: dial tcp 169.254.169.254:80: connect: connection refused
```

As on AWS and Azure, pods are not allowed to access the metadata service unless they use host networking, for security reasons.
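The failing call in the log is a plain HTTP GET against the GCE metadata server's token endpoint. The Go sketch below reproduces that request with only the standard library, assuming the `Metadata-Flavor: Google` header that the metadata server requires. The helpers `newTokenRequest` and `fetchDefaultToken`, and the base-URL parameter, are hypothetical and exist only so the request can be inspected or pointed elsewhere. Run from a pod without host networking against `http://169.254.169.254`, it should surface the same `connect: connection refused` error shown above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// tokenPath is the endpoint the gce-pd-driver was polling in the log above.
const tokenPath = "/computeMetadata/v1/instance/service-accounts/default/token"

// newTokenRequest (hypothetical) builds the token request. baseURL is normally
// "http://169.254.169.254"; it is a parameter so the request can be inspected.
func newTokenRequest(baseURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+tokenPath, nil)
	if err != nil {
		return nil, err
	}
	// The GCE metadata server rejects requests that lack this header.
	req.Header.Set("Metadata-Flavor", "Google")
	return req, nil
}

// fetchDefaultToken (hypothetical) sends the request and returns the raw body.
func fetchDefaultToken(client *http.Client, baseURL string) (string, error) {
	req, err := newTokenRequest(baseURL)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		// A pod without hostNetwork is expected to fail here with
		// "connect: connection refused".
		return "", fmt.Errorf("error fetching initial token: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("metadata server returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	token, err := fetchDefaultToken(client, "http://169.254.169.254")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("got token of length", len(token))
}
```

Splitting request construction from sending keeps the header and URL checkable without any network access.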

Comment 1 Christian Huffman 2019-08-30 20:56:45 UTC
I spent a fair bit of time trying to test this in the openshift/origin repository. Based on discussions with other developers, this isn't readily reproducible in Kubernetes, since upstream uses Docker rather than CRI-O.

I was able to resolve the issue by running the "[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] [Testpattern: Dynamic PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/serial] [Suite:k8s]" test and manually editing the DaemonSet and StatefulSet to include `hostNetwork: true`. I've submitted https://github.com/kubernetes/kubernetes/pull/82197 to address this upstream. Once it is merged, I'll backport the fix to openshift/origin.
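The manual edit described above amounts to setting `hostNetwork: true` in the pod template (`spec.template.spec`) of both the DaemonSet and the StatefulSet. As a sketch, the Go program below builds that change as a patch body with `encoding/json`. The structs and the `hostNetworkPatch` helper are hypothetical and mirror only the fields being patched; the resulting JSON is the kind of body one could pass to `oc patch daemonset <name> -p '<json>'`, where `<name>` is a placeholder because the report does not name the resources.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical minimal structs covering only spec.template.spec.hostNetwork,
// a path that DaemonSet and StatefulSet share.
type podSpecPatch struct {
	HostNetwork bool `json:"hostNetwork"`
}

type templatePatch struct {
	Spec podSpecPatch `json:"spec"`
}

type workloadSpecPatch struct {
	Template templatePatch `json:"template"`
}

type workloadPatch struct {
	Spec workloadSpecPatch `json:"spec"`
}

// hostNetworkPatch returns a patch body that moves a workload's pods into
// the node's network namespace.
func hostNetworkPatch() (string, error) {
	p := workloadPatch{
		Spec: workloadSpecPatch{
			Template: templatePatch{Spec: podSpecPatch{HostNetwork: true}},
		},
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := hostNetworkPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(patch) // {"spec":{"template":{"spec":{"hostNetwork":true}}}}
}
```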

Comment 2 Jan Safranek 2019-09-02 11:33:23 UTC
I'd prefer fixing CRI-O in bug #1718389 (aiming at 4.3) instead of changing every test Pod. We can disable the offending CSI tests in 4.2.

Comment 4 Christian Huffman 2019-09-03 14:55:09 UTC
After discussing this during the bug triage meeting, we're going to close [1] and focus on addressing the CRI-O bug for 4.3. The offending CSI tests will be disabled in 4.2.

[1] https://github.com/kubernetes/kubernetes/pull/82197

Comment 5 Christian Huffman 2019-09-03 19:41:13 UTC
I've submitted https://github.com/openshift/origin/pull/23720 to disable these tests for OCP 4.2. These should be re-enabled once the CRI-O bug is resolved.
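One way to picture disabling these tests is a name-based filter applied before the suite runs. The Go sketch below is hypothetical and does not reproduce the mechanism used in openshift/origin#23720: `disabledSubstrings` and `isDisabled` are illustrative names, and the second test name in `main` is made up. It matches with `strings.Contains` rather than a regular expression, because Ginkgo test names are full of `[` and `]`, which would otherwise need escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// disabledSubstrings is a hypothetical deny-list; the real list in
// openshift/origin may be structured differently.
var disabledSubstrings = []string{
	"[Driver: pd.csi.storage.gke.io][Serial]",
}

// isDisabled reports whether a test name contains any deny-list entry.
func isDisabled(testName string) bool {
	for _, s := range disabledSubstrings {
		if strings.Contains(testName, s) {
			return true
		}
	}
	return false
}

func main() {
	names := []string{
		// Test name taken from Comment 1 above.
		"[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] [Testpattern: Dynamic PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/serial] [Suite:k8s]",
		// Hypothetical unrelated test name, for contrast.
		"[sig-storage] hypothetical unrelated test [Suite:k8s]",
	}
	for _, n := range names {
		fmt.Printf("skip=%v %s\n", isDisabled(n), n)
	}
}
```

Re-enabling the tests once the CRI-O bug is fixed would then be a matter of removing the entry from the list.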

Comment 8 errata-xmlrpc 2019-10-16 06:37:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922