Bug 1753930
| Summary: | [proxy] Need to add "metadata.google.internal." to noProxy list on GCP | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Installer | Assignee: | Daneyon Hansen <dhansen> |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | dhansen, sdodson, wking |
| Version: | 4.2.0 | Keywords: | TestBlocker |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1754049 1759245 (view as bug list) | Environment: | |
| Last Closed: | 2020-01-23 11:06:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1759245 | | |
Comment 1
Daneyon Hansen
2019-09-20 16:15:37 UTC
PTAL at the following code:

https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L265-L274
https://github.com/openshift/installer/blob/master/pkg/validate/validate.go#L66-L72

This is not a bug but expected behavior: noProxy domain names must not contain a trailing dot. I don't believe metadata.go should include the trailing dot in the GCE metadata API call:

Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/

Can you open a separate bug for this issue?

(In reply to Daneyon Hansen from comment #1)
> Gaoyun, can you remove the trailing dot from noProxy: metadata.google.internal.
> and try again? Your install-config proxy should look like:
>
> proxy:
>   httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   noProxy: metadata.google.internal

"metadata.google.internal" doesn't work here.
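The rule described above, that noProxy domain names must not carry a trailing dot, can be sketched as a simple check. This is an illustrative sketch only; `validateNoProxyEntry` is a hypothetical helper, not the installer's actual validation, which lives at the links above:

```go
package main

import (
	"fmt"
	"strings"
)

// validateNoProxyEntry is a hypothetical helper illustrating the kind of
// check the installer applies to each noProxy value: a domain name with a
// trailing dot is rejected.
func validateNoProxyEntry(v string) error {
	if strings.HasSuffix(v, ".") {
		return fmt.Errorf("noProxy domain %q must not contain a trailing dot", v)
	}
	return nil
}

func main() {
	// The trailing-dot form from the original install-config is rejected.
	fmt.Println(validateNoProxyEntry("metadata.google.internal.") != nil) // true
	// The bare hostname passes validation.
	fmt.Println(validateNoProxyEntry("metadata.google.internal") == nil) // true
}
```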
On the bootstrap node, we got the expected noProxy list:

NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-09208.qe.gcp.devcluster.openshift.com,api.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-09208.qe.gcp.devcluster.openshift.com,localhost,metadata.google.internal

But the kubelet service still failed for the URL http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/:

Sep 21 09:13:37 qe-gpei-09209-09210423-bootstrap.c.openshift-qe.internal hyperkube[13175]: I0921 09:13:37.677732 13175 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/

(In reply to Daneyon Hansen from comment #3)
> I don't believe metadata.go should include the trailing dot in the gce
> metadata api call:
>
> Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/
>
> Can you open a separate bug for this issue?

Yes. https://cloud.google.com/compute/docs/storing-retrieving-metadata#default also says: "You can query the contents of the metadata server by making a request to the following root URLs from within a virtual machine instance. Use the http://metadata.google.internal/computeMetadata/v1/ URL to make requests to the metadata server." But I see that "metadata.google.internal." is used in the GCP metadataUrl in Kubernetes (https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/gcp/metadata.go#L34); "metadata.google.internal." should be the fully qualified domain name of the metadata URL.

I also found some official documentation about accessing the GCP metadata server via a proxy, "Configuring an instance as a network proxy" (https://cloud.google.com/vpc/docs/special-configurations), but it doesn't seem to help in our case. That doc also uses no_proxy to keep requests to the metadata server from being forwarded to the proxy:

export no_proxy=169.254.169.254,metadata,metadata.google.internal

and it blocks proxied access to the metadata server by denying "169.254.169.254". In our case, "169.254.169.254" and "metadata.google.internal" are already in the no_proxy list, so the doc appears to assume the metadata URL uses "metadata" or "metadata.google.internal" without the trailing dot.

@Gaoyun can you inspect the proxy logs to verify the 'http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/' call is not going through the proxy after adding `metadata.google.internal` to noProxy?

@Gaoyun ptal at [1] for troubleshooting access to metadata. Can you use curl or wget to access metadata? Note that the 'Metadata-Flavor: Google' header must be set.

[1] https://cloud.google.com/compute/docs/storing-retrieving-metadata

@Gaoyun according to [2], you added `metadata.google.internal` to noProxy and successfully curled `metadata.google.internal` and `metadata.google.internal.` while bypassing the proxy. This is the desired behavior. It appears that curl strips the trailing dot, based on the output message:

* Connection #0 to host metadata.google.internal left intact

Since we do not want to proxy cloud provider API calls, I don't think the other tests in https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md are relevant. I prefer not to add trailing-dot support due to potential unknown side effects. I have submitted PR [4] to add the Google metadata hostnames to the installer noProxy defaults, per the GCP doc recommendations [5].
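Why the 403 persisted even with `metadata.google.internal` in NO_PROXY comes down to host matching: the kubelet requests the FQDN form with the trailing dot, and the NO_PROXY comparison is literal. The sketch below is a simplified, hypothetical illustration of that matching behavior, not Go's actual proxy-from-environment code:

```go
package main

import (
	"fmt"
	"strings"
)

// bypassProxy is a simplified, hypothetical sketch of NO_PROXY matching:
// a leading-dot entry matches any subdomain, and a bare hostname must
// match the request host exactly. There is no trailing-dot normalization,
// so the entry "metadata.google.internal" does not match the request host
// "metadata.google.internal.".
func bypassProxy(noProxy, host string) bool {
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if strings.HasPrefix(entry, ".") {
			// ".cluster.local" matches foo.cluster.local, etc.
			if strings.HasSuffix(host, entry) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	noProxy := "169.254.169.254,metadata.google.internal"
	fmt.Println(bypassProxy(noProxy, "metadata.google.internal"))  // true: exact match, proxy bypassed
	fmt.Println(bypassProxy(noProxy, "metadata.google.internal.")) // false: trailing dot defeats the match
}
```

Under this model the only robust fix, given that the trailing-dot form is rejected by validation, is to make the installer add the FQDN form to the defaults itself, which is what PR [4] does.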
I also submitted PR [6] to fix this issue in cluster-network-operator.

[2] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#on-boostrap-node
[3] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#access-metadata-directly-on-the-node
[4] https://github.com/openshift/installer/pull/2407
[5] https://cloud.google.com/vpc/docs/special-configurations
[6] https://github.com/openshift/cluster-network-operator/pull/325

(In reply to Daneyon Hansen from comment #10)
> I prefer not to add trailing-dot support due to potential unknown side
> effects.

Ack

> I have submitted PR [4] to add the Google metadata hostnames to the
> installer noProxy defaults, per the GCP doc recommendations [5]. I also
> submitted PR [6] to fix this issue in cluster-network-operator.

Thanks for the update. All the PRs have landed.

Verified this bug with 4.3.0-0.ci-2019-10-07-234313. On the bootstrap node, "metadata,metadata.google.internal,metadata.google.internal." were added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,api.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

The kubelet service was running well, and bootkube.service completed successfully. In proxy/cluster, "metadata,metadata.google.internal,metadata.google.internal." were also added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

Tried again with 4.3.0-0.ci-2019-10-08-032501, which includes https://github.com/openshift/installer/pull/2425: the external API server address was removed from the default noProxy list on both the bootstrap node and in proxy/cluster.

[root@qe-gpei-4303-10080624-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

[root@qe-gpei-4303-10080624-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

All the cluster operators are ready.
[root@qe-gpei-4303-10080624-bootstrap ~]# oc get co --config=/var/opt/openshift/auth/kubeconfig
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.ci-2019-10-08-032501   True        False         False      15m
cloud-credential                           4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
cluster-autoscaler                         4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
console                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      17m
dns                                        4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
image-registry                             4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
ingress                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
insights                                   4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
kube-apiserver                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-controller-manager                    4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-scheduler                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
machine-api                                4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
machine-config                             4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
marketplace                                4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
monitoring                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      20m
network                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
node-tuning                                4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
openshift-apiserver                        4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
openshift-controller-manager               4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
openshift-samples                          4.3.0-0.ci-2019-10-08-032501   True        False         False      25m
operator-lifecycle-manager                 4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-catalog         4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-packageserver   4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
service-ca                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
service-catalog-apiserver                  4.3.0-0.ci-2019-10-08-032501   True        False         False      29m
service-catalog-controller-manager         4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
storage                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      28m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062