Bug 1753930
| Summary: | [proxy] Need to add "metadata.google.internal." to noProxy list on GCP | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Installer | Assignee: | Daneyon Hansen <dhansen> |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | dhansen, sdodson, wking |
| Version: | 4.2.0 | Keywords: | TestBlocker |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1754049 1759245 (view as bug list) | Environment: | |
| Last Closed: | 2020-01-23 11:06:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1759245 | | |
Comment 1
Daneyon Hansen
2019-09-20 16:15:37 UTC
PTAL at the following code:

https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L265-L274
https://github.com/openshift/installer/blob/master/pkg/validate/validate.go#L66-L72

This is not a bug but expected behavior: noProxy domain names must not contain a trailing dot. I don't believe metadata.go should include the trailing dot in the GCE metadata API call:

Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/

Can you open a separate bug for this issue?

(In reply to Daneyon Hansen from comment #1)
> Gaoyun, can you remove the trailing dot from noProxy: metadata.google.internal.
> and try again? Your install-config proxy should look like:
>
> proxy:
>   httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   noProxy: metadata.google.internal

"metadata.google.internal" doesn't work here.
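The rule described above, that noProxy domain names must not carry a trailing dot, can be sketched as a simple check. This is an illustrative sketch only; `validateNoProxyEntry` is a hypothetical helper, not the installer's actual validation, which lives at the links above:

```go
package main

import (
	"fmt"
	"strings"
)

// validateNoProxyEntry is a hypothetical helper illustrating the kind of
// check the installer applies to each noProxy value: a domain name with a
// trailing dot is rejected.
func validateNoProxyEntry(v string) error {
	if strings.HasSuffix(v, ".") {
		return fmt.Errorf("noProxy domain %q must not contain a trailing dot", v)
	}
	return nil
}

func main() {
	// The trailing-dot form from the original install-config is rejected.
	fmt.Println(validateNoProxyEntry("metadata.google.internal.") != nil) // true
	// The bare hostname passes validation.
	fmt.Println(validateNoProxyEntry("metadata.google.internal") == nil) // true
}
```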
On the bootstrap node, we got the expected noProxy list:

NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-09208.qe.gcp.devcluster.openshift.com,api.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-09208.qe.gcp.devcluster.openshift.com,localhost,metadata.google.internal

But the kubelet service still failed for the URL http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/:

Sep 21 09:13:37 qe-gpei-09209-09210423-bootstrap.c.openshift-qe.internal hyperkube[13175]: I0921 09:13:37.677732 13175 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/

(In reply to Daneyon Hansen from comment #3)
> I don't believe metadata.go should include the trailing dot in the gce
> metadata api call:
>
> Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/
>
> Can you open a separate bug for this issue?

Yes. https://cloud.google.com/compute/docs/storing-retrieving-metadata#default also says: "You can query the contents of the metadata server by making a request to the following root URLs from within a virtual machine instance. Use the http://metadata.google.internal/computeMetadata/v1/ URL to make requests to the metadata server." But I see that "metadata.google.internal." is used in the GCP metadataUrl in Kubernetes (https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/gcp/metadata.go#L34); "metadata.google.internal." should be the fully qualified domain name of the metadata URL.

I also found some official documentation about accessing the GCP metadata server via a proxy, "Configuring an instance as a network proxy" (https://cloud.google.com/vpc/docs/special-configurations), but it doesn't seem to help in our case. That doc also uses no_proxy to keep requests to the metadata server from being forwarded to the proxy:

export no_proxy=169.254.169.254,metadata,metadata.google.internal

and it blocks proxied access to the metadata server by denying "169.254.169.254". In our case, "169.254.169.254" and "metadata.google.internal" are already in the no_proxy list, so the doc appears to assume the metadata URL uses "metadata" or "metadata.google.internal" without the trailing dot.

@Gaoyun can you inspect the proxy logs to verify the 'http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/' call is not going through the proxy after adding `metadata.google.internal` to noProxy?

@Gaoyun ptal at [1] for troubleshooting access to metadata. Can you use curl or wget to access metadata? Note that the 'Metadata-Flavor: Google' header must be set.

[1] https://cloud.google.com/compute/docs/storing-retrieving-metadata

@Gaoyun according to [2], you added `metadata.google.internal` to noProxy and successfully curled `metadata.google.internal` and `metadata.google.internal.` while bypassing the proxy. This is the desired behavior. It appears that curl strips the trailing dot, based on the output message:

* Connection #0 to host metadata.google.internal left intact

Since we do not want to proxy cloud provider API calls, I don't think the other tests in https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md are relevant. I prefer not to add trailing-dot support due to potential unknown side effects. I have submitted PR [4] to add the Google metadata hostnames to the installer noProxy defaults, per the GCP doc recommendations [5].
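Why the 403 persisted even with `metadata.google.internal` in NO_PROXY comes down to host matching: the kubelet requests the FQDN form with the trailing dot, and the NO_PROXY comparison is literal. The sketch below is a simplified, hypothetical illustration of that matching behavior, not Go's actual proxy-from-environment code:

```go
package main

import (
	"fmt"
	"strings"
)

// bypassProxy is a simplified, hypothetical sketch of NO_PROXY matching:
// a leading-dot entry matches any subdomain, and a bare hostname must
// match the request host exactly. There is no trailing-dot normalization,
// so the entry "metadata.google.internal" does not match the request host
// "metadata.google.internal.".
func bypassProxy(noProxy, host string) bool {
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if strings.HasPrefix(entry, ".") {
			// ".cluster.local" matches foo.cluster.local, etc.
			if strings.HasSuffix(host, entry) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	noProxy := "169.254.169.254,metadata.google.internal"
	fmt.Println(bypassProxy(noProxy, "metadata.google.internal"))  // true: exact match, proxy bypassed
	fmt.Println(bypassProxy(noProxy, "metadata.google.internal.")) // false: trailing dot defeats the match
}
```

Under this model the only robust fix, given that the trailing-dot form is rejected by validation, is to make the installer add the FQDN form to the defaults itself, which is what PR [4] does.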
I also submitted PR [6] to fix this issue in cluster-network-operator.

[2] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#on-boostrap-node
[3] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#access-metadata-directly-on-the-node
[4] https://github.com/openshift/installer/pull/2407
[5] https://cloud.google.com/vpc/docs/special-configurations
[6] https://github.com/openshift/cluster-network-operator/pull/325

(In reply to Daneyon Hansen from comment #10)
> I prefer not to add trailing-dot support due to potential unknown side
> effects.

Ack

> I have submitted PR [4] to add the Google metadata hostnames to the
> installer noProxy defaults, per the GCP doc recommendations [5]. I also
> submitted PR [6] to fix this issue in cluster-network-operator.

Thanks for the update. All the PRs have landed.

Verified this bug with 4.3.0-0.ci-2019-10-07-234313. On the bootstrap node, "metadata,metadata.google.internal,metadata.google.internal." were added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,api.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

The kubelet service was running well, and bootkube.service completed successfully. In proxy/cluster, "metadata,metadata.google.internal,metadata.google.internal." were also added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

Tried again with 4.3.0-0.ci-2019-10-08-032501, which includes https://github.com/openshift/installer/pull/2425: the external API server address was removed from the default noProxy list on both the bootstrap node and in proxy/cluster.

[root@qe-gpei-4303-10080624-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

[root@qe-gpei-4303-10080624-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

All the cluster operators are ready.
[root@qe-gpei-4303-10080624-bootstrap ~]# oc get co --config=/var/opt/openshift/auth/kubeconfig
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.ci-2019-10-08-032501   True        False         False      15m
cloud-credential                           4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
cluster-autoscaler                         4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
console                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      17m
dns                                        4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
image-registry                             4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
ingress                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
insights                                   4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
kube-apiserver                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-controller-manager                    4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-scheduler                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
machine-api                                4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
machine-config                             4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
marketplace                                4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
monitoring                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      20m
network                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
node-tuning                                4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
openshift-apiserver                        4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
openshift-controller-manager               4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
openshift-samples                          4.3.0-0.ci-2019-10-08-032501   True        False         False      25m
operator-lifecycle-manager                 4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-catalog         4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-packageserver   4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
service-ca                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
service-catalog-apiserver                  4.3.0-0.ci-2019-10-08-032501   True        False         False      29m
service-catalog-controller-manager         4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
storage                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      28m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062