Gaoyun can you remove the trailing dot from noProxy: metadata.google.internal. and try again? Your install-config proxy should look like:

proxy:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
  noProxy: metadata.google.internal
PTAL at the following code:

https://github.com/openshift/installer/blob/master/pkg/types/validation/installconfig.go#L265-L274
https://github.com/openshift/installer/blob/master/pkg/validate/validate.go#L66-L72

This is not a bug, but expected behavior: noProxy domain names must not contain a trailing dot.
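For reference, the validation linked above (in Go) checks each noProxy entry against a domain-name pattern that does not allow a trailing dot. A minimal Python sketch of equivalent behavior; the regex and function name here are illustrative approximations, not the installer's actual code:

```python
import re

# Illustrative approximation of a domain-name check: labels are alphanumeric
# (hyphens allowed inside a label), joined by dots, with NO trailing dot.
_DOMAIN_RE = re.compile(
    r"^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?"
    r"(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*$"
)

def valid_no_proxy_domain(entry: str) -> bool:
    """Return True if entry is an acceptable noProxy domain name."""
    return bool(_DOMAIN_RE.match(entry))

print(valid_no_proxy_domain("metadata.google.internal"))   # accepted
print(valid_no_proxy_domain("metadata.google.internal."))  # rejected: trailing dot
```

Under this pattern the user-supplied entry from the install-config is rejected exactly because of the final dot.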
I don't believe metadata.go should include the trailing dot in the GCE metadata API call:

Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/

Can you open a separate bug for this issue?
(In reply to Daneyon Hansen from comment #1)
> Gaoyun can you remove trailing dot from noProxy: metadata.google.internal.
> and try again? Your install-config proxy should look like:
>
> proxy:
>   httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.0.2:3128
>   noProxy: metadata.google.internal

"metadata.google.internal" doesn't work here. On the bootstrap node, we got the expected noProxy list:

NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-09208.qe.gcp.devcluster.openshift.com,api.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-09208.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-09208.qe.gcp.devcluster.openshift.com,localhost,metadata.google.internal

But the kubelet service still failed for the URL http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/:

Sep 21 09:13:37 qe-gpei-09209-09210423-bootstrap.c.openshift-qe.internal hyperkube[13175]: I0921 09:13:37.677732 13175 metadata.go:212] Failed to Get service accounts from gce metadata server: http status code: 403 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/
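This failure is consistent with how typical no_proxy matching works: the literal host string from the URL is compared against the list entries, so `metadata.google.internal.` (with the trailing dot) does not match the entry `metadata.google.internal`. Python's urllib implements comparable matching and can illustrate the mismatch (the kubelet's Go proxy logic is analogous, not identical):

```python
from urllib.request import proxy_bypass_environment

# no_proxy list as on the bootstrap node, shortened to the relevant entry.
proxies = {"no": "metadata.google.internal"}

# Host exactly as it appears in the kubelet log URL, with the trailing dot:
# the string comparison fails, so the request is sent through the proxy.
print(proxy_bypass_environment("metadata.google.internal.", proxies))  # False

# Without the trailing dot the entry matches and the proxy is bypassed.
print(proxy_bypass_environment("metadata.google.internal", proxies))   # True
```

So even with the correct noProxy configuration, any client that builds the URL with the trailing-dot FQDN will still go through the proxy.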
(In reply to Daneyon Hansen from comment #3)
> I don't believe metadata.go should include the trailing dot in the gce
> metadata api call:
>
> Sep 20 06:34:36 qe-gpei-09204-09200617-bootstrap.c.openshift-qe.internal
> hyperkube[1554]: I0920 06:34:36.810860 1554 metadata.go:212] Failed to
> Get service accounts from gce metadata server: http status code: 403 while
> fetching url
> http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/
>
> Can you open a separate bug for this issue?

Yes. https://cloud.google.com/compute/docs/storing-retrieving-metadata#default also says: "You can query the contents of the metadata server by making a request to the following root URLs from within a virtual machine instance. Use the http://metadata.google.internal/computeMetadata/v1/ URL to make requests to the metadata server."

But I see "metadata.google.internal." is used in the GCP metadataUrl in Kubernetes: https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/gcp/metadata.go#L34

"metadata.google.internal." should be the fully qualified domain name (FQDN) of the metadata URL.
I also found some official documentation about accessing the GCP metadata server via a proxy, but it doesn't seem to help in our case: "Configuring an instance as a network proxy" - https://cloud.google.com/vpc/docs/special-configurations

This doc also uses no_proxy to prevent requests to the metadata server from being forwarded to the proxy:

export no_proxy=169.254.169.254,metadata,metadata.google.internal

and it blocks proxied access to the metadata server by denying "169.254.169.254". But "169.254.169.254" and "metadata.google.internal" are already in our no_proxy list, so this doc presumably assumes the metadata URL uses "metadata" or "metadata.google.internal" (without the trailing dot).
@Gaoyun can you inspect the proxy logs to verify the 'http://metadata.google.internal./computeMetadata/v1/instance/service-accounts/' call is not going through the proxy after adding `metadata.google.internal` to noProxy?
@Gaoyun ptal at [1] for troubleshooting access to metadata. Can you use curl or wget to access metadata? Note that the 'Metadata-Flavor: Google' header must be set.

[1] https://cloud.google.com/compute/docs/storing-retrieving-metadata
@Gaoyun according to [2], you added `metadata.google.internal` to noProxy and successfully curled `metadata.google.internal` and `metadata.google.internal.` while bypassing the proxy. This is the desired behavior. It appears that curl strips the trailing dot, based on the output message:

* Connection #0 to host metadata.google.internal left intact

Since we do not want to proxy cloud provider API calls, I don't think the other tests in https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md are relevant. I prefer not to add trailing-dot support due to potential unknown side effects. I have submitted PR [4] to add the Google metadata hostnames recommended by the GCP docs [5] to the installer noProxy defaults. I also submitted PR [6] to fix this issue in cluster-network-operator.

[2] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#on-boostrap-node
[3] https://gitlab.cee.redhat.com/openshift-qe/qe-40-blog/blob/master/test_result/bz1753930_curl_metadata.md#access-metadata-directly-on-the-node
[4] https://github.com/openshift/installer/pull/2407
[5] https://cloud.google.com/vpc/docs/special-configurations
[6] https://github.com/openshift/cluster-network-operator/pull/325
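The approach in [4] amounts to merging the GCP-recommended metadata names into the computed noProxy defaults rather than teaching the matcher about trailing dots. A hypothetical Python sketch of that kind of default construction; the function name and structure are illustrative only (the installer is written in Go), and the entry list mirrors what the later verification output shows:

```python
# Metadata entries merged into the GCP noProxy defaults. The trailing-dot
# variant covers clients (like the kubelet) that use the absolute FQDN.
GCP_METADATA_NO_PROXY = [
    "169.254.169.254",
    "metadata",
    "metadata.google.internal",
    "metadata.google.internal.",
]

def merged_no_proxy(user_entries, platform="gcp"):
    """Combine user-supplied noProxy entries with platform defaults
    and render them as the sorted, comma-separated NO_PROXY string."""
    entries = set(user_entries)
    if platform == "gcp":
        entries.update(GCP_METADATA_NO_PROXY)
    return ",".join(sorted(entries))

print(merged_no_proxy(["test.no-proxy.com", "localhost"]))
```

With this, the trailing-dot URL bypasses the proxy because the exact FQDN string is present in the list, so no change to the matching logic is needed.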
(In reply to Daneyon Hansen from comment #10)
> I prefer not adding trailing dot support due to the potential unknown side affects.

Ack

> I have submitted PR [4] to add the google metadata hostname support according to
> GCP doc recommendations [5] to the installer noProxy defaults. I also
> submitted PR [6] to fix this issue in cluster-network-operator.

Thanks for the update.
All the PRs have landed.
Verified this bug with 4.3.0-0.ci-2019-10-07-234313.

On the bootstrap node, "metadata,metadata.google.internal,metadata.google.internal." were added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,api.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

The kubelet service was running well, and bootkube.service completed successfully.

In proxy/cluster, "metadata,metadata.google.internal,metadata.google.internal." were also added to the default noProxy:

[root@qe-gpei-4302-10080346-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4302.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4302.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com
Tried again with 4.3.0-0.ci-2019-10-08-032501, which includes https://github.com/openshift/installer/pull/2425; the external API server address was removed from the default noProxy list on both the bootstrap node and in proxy/cluster.

[root@qe-gpei-4303-10080624-bootstrap ~]# env |grep NO_PROXY
NO_PROXY=.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

[root@qe-gpei-4303-10080624-bootstrap ~]# oc get proxy cluster -o jsonpath='{.status.noProxy}' --config=/var/opt/openshift/auth/kubeconfig
.cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-0.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-1.qe-gpei-4303.qe.gcp.devcluster.openshift.com,etcd-2.qe-gpei-4303.qe.gcp.devcluster.openshift.com,localhost,metadata,metadata.google.internal,metadata.google.internal.,test.no-proxy.com

All the cluster operators are ready.
[root@qe-gpei-4303-10080624-bootstrap ~]# oc get co --config=/var/opt/openshift/auth/kubeconfig
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.ci-2019-10-08-032501   True        False         False      15m
cloud-credential                           4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
cluster-autoscaler                         4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
console                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      17m
dns                                        4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
image-registry                             4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
ingress                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      21m
insights                                   4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
kube-apiserver                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-controller-manager                    4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
kube-scheduler                             4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
machine-api                                4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
machine-config                             4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
marketplace                                4.3.0-0.ci-2019-10-08-032501   True        False         False      27m
monitoring                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      20m
network                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
node-tuning                                4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
openshift-apiserver                        4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
openshift-controller-manager               4.3.0-0.ci-2019-10-08-032501   True        False         False      33m
openshift-samples                          4.3.0-0.ci-2019-10-08-032501   True        False         False      25m
operator-lifecycle-manager                 4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-catalog         4.3.0-0.ci-2019-10-08-032501   True        False         False      34m
operator-lifecycle-manager-packageserver   4.3.0-0.ci-2019-10-08-032501   True        False         False      32m
service-ca                                 4.3.0-0.ci-2019-10-08-032501   True        False         False      35m
service-catalog-apiserver                  4.3.0-0.ci-2019-10-08-032501   True        False         False      29m
service-catalog-controller-manager         4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
storage                                    4.3.0-0.ci-2019-10-08-032501   True        False         False      28m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062