Description of problem:
The docker-registry pod contacts the API service at 172.30.0.1 instead of kubernetes.default.svc.cluster.local.
This occurs when the system is deployed in an environment that requires a proxy. The installer correctly configures the NO_PROXY environment variable in the pod with the required domain suffixes and FQDNs, but some code paths still use a raw IP address in their calls. Those IPs are not covered by the NO_PROXY list, so the requests go out through the proxy and fail (a rough shell sketch follows below). This particular instance manifests itself as:
- timeouts in the builder pod log
- authentication failure messages in the registry pod log
- other issues at the command line...
The workaround was suggested by Gerald Nunn of Red Hat Toronto.
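A rough shell illustration of that mismatch, using curl (which applies the same basic no_proxy suffix matching as the Go clients involved here); the proxy address and the exact NO_PROXY contents are placeholders:

# Proxy settings as the installer generates them: domain suffixes and FQDNs only
export https_proxy=http://proxy.example.com:3128
export no_proxy=.cluster.local,.svc,kubernetes.default.svc.cluster.local

# Hostname matches a no_proxy suffix, so the request goes direct
curl -kv https://kubernetes.default.svc.cluster.local/healthz

# A raw service IP matches nothing in no_proxy, so the request is sent to the proxy and times out
curl -kv https://172.30.0.1/healthz

# Adding the IP itself makes the call go direct again
export no_proxy=$no_proxy,172.30.0.1
curl -kv https://172.30.0.1/healthz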
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure the environment behind a proxy
2. Run the advanced install, ensuring these values are set...
3. Check the environment with an application that uses an S2I build
The build fails at docker push with a timeout.
- Edited /etc/sysconfig/docker on all nodes
- Bounced the docker service on all nodes
- Bounced the master and node services on all nodes
- Scaled down the docker-registry pod and scaled it back up
- Then launched a build (rough command sketch below)
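Approximately, those steps map to the following commands (a sketch only: the atomic-openshift-* service names assume an RPM-based OCP 3.x install and vary with version and HA layout, and the proxy values placed in /etc/sysconfig/docker are site-specific):

# On every node: add HTTP_PROXY/HTTPS_PROXY/NO_PROXY to /etc/sysconfig/docker, then bounce docker
vi /etc/sysconfig/docker
systemctl restart docker

# On masters and nodes: restart the OpenShift services
systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
systemctl restart atomic-openshift-node

# Recycle the registry pod and kick off a build
oc -n default scale dc/docker-registry --replicas=0
oc -n default scale dc/docker-registry --replicas=1
oc start-build <buildconfig-name> -n <project> --follow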
From the log of pod docker-registry-1-lcq98 in project default:
10.131.0.1 - - [19/Oct/2017:17:08:29 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
time="2017-10-19T17:08:37.651602967Z" level=debug msg="invalid token: Get https://172.30.0.1:443/oapi/v1/users/~: Unable to connect" go.version=go1.7.6 http.request.host="docker-registry.default.svc:5000" http.request.id=af111c4b-19ed-4bf5-b7e9-d2458db29d2b http.request.method=GET http.request.remoteaddr="10.129.0.1:53402" http.request.uri="/openshift/token?account=serviceaccount&scope=repository%3Aproduct-catalog-test%2Fproduct-catalog%3Apush%2Cpull" http.request.useragent="docker/1.12.6 go/go1.8.3 kernel/3.10.0-693.2.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" instance.id=db50a99b-c3a0-4337-81a9-ea42fce79d1f openshift.logger=registry
10.129.0.1 - - [19/Oct/2017:17:06:30 +0000] "GET /openshift/token?account=serviceaccount&scope=repository%3Aproduct-catalog-test%2Fproduct-catalog%3Apush%2Cpull HTTP/1.1" 401 0 "" "docker/1.12.6 go/go1.8.3 kernel/3.10.0-693.2.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)"
Workaround - add 172.30.0.1 to the pod's NO_PROXY environment variable.
Boom! Builds succeed.
It took several hours to diagnose, though...
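For reference, one way to apply that workaround by hand, assuming the default registry deployment (dc/docker-registry in the default project). Note that oc set env replaces the variable rather than appending to it, so the NO_PROXY value below is only an example and should carry the full list your environment needs:

# Inspect the proxy-related variables currently set on the registry
oc -n default set env dc/docker-registry --list | grep -i proxy

# Re-set NO_PROXY with the kube service IP included (example value)
oc -n default set env dc/docker-registry NO_PROXY=".cluster.local,.svc,172.30.0.1"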
You can set KUBERNETES_MASTER on the registry pod to tell the registry how to reach the master, but we will continue to initialize our client with the kube service IP (172.30.0.1), as that is the value Kubernetes tells us to use to reach the master.
Adding 172.30.0.1 to NO_PROXY is the correct solution.
It looks like this is the default behavior for the installer:
Given that you have openshift_generate_no_proxy_hosts=true set, this should have happened automatically, so I'm going to transfer this to the installer.
(It's possible the installer did not add the NO_PROXY env variable to the registry pod.)
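For anyone checking their own install, these are the proxy-related inventory variables in play (the inventory path is just an example):

grep -E 'openshift_(http_proxy|https_proxy|no_proxy|generate_no_proxy_hosts)' /etc/ansible/hosts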
Has the registry always communicated with the API via IP rather than hostname?
Can you get the NO_PROXY environment variable for the registry? We've
As far as I know, yes, it's always used the k8s service host variable.
Is that before or after you manually added NO_PROXY to the registry pod?
I don't know why you're blaming the build pod when you were able to fix the problem by editing your registry pod.
The fundamental issue here is that the Ansible installer did not configure the system to add the NO_PROXY env variable to the registry pod (but apparently did add the HTTP_PROXY/HTTPS_PROXY env variables to the registry pod).
Does the builder pod use a hard-coded IP address for the registry? Yes or no?
In https://bugzilla.redhat.com/show_bug.cgi?id=1527210 we're adding the kube service IP address to the list of NO_PROXY entries, which should resolve this issue as well.
Verified in openshift-ansible-3.9.0-0.34.0.git.0.c7d9585.el7.noarch.rpm
172.30.0.1 is added to the docker-registry NO_PROXY env variable successfully.
And the S2I build succeeded.
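Spot checks along those lines, assuming the default registry deployment and a placeholder buildconfig:

# NO_PROXY on the registry should now include the kube service IP
oc -n default set env dc/docker-registry --list | grep NO_PROXY

# An S2I build should now push to the registry successfully
oc start-build <buildconfig-name> -n <project> --follow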
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.