Description of problem:

When deploying OpenShift with an HTTP proxy, pushes to the hosted docker registry fail because the registry attempts to call the master API by its overlay network IP address (which cannot be excluded via no_proxy). The request therefore goes through the HTTP proxy, which has no connectivity to overlay network IPs.

Version-Release number of selected component (if applicable):
* Currently exists in the 3.7 stable branch through to master

How reproducible:
* Straightforward

Steps to Reproduce:
1. Deploy OpenShift with an http/https proxy
2. Create an application from the service catalog that requires a build (e.g. the NodeJS sample application)
3. Observe that the build fails

Actual results:
* The build fails with an error that the push to the hosted registry timed out

Expected results:
* The build push should succeed

Additional info:
More details in the proposed patch/revert -- https://github.com/openshift/openshift-ansible/pull/6598

An alternative might be to modify the OpenShift hosted registry to call the master API by hostname, and then add that hostname to the no_proxy list automatically generated by the Ansible installer. A sketch of the failure mode follows.
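A minimal sketch of the underlying failure mode, assuming a hypothetical proxy at proxy.example.com:3128 and the default kube service IP 172.30.0.1 (both placeholders), run from inside the registry pod:

```
# Hypothetical sketch: with the proxy env set and no matching no_proxy entry,
# a request to the master's service IP is routed to the proxy, which has no
# route to the overlay network, so the call hangs and times out.
sh-4.2$ export https_proxy=http://proxy.example.com:3128   # placeholder proxy
sh-4.2$ curl -k --max-time 10 https://172.30.0.1:443/healthz
curl: (28) Operation timed out after 10000 milliseconds

# Bypassing the proxy for that exact IP makes the same request succeed.
sh-4.2$ no_proxy=172.30.0.1 curl -k https://172.30.0.1:443/healthz
ok
```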
> the docker registry attempts to call the master node by the overlay network IP address (so not excludable via no_proxy)

Why is that not excludable via no_proxy? The no_proxy value should absolutely include the master API (and probably the entire service IP subnet).
Seems like the bug here is that the installer isn't adding the master API service IP to the no_proxy value. I thought this was already being done, but perhaps not in 3.7?
At least in 3.7, the "real" IPs of the master are added to NO_PROXY, but the registry attempts to reach the API by calling an overlay network IP.

Ideally the entire overlay network would be in NO_PROXY, but subnets are not supported in this field, and enumerating every address also doesn't make sense (see the sketch below). Would it be better for the docker registry to call the FQDN (internal or external) of the master? If so, can that change be made via installer config parameters, or is this a code change in the hosted registry?
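To illustrate why the subnet can't simply be listed: at least with the curl and Go HTTP clients of this era, a CIDR entry in no_proxy is matched literally rather than as a network range. A sketch with placeholder addresses:

```
# Hypothetical sketch: the CIDR entry never matches the service IP, so the
# request still goes to the proxy and times out as before.
sh-4.2$ https_proxy=http://proxy.example.com:3128 no_proxy=172.30.0.0/16 \
>   curl -k --max-time 10 https://172.30.0.1:443/healthz
curl: (28) Operation timed out after 10000 milliseconds
```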
This is intrinsic to the k8s client logic (which the registry uses when calling the API server): https://github.com/kubernetes/client-go/blob/33bd23f75b6de861994706a322b0afab824b2171/rest/config.go#L306-L311 (KUBERNETES_SERVICE_HOST is injected into all pods as an IP address).

The way to fix this would be to change k8s to register a kubernetes_service_hostname variable (perhaps there is a k8s configuration today that can make that happen? I'm not aware of one) and have the client code use that variable instead, or to have it register the KUBERNETES_SERVICE_HOST value as a hostname instead of an IP address. In any case, we can't (reasonably) fix this behavior in the registry logic.

That said, the kubernetes API is, as far as I know, always the ".1" IP within the service IP subnet, so the installer could perhaps reasonably special-case adding that to the no_proxy list, even though we can't no_proxy the entire subnet.
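This is easy to confirm from inside any pod; a hypothetical session (the values shown are placeholders) demonstrating that the API endpoint is injected as a bare IP, which client-go then dials directly:

```
# Hypothetical sketch: the injected service env vars carry a raw IP, so no
# hostname-based no_proxy entry can ever match the API endpoint.
sh-4.2$ env | grep ^KUBERNETES_SERVICE
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
```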
My concern about setting it at install time is that it feels pretty brittle (I'm not sure how stable this IP really is, or whether it would vary by network plugin). I think there are a couple of options more from the registry side:

1) Append ${KUBERNETES_SERVICE_HOST} to the NO_PROXY list within the pod at startup (or something with a similar effect that is more runtime-driven) -- see the sketch after this comment.

2) Access the master by "well-known name"; it appears the API can be reached by name at "kubernetes.default": https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/

Right now, from what I can tell, no_proxy generally only has `.cluster.local` and `.svc` from a "cluster-specific DNS" in it, so for the above use case I believe you would call https://kubernetes.default.svc:${KUBERNETES_SERVICE_PORT} or https://kubernetes.default.svc.cluster.local:${KUBERNETES_SERVICE_PORT} (I tried both of these from within a container in my deployment and they resolved to the API):

```
sh-4.2$ curl https://kubernetes.default.svc.cluster.local:${KUBERNETES_SERVICE_PORT} -k -s | head -n2
{
  "paths": [
```
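A minimal sketch of option 1, assuming a hypothetical wrapper entrypoint for the registry image (the binary path /usr/bin/dockerregistry is a placeholder, not the image's actual entrypoint):

```
#!/bin/sh
# Hypothetical entrypoint wrapper: append the injected API service IP to
# NO_PROXY/no_proxy at container startup so client-go calls bypass the
# HTTP proxy, then exec the real registry binary.
NO_PROXY="${NO_PROXY:+${NO_PROXY},}${KUBERNETES_SERVICE_HOST}"
export NO_PROXY
export no_proxy="${NO_PROXY}"
exec /usr/bin/dockerregistry "$@"   # placeholder path to the registry binary
```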
Option 1 is plausible, but it requires that every component that runs as a pod and wants to reach the master API (and otherwise gets the proxy settings set) implement that solution. This really isn't a registry-specific problem; it's a pod configuration problem. (There will be more components running this way in the future.) And as I said, option 2 is not under our control; it is the k8s client behavior that determines how the k8s API is reached from within a pod. IMHO the "right" fix would be for k8s to inject the service host as a hostname, not an IP.
W.r.t. #2, sorry about that, I misunderstood what you were saying. I thought you were saying you implemented the connection establishment yourself following the same pattern, which would have given you the flexibility to change it. Agreed that if you are using the kubernetes REST client, it is out of scope for the registry to address.

I tend to agree with you on the "right" fix, but this would be a pretty significant change that would have to go through upstream kubernetes. For option 1, I agree it isn't ideal, but it at least feels better to me than an installer "fix". I wonder if this issue hasn't come up before because it may not be common for a single pod to need to (1) access outside networks and (2) access the kubernetes API, while also being deployed on a network that requires an HTTP(S) proxy to reach those outside resources.
The way we've fixed this in 3.9 is to add the kube service IP address to the global list of default NO_PROXY values. This value is computed to be the first IP address in the kubernetes services CIDR: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_facts/library/openshift_facts.py#L1156-L1157

We're waiting on our QE teams to verify a bug before we backport these changes to 3.7 and 3.6. This only takes effect on new installs; for existing environments we'll instruct admins to modify the environment variables on the DC to add the kube service IP (see the sketch below).

*** This bug has been marked as a duplicate of bug 1511870 ***
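A hedged sketch of the manual remediation for existing installs (the services CIDR 172.30.0.0/16 and the `default` namespace are placeholders; the first-host computation mirrors what the installer does):

```
# Compute the kube service IP as the first host address in the services CIDR.
$ python -c 'import ipaddress; print(ipaddress.ip_network(u"172.30.0.0/16")[1])'
172.30.0.1

# Append it to the proxy-exclusion env vars on the registry's DC so that
# existing (pre-fix) environments pick it up on the next rollout.
$ oc set env dc/docker-registry -n default \
    NO_PROXY='<existing NO_PROXY>,172.30.0.1' no_proxy='<existing no_proxy>,172.30.0.1'
```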