Description of problem:

When deploying OpenShift with an HTTP proxy, pushes to the hosted docker registry fail because the registry attempts to call the master API by its overlay network IP address (which cannot be excluded via no_proxy). The request therefore goes through the HTTP proxy, which has no connectivity to overlay network IPs.

Version-Release number of selected component (if applicable):
* Currently exists in the 3.7 stable branch through to master

How reproducible:
* Straightforward

Steps to Reproduce:
1. Deploy OpenShift with an http/https proxy
2. Create an application from the service catalog that requires a build (e.g. the NodeJS sample application)
3. Observe that the build fails

Actual results:
* The build fails with an error that the push to the hosted registry timed out

Expected results:
* The build push should succeed

Additional info:
More details in the proposed patch/revert -- https://github.com/openshift/openshift-ansible/pull/6598

An alternative might be to modify the OpenShift hosted registry to call the master API by hostname, and then add that hostname to the no_proxy list automatically generated by the Ansible installer. A sketch of the failure mode follows.
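A minimal sketch of the underlying failure mode, assuming a hypothetical proxy at proxy.example.com:3128 and the default kube service IP 172.30.0.1 (both placeholders), run from inside the registry pod:

```
# Hypothetical sketch: with the proxy env set and no matching no_proxy entry,
# a request to the master's service IP is routed to the proxy, which has no
# route to the overlay network, so the call hangs and times out.
sh-4.2$ export https_proxy=http://proxy.example.com:3128   # placeholder proxy
sh-4.2$ curl -k --max-time 10 https://172.30.0.1:443/healthz
curl: (28) Operation timed out after 10000 milliseconds

# Bypassing the proxy for that exact IP makes the same request succeed.
sh-4.2$ no_proxy=172.30.0.1 curl -k https://172.30.0.1:443/healthz
ok
```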
> the docker registry attempts to call the master node by the overlay network IP address (so not excludable via no_proxy)

Why is that not excludable via no_proxy? The no_proxy value should absolutely include the master API (and probably the entire service IP subnet).
Seems like the bug here is that the installer isn't adding the master API service IP to the no_proxy value. I thought this was already being done, but perhaps not in 3.7?
At least in 3.7, the "real" IPs of the master are added to NO_PROXY, but the registry attempts to reach the API by calling an overlay network IP.

Ideally the entire overlay network would be in NO_PROXY, but subnets are not supported in this field, and enumerating every address also doesn't make sense (see the sketch below). Would it be better for the docker registry to call the FQDN (internal or external) of the master? If so, can that change be made via installer config parameters, or is this a code change in the hosted registry?
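To illustrate why the subnet can't simply be listed: at least with the curl and Go HTTP clients of this era, a CIDR entry in no_proxy is matched literally rather than as a network range. A sketch with placeholder addresses:

```
# Hypothetical sketch: the CIDR entry never matches the service IP, so the
# request still goes to the proxy and times out as before.
sh-4.2$ https_proxy=http://proxy.example.com:3128 no_proxy=172.30.0.0/16 \
>   curl -k --max-time 10 https://172.30.0.1:443/healthz
curl: (28) Operation timed out after 10000 milliseconds
```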
This is intrinsic to the k8s client logic (which the registry uses when calling the API server): https://github.com/kubernetes/client-go/blob/33bd23f75b6de861994706a322b0afab824b2171/rest/config.go#L306-L311 (KUBERNETES_SERVICE_HOST is injected into all pods as an IP address).

The way to fix this would be to change k8s to register a kubernetes_service_hostname variable (perhaps there is a k8s configuration today that can make that happen? I'm not aware of one) and have the client code use that variable instead, or to have it register the KUBERNETES_SERVICE_HOST value as a hostname instead of an IP address. In any case, we can't (reasonably) fix this behavior in the registry logic.

That said, the kubernetes API is, as far as I know, always the ".1" IP within the service IP subnet, so the installer could perhaps reasonably special-case adding that to the no_proxy list, even though we can't no_proxy the entire subnet.
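This is easy to confirm from inside any pod; a hypothetical session (the values shown are placeholders) demonstrating that the API endpoint is injected as a bare IP, which client-go then dials directly:

```
# Hypothetical sketch: the injected service env vars carry a raw IP, so no
# hostname-based no_proxy entry can ever match the API endpoint.
sh-4.2$ env | grep ^KUBERNETES_SERVICE
KUBERNETES_SERVICE_HOST=172.30.0.1
KUBERNETES_SERVICE_PORT=443
```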
My concern about setting it at install time is that it feels pretty brittle (I'm not sure how stable this IP really is, or whether it would vary by network plugin). I think there are a couple of options more from the registry side:

1) Append ${KUBERNETES_SERVICE_HOST} to the NO_PROXY list within the pod at startup (or something with a similar effect that is more runtime-driven) -- see the sketch after this comment.

2) Access the master by "well-known name"; it appears the API can be reached by name at "kubernetes.default": https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api/

Right now, from what I can tell, no_proxy generally only has `.cluster.local` and `.svc` from a "cluster-specific DNS" in it, so for the above use case I believe you would call https://kubernetes.default.svc:${KUBERNETES_SERVICE_PORT} or https://kubernetes.default.svc.cluster.local:${KUBERNETES_SERVICE_PORT} (I tried both of these from within a container in my deployment and they resolved to the API):

```
sh-4.2$ curl https://kubernetes.default.svc.cluster.local:${KUBERNETES_SERVICE_PORT} -k -s | head -n2
{
  "paths": [
```
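A minimal sketch of option 1, assuming a hypothetical wrapper entrypoint for the registry image (the binary path /usr/bin/dockerregistry is a placeholder, not the image's actual entrypoint):

```
#!/bin/sh
# Hypothetical entrypoint wrapper: append the injected API service IP to
# NO_PROXY/no_proxy at container startup so client-go calls bypass the
# HTTP proxy, then exec the real registry binary.
NO_PROXY="${NO_PROXY:+${NO_PROXY},}${KUBERNETES_SERVICE_HOST}"
export NO_PROXY
export no_proxy="${NO_PROXY}"
exec /usr/bin/dockerregistry "$@"   # placeholder path to the registry binary
```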
Option 1 is plausible, but it requires that every component that runs as a pod and wants to reach the master API (and otherwise gets the proxy settings set) implement that solution. This really isn't a registry-specific problem; it's a pod configuration problem. (There will be more components running this way in the future.) And as I said, option 2 is not under our control; it is the k8s client behavior that determines how the k8s API is reached from within a pod. IMHO the "right" fix would be for k8s to inject the service host as a hostname, not an IP.
W.r.t. #2, sorry about that, I misunderstood what you were saying. I thought you were saying you implemented the connection establishment yourself following the same pattern, which would have given you the flexibility to change it. Agreed that if you are using the kubernetes REST client, it is out of scope for the registry to address.

I tend to agree with you on the "right" fix, but this would be a pretty significant change that would have to go through upstream kubernetes. For option 1, I agree it isn't ideal, but it at least feels better to me than an installer "fix". I wonder if this issue hasn't come up before because it may not be common for a single pod to need to (1) access outside networks and (2) access the kubernetes API, while also being deployed on a network that requires an HTTP(S) proxy to reach those outside resources.
The way we've fixed this in 3.9 is to add the kube service IP address to the global list of default NO_PROXY values. This value is computed to be the first IP address in the kubernetes services CIDR: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_facts/library/openshift_facts.py#L1156-L1157

We're waiting on our QE teams to verify a bug before we backport these changes to 3.7 and 3.6. This only takes effect on new installs; for existing environments we'll instruct admins to modify the environment variables on the DC to add the kube service IP (see the sketch below).

*** This bug has been marked as a duplicate of bug 1511870 ***
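A hedged sketch of the manual remediation for existing installs (the services CIDR 172.30.0.0/16 and the `default` namespace are placeholders; the first-host computation mirrors what the installer does):

```
# Compute the kube service IP as the first host address in the services CIDR.
$ python -c 'import ipaddress; print(ipaddress.ip_network(u"172.30.0.0/16")[1])'
172.30.0.1

# Append it to the proxy-exclusion env vars on the registry's DC so that
# existing (pre-fix) environments pick it up on the next rollout.
$ oc set env dc/docker-registry -n default \
    NO_PROXY='<existing NO_PROXY>,172.30.0.1' no_proxy='<existing no_proxy>,172.30.0.1'
```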