Bug 1677606 - spec.proxy doesn't work after setting it in the image registry operator
Summary: spec.proxy doesn't work after setting it in the image registry operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-15 10:53 UTC by XiuJuan Wang
Modified: 2019-06-04 10:44 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:06 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:44:12 UTC)

Description XiuJuan Wang 2019-02-15 10:53:46 UTC
Description of problem:

spec.proxy doesn't work after setting it in the image registry operator

Version-Release number of selected component (if applicable):
$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-13-204401   True        False         26h       Error while reconciling 4.0.0-0.nightly-2019-02-13-204401: the cluster operator openshift-samples is failing

How reproducible:
always

Steps to Reproduce:
1. Set an invalid proxy in the image registry operator's config (a non-interactive alternative is sketched after these steps)
oc edit configs.imageregistry.operator.openshift.io/instance -n openshift-image-registry
  proxy:
    http: http://invalid.rdu.redhat.com:3128
    https: http://invalid.rdu.redhat.com:3128
2. Trigger a build
3. Deploy a pod
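
A non-interactive way to apply the same setting in step 1 is a merge patch against the same resource (a sketch, using the CR name and fields shown above):

$ oc patch configs.imageregistry.operator.openshift.io/instance --type=merge \
    -p '{"spec":{"proxy":{"http":"http://invalid.rdu.redhat.com:3128","https":"http://invalid.rdu.redhat.com:3128"}}}'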

Actual results:

The build and the pod still run with the invalid proxy in place.

Expected results:
The build and the pod shouldn't be running.

Additional info:

Comment 1 Oleg Bulatov 2019-02-15 11:14:27 UTC
The proxy field affects how the registry interacts with the outside world. Basically it's for pullthrough (reference-policy=local) and for storage backends that use the default HTTP transport (for example, S3).
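
For example, a tag imported with the local reference policy (illustrative source image and tag name) is served via pullthrough:

$ oc tag docker.io/library/ruby:2.5 myruby:latest --reference-policy=local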

What storage was used for this test?

Comment 2 XiuJuan Wang 2019-02-18 07:18:54 UTC
My image stream uses the pullthrough reference policy, and the storage is S3.
On the latest cluster I configured both an invalid proxy and a valid proxy; both hit the errors below:

For deployed pod:
 Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/mongodb@sha256:ee70413940eba29389647bb352bcf5041da973de137f260ac3f95745d714a077": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password

For builds:
$ oc build-logs  ruby-hello-world-2 -n xiu 
Command "build-logs" is deprecated, use oc logs build/<build-name>
Cloning "https://github.com/openshift/ruby-hello-world.git" ...
	Commit:	787f1beae9956c959c6af62ee43bfdda73769cf7 (Merge pull request #78 from bparees/v22)
	Author:	Ben Parees <bparees.github.com>
	Date:	Thu Jan 17 17:21:03 2019 -0500
Caching blobs under "/var/cache/blobs".
Pulling docker://image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:86b9e4119c4fff8ca5cf4b726249d1533602b6aca83b813fe10193cfbd222580
error: build error: 1 error occurred:
	* Error determining manifest MIME type for docker://image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:86b9e4119c4fff8ca5cf4b726249d1533602b6aca83b813fe10193cfbd222580: unable to retrieve auth token: invalid username/password

My valid proxy:
  Proxy:
    Http:    http://file.rdu.redhat.com:3128
    Https:   http://file.rdu.redhat.com:3128

The invalid proxy:
  Proxy:
    Http:    http://invalid.rdu.redhat.com:3128
    Https:   http://invalid.rdu.redhat.com:3128

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-17-024922   True        False         179m      4.0.0-0.nightly-2019-02-17-024922

Comment 3 Oleg Bulatov 2019-02-18 07:36:57 UTC
Will you get the same error if you remove the proxy configuration from the operator's config? Can you attach logs from the registry pods?

Comment 4 XiuJuan Wang 2019-02-18 08:16:26 UTC
After removing the proxy configuration from the image registry operator's config, the builds and the deployed pod run well.

Here are the registry pod logs after setting the proxy:
time="2019-02-18T08:14:58.631395869Z" level=error msg="Get user failed with error: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=b4b85125-06c7-4e16-b01f-a71ad69f5e29 http.request.method=GET http.request.remoteaddr="10.131.0.33:43174" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.631452416Z" level=error msg="error checking authorization: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=b4b85125-06c7-4e16-b01f-a71ad69f5e29 http.request.method=GET http.request.remoteaddr="10.131.0.33:43174" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.631493702Z" level=warning msg="error authorizing context: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=b4b85125-06c7-4e16-b01f-a71ad69f5e29 http.request.method=GET http.request.remoteaddr="10.131.0.33:43174" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.631550734Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=6ac6c74c-1dfe-4a2b-a826-5c16c421e9cc http.request.method=GET http.request.remoteaddr="10.131.0.33:43174" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.duration=39.801847ms http.response.status=400 http.response.written=0
time="2019-02-18T08:14:58.631391382Z" level=error msg="Get user failed with error: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=ed32c376-70e2-47ba-b9d3-8d75ab485caa http.request.method=GET http.request.remoteaddr="10.129.2.34:36960" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.631729776Z" level=error msg="error checking authorization: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=ed32c376-70e2-47ba-b9d3-8d75ab485caa http.request.method=GET http.request.remoteaddr="10.129.2.34:36960" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.631766833Z" level=warning msg="error authorizing context: Get https://172.30.0.1:443/apis/user.openshift.io/v1/users/~: proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host" go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=ed32c376-70e2-47ba-b9d3-8d75ab485caa http.request.method=GET http.request.remoteaddr="10.129.2.34:36960" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1
time="2019-02-18T08:14:58.63180308Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=6d31b759-6c4b-47d6-98ed-b36a00b2ef83 http.request.method=GET http.request.remoteaddr="10.129.2.34:36960" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.duration=38.650089ms http.response.status=400 http.response.written=0
time="2019-02-18T08:15:00.067780796Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=a1999c79-eabb-422f-a12c-08cf9bea5b06 http.request.method=GET http.request.remoteaddr="10.129.0.1:60918" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="65.181µs" http.response.status=200 http.response.written=0
time="2019-02-18T08:15:02.81390424Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.85:5000" http.request.id=479f4da4-287f-4bc1-bf0d-522d9690922e http.request.method=GET http.request.remoteaddr="10.129.0.1:60934" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="88.186µs" http.response.status=200 http.response.written=0

Comment 5 Oleg Bulatov 2019-02-18 13:23:39 UTC
There should be

  Proxy:
    NoProxy: master.hostname.example.com,10.1.0.0/16,172.30.0.0/16

so that the registry doesn't use the proxy when it tries to contact the master API.
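
In spec form that would look something like this (hostname illustrative; 172.30.0.0/16 covers in-cluster service addresses such as the master API at 172.30.0.1):

  proxy:
    http: http://file.rdu.redhat.com:3128
    https: http://file.rdu.redhat.com:3128
    noProxy: master.hostname.example.com,10.1.0.0/16,172.30.0.0/16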

Comment 6 Ben Parees 2019-02-18 23:25:33 UTC
Setting back to on_qa since it looks like this was a test case definition/execution problem.

Comment 7 XiuJuan Wang 2019-02-19 02:43:29 UTC
Oleg, I want the registry to talk to the master via the proxy, so I only set [1], without noProxy.
I also tried adding noProxy [2], but the build error changed to [3] and the registry logs changed to [4].


[1]
  proxy:
    http: http://file.rdu.redhat.com:3128
    https: http://file.rdu.redhat.com:3128

[2]
  proxy:
    http: http://file.rdu.redhat.com:3128
    https: http://file.rdu.redhat.com:3128
    noProxy: image-registry.openshift-image-registry.svc:5000,10.129.0.0/16,172.30.0.0/16

[3]
$ oc build-logs ruby-ex-4 
Command "build-logs" is deprecated, use oc logs build/<build-name>
Cloning "https://github.com/sclorg/ruby-ex.git" ...
	Commit:	c00ecd7c762590f1d52c316c7d00141a745ede18 (Merge pull request #25 from pvalena/master)
	Author:	Honza Horak <hhorak>
	Date:	Thu Dec 13 15:35:54 2018 +0100
Caching blobs under "/var/cache/blobs".
Pulling docker://image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:86b9e4119c4fff8ca5cf4b726249d1533602b6aca83b813fe10193cfbd222580
error: build error: 1 error occurred:
	* Error determining manifest MIME type for docker://image-registry.openshift-image-registry.svc:5000/openshift/ruby@sha256:86b9e4119c4fff8ca5cf4b726249d1533602b6aca83b813fe10193cfbd222580: Error reading manifest sha256:86b9e4119c4fff8ca5cf4b726249d1533602b6aca83b813fe10193cfbd222580 in image-registry.openshift-image-registry.svc:5000/openshift/ruby: received unexpected HTTP status: 500 Internal Server Error

[4]
time="2019-02-19T02:36:57.858955187Z" level=info msg="authorized request" go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=b262a32a-87c2-4aca-a0e7-54a1c5b5908e http.request.method=GET http.request.remoteaddr="10.131.0.41:54096" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 openshift.auth.user="system:serviceaccount:openshift-monitoring:prometheus-k8s"
time="2019-02-19T02:36:57.864811105Z" level=info msg="response completed" go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=b262a32a-87c2-4aca-a0e7-54a1c5b5908e http.request.method=GET http.request.remoteaddr="10.131.0.41:54096" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.contenttype="text/plain; version=0.0.4" http.response.duration=24.927084ms http.response.status=200 http.response.written=3113
time="2019-02-19T02:36:57.864966109Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=36fa074c-02fc-4a77-9804-c86aa1ba1ea7 http.request.method=GET http.request.remoteaddr="10.131.0.41:54096" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.contenttype="text/plain; version=0.0.4" http.response.duration=25.123414ms http.response.status=200 http.response.written=3113
time="2019-02-19T02:36:57.93629839Z" level=info msg="authorized request" go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=7338885c-56d0-4373-89bd-0eb6bdaf8cf5 http.request.method=GET http.request.remoteaddr="10.129.2.41:35926" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 openshift.auth.user="system:serviceaccount:openshift-monitoring:prometheus-k8s"
time="2019-02-19T02:36:57.942620951Z" level=info msg="response completed" go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=7338885c-56d0-4373-89bd-0eb6bdaf8cf5 http.request.method=GET http.request.remoteaddr="10.129.2.41:35926" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.contenttype="text/plain; version=0.0.4" http.response.duration=86.525487ms http.response.status=200 http.response.written=3115
time="2019-02-19T02:36:57.942758493Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=d5591e96-5a34-41b6-903b-7261b69e1414 http.request.method=GET http.request.remoteaddr="10.129.2.41:35926" http.request.uri=/extensions/v2/metrics http.request.useragent=Prometheus/2.7.1 http.response.contenttype="text/plain; version=0.0.4" http.response.duration=86.70635ms http.response.status=200 http.response.written=3115
time="2019-02-19T02:37:00.922262852Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=7cd178a8-0b92-4ff2-9090-f7bcd7365e21 http.request.method=GET http.request.remoteaddr="10.129.0.1:45038" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="48.773µs" http.response.status=200 http.response.written=0
time="2019-02-19T02:37:03.970172267Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=f5ec9ab7-9f99-4a0c-8a1d-2fb03baeba35 http.request.method=GET http.request.remoteaddr="10.129.0.1:45094" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="65.812µs" http.response.status=200 http.response.written=0
time="2019-02-19T02:37:10.922321844Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=94936950-42b9-45bb-ab0a-5815ef9ff848 http.request.method=GET http.request.remoteaddr="10.129.0.1:45172" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="64.259µs" http.response.status=200 http.response.written=0
time="2019-02-19T02:37:13.986080758Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=cfc6f9a1-70c4-4ad3-b74f-90c6b74d398b http.request.method=GET http.request.remoteaddr="10.129.0.1:45220" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="49.498µs" http.response.status=200 http.response.written=0
time="2019-02-19T02:37:20.922109127Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=5d8419e5-a2c5-4452-9c5f-4437cbaf1b62 http.request.method=GET http.request.remoteaddr="10.129.0.1:45298" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="63.377µs" http.response.status=200 http.response.written=0
time="2019-02-19T02:37:23.970517974Z" level=info msg=response go.version=go1.10.8 http.request.host="10.129.0.233:5000" http.request.id=df02b04b-0d82-4cbd-8929-61b8213f5f19 http.request.method=GET http.request.remoteaddr="10.129.0.1:45346" http.request.uri=/healthz http.request.useragent=kube-probe/1.12+ http.response.duration="75.278µs" http.response.status=200 http.response.written=0

Comment 8 Oleg Bulatov 2019-02-19 07:56:26 UTC
In the registry log [4] there are no requests from the builder, only requests to /extensions/v2/metrics and /healthz (i.e. from monitoring and the load balancer).

Is that the full log?

BTW, the error

    proxyconnect tcp: dial tcp: lookup file.rdu.redhat.com on 172.30.0.10:53: no such host

means that file.rdu.redhat.com is not resolvable from within the pod. Can you rsh into the pod and check that the proxy is reachable using wget/curl/...?
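
For example (pod name illustrative):

$ oc rsh -n openshift-image-registry image-registry-74bf564f9d-4slrx
sh-4.2$ curl -v http://file.rdu.redhat.com:3128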

And I don't think we have an option to talk with the master API through the external IP address. In other words, the registry will always use something like 172.30.0.1, which is not accessible to external proxies.

Comment 9 XiuJuan Wang 2019-02-19 10:11:51 UTC
Here is the full log of registry pod http://pastebin.test.redhat.com/717379 .
My proxy setting:
  proxy:
    http: http://file.rdu.redhat.com:3128
    https: http://file.rdu.redhat.com:3128
    noProxy: ip-10-0-154-218.us-east-2.compute.internal,10.1.0.0/16,172.30.0.0/16

Indeed, the registry pod can't access my proxy file.rdu.redhat.com, so the operator treats file.rdu.redhat.com as an invalid configuration. If the proxy were accessible from the registry pod and the builds, I could verify this bug as working.

$ oc rsh image-registry-74bf564f9d-4slrx
curl: (6) Could not resolve host: file.rdu.redhat.com; Unknown error
sh-4.2$ curl squid.redhat.com:3128 
curl: (6) Could not resolve host: squid.redhat.com; Unknown error
sh-4.2$ curl file.rdu.redhat.com:3128
curl: (6) Could not resolve host: file.rdu.redhat.com; Unknown error

Comment 11 Oleg Bulatov 2019-02-19 13:09:10 UTC
You can try with another proxy. But at least we know that the registry tries to use a proxy server if spec.proxy is set.

Comment 12 XiuJuan Wang 2019-02-20 02:41:47 UTC
After set invalid proxy, the registry could recognize spec.proxy setting, meantime the build and pod aren't running due to not resolve host.
So mark this bug as verified.
Thanks Oleg

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-18-224151   True        False         17h       Cluster version is 4.0.0-0.nightly-2019-02-18-224151

Comment 13 XiuJuan Wang 2019-02-20 03:06:24 UTC
Oleg,
when I set the proxy with a correct noProxy option, should my build and pod be able to run? Right now you can see the build error in comment #7 [3].
  proxy:
    http: http://invalid.rdu.redhat.com:3128
    https: http://invalid.rdu.redhat.com:3128
    noProxy: ip-10-0-154-218.us-east-2.compute.internal,10.1.0.0/16,172.30.0.0/16

Comment 15 XiuJuan Wang 2019-02-26 02:07:34 UTC
Oleg,
Thanks, I forgot to remove the needinfo tag after getting feedback on Slack.

Comment 18 errata-xmlrpc 2019-06-04 10:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

