Bug 1469654 - image pruning doesn't work from outside the cluster
Status: ASSIGNED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 3.5.0
Hardware: x86_64 Linux
Priority: unspecified Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Michal Minar
QA Contact: ge liu
Docs Contact:
Depends On:
Blocks:
Reported: 2017-07-11 11:11 EDT by Anton Sherkhonov
Modified: 2017-07-18 09:21 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Anton Sherkhonov 2017-07-11 11:11:53 EDT
Description of problem:
When running the command:
# oc adm prune images
`oc` requests /healthz from the OpenShift registry at the registry's internal IP address and times out, because that address is not reachable from outside the cluster.

Version-Release number of selected component (if applicable):
$ oc version
oc v3.6.0-alpha.1+16132e2-45
kubernetes v1.5.2+43a9be4
features: Basic-Auth

Server https://osemaster.sbu.lab.eng.bos.redhat.com:8443
openshift v3.5.5.15
kubernetes v1.5.2+43a9be4

How reproducible:
Always

Steps to Reproduce:
1. The cluster has a functional internal registry, and the user has cluster-admin privileges.
2. The client host is outside the cluster (it has no access to OpenShift's SDN).
3. Run the following dry-run command, choosing a <timeframe> that leaves some images to prune:
# oc adm prune images --keep-younger-than=<timeframe>
4. Run the same command with --confirm:
# oc adm prune images --keep-younger-than=<timeframe> --confirm --loglevel=8


Actual results:
The level 8 error output:
I0711 10:43:58.188971    2817 prune.go:832] Using registry: 172.22.77.97:5000
I0711 10:43:58.188985    2817 prune.go:193] Trying https for 172.22.77.97:5000
I0711 10:43:58.189026    2817 round_trippers.go:296] GET https://172.22.77.97:5000/healthz
I0711 10:43:58.189034    2817 round_trippers.go:303] Request Headers:
I0711 10:43:58.189041    2817 round_trippers.go:306]     User-Agent: oc/v1.5.2+43a9be4 (darwin/amd64) kubernetes/43a9be4
I0711 10:43:58.189048    2817 round_trippers.go:306]     Authorization: Basic dW51c2VkOnk3VzY1cldGRTdyVXhwdnNsV2twOUtlTXZWSW5aalJsY3F2UzVoU2ZhelU=
I0711 10:45:13.826431    2817 round_trippers.go:321] Response Status:  in 75637 milliseconds
I0711 10:45:13.826454    2817 round_trippers.go:324] Response Headers:
I0711 10:45:13.826477    2817 prune.go:198] Error with https for 172.22.77.97:5000: Get https://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out
I0711 10:45:13.826496    2817 prune.go:193] Trying http for 172.22.77.97:5000
I0711 10:45:13.826522    2817 round_trippers.go:296] GET http://172.22.77.97:5000/healthz
I0711 10:45:13.826531    2817 round_trippers.go:303] Request Headers:
I0711 10:45:13.826538    2817 round_trippers.go:306]     User-Agent: oc/v1.5.2+43a9be4 (darwin/amd64) kubernetes/43a9be4
I0711 10:45:13.826546    2817 round_trippers.go:306]     Authorization: Basic dW51c2VkOnk3VzY1cldGRTdyVXhwdnNsV2twOUtlTXZWSW5aalJsY3F2UzVoU2ZhelU=
I0711 10:46:30.104671    2817 round_trippers.go:321] Response Status:  in 76278 milliseconds
I0711 10:46:30.104700    2817 round_trippers.go:324] Response Headers:
I0711 10:46:30.104718    2817 prune.go:198] Error with http for 172.22.77.97:5000: Get http://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out
F0711 10:46:30.106273    2817 helpers.go:116] error: error communicating with registry: Get http://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out

(172.22.77.97 is the IP address of the internal registry.)

The images are not pruned.


Expected results:
The images are pruned

Additional info:
When `oc adm prune images` is run on one of the nodes as the same user, the command succeeds and the images are pruned.
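
The two timeouts of roughly 75 seconds each in the log above can be detected up front. The following is a minimal sketch, assuming `curl` is installed on the client host. The function name `probe_registry` and the 5-second limit are illustrative choices and are not part of `oc`:

```shell
# probe_registry HOST[:PORT]
# Tries https first and then http, mirroring the order seen in the log.
# Prints the first scheme for which /healthz returns any HTTP response
# within 5 seconds; returns non-zero if neither scheme responds.
probe_registry() {
    for scheme in https http; do
        # -s: silent; -k: skip TLS verification (the registry certificate
        # may be signed by a cluster-internal CA); --max-time bounds the
        # whole request so an unreachable address fails fast.
        if curl -s -k -o /dev/null --max-time 5 "${scheme}://$1/healthz"; then
            echo "$scheme"
            return 0
        fi
    done
    return 1
}

# Example:
#   probe_registry 172.22.77.97:5000 || echo "registry unreachable from this host"
```

A non-zero result from a host outside the SDN suggests that the `--confirm` run would time out there as well.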
Comment 1 Michal Minar 2017-07-12 04:23:44 EDT
Anton, for this purpose we offer the `--registry-url` option of the `oadm prune images` command. Could you please try it and report back?
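
A sketch of what that invocation could look like, using only flags that appear in this report. `registry.example.com` is a placeholder for an externally reachable registry address, `60m` is an arbitrary timeframe, and the wrapper function `prune_images` is hypothetical:

```shell
# prune_images TIMEFRAME REGISTRY [confirm]
# Without "confirm" this is a dry run and --registry-url is not passed.
# With "confirm" the registry is contacted at REGISTRY instead of its
# cluster-internal address.
prune_images() {
    if [ "${3:-}" = "confirm" ]; then
        oc adm prune images --keep-younger-than="$1" --registry-url="$2" --confirm
    else
        oc adm prune images --keep-younger-than="$1"
    fi
}

# Example (placeholder values):
#   prune_images 60m registry.example.com          # dry run
#   prune_images 60m registry.example.com confirm  # actually prune
```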
Comment 2 Anton Sherkhonov 2017-07-12 08:22:55 EDT
`--registry-url` works, but I still think this is a bug:
It is very inconsistent with other usages of the `oc` command, where you don't need anything except the master API endpoint.
If you don't have `--registry-url` the command without `--confirm` works just fine, that suggests that it should work with `--confirm` as well.
The error you get is "operation timed out" which doesn't make it easy to understand that you need `--registry-url`.
There is nothing in the documentation to support that.
Comment 3 Peter Portante 2017-07-12 12:06:00 EDT
So if `oc adm prune images` works from inside the cluster, doesn't that mean it is accessing the registry via the internal service name? If so, why not always use the external service name, if it exists, and only use the internal service name when it does not exist?

Then you would not need `--registry-url` for internal vs. external access, right?
Comment 4 Michal Minar 2017-07-17 11:30:10 EDT
The `--registry-url` flag is covered in PR [1]. But I see that the section about `--registry-url` will need to be back-ported to earlier versions. I'll take care of it.

[1] https://github.com/openshift/openshift-docs/pull/4471

> The error you get is "operation timed out" which doesn't make it easy to understand that you need `--registry-url`. There is nothing in the documentation to support that.

This really isn't a good user experience. It will be fixed by this bz.

> If you don't have `--registry-url` the command without `--confirm` works just fine, that suggests that it should work with `--confirm` as well.

I'm not really sure about this point. The `--registry-url` flag isn't needed for the dry run. Would it be enough to document this better in the command's help?

> If so, why not always use the external service name, if it exists, and only use the internal service name when it does not exist?

Unfortunately, it's pretty hard to determine a working external URL of the registry; we don't have a way to determine it safely. Recently, we started to allow the external registry name to propagate into image streams [2]. However, making use of it is still optional, which leaves the internal IP as the safest option from inside the cluster.

[2] https://github.com/openshift/origin/pull/14882

For usage outside the cluster, I don't see a better option than `--registry-url`, other than making the URL discoverable from the master, which has been discussed several times already.
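
On a cluster where the administrator already knows that the registry is exposed through a route, the lookup can at least be scripted on the client side. This is a minimal sketch, assuming a route named `docker-registry` in the `default` project. That is an assumption about the cluster setup, not something `oc adm prune images` discovers itself, and `registry_route_host` is a hypothetical helper:

```shell
# registry_route_host [ROUTE_NAME] [NAMESPACE]
# Prints the host name of the given route. The defaults are assumptions
# about how the registry was exposed; pass your own values if they differ.
registry_route_host() {
    oc get route "${1:-docker-registry}" -n "${2:-default}" -o jsonpath='{.spec.host}'
}

# Example:
#   host=$(registry_route_host) && \
#     oc adm prune images --keep-younger-than=<timeframe> --registry-url="$host" --confirm
```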
