Description of problem:
When running the command:
#oc adm prune
The oc tries to get healthz from the openshift registry via the internal openshift's registry ip address and times out, because registry's internal ip address is obviously unavailable outside the cluster.

Version-Release number of selected component (if applicable):
$ oc version
oc v3.6.0-alpha.1+16132e2-45
kubernetes v1.5.2+43a9be4
features: Basic-Auth

Server https://osemaster.sbu.lab.eng.bos.redhat.com:8443
openshift v3.5.5.15
kubernetes v1.5.2+43a9be4

How reproducible:
Always

Steps to Reproduce:
1. Cluster has a functional internal registry. User has cluster-admin privileges.
2. the client host we'll be running from is outside of the cluster (doesn't have access to Openshift's SDN).
3. run the following command specifying the <timeframe> to make sure you have some images to prune
#oc adm prune images --keep-younger-than=<timeframe>
4. run
#oc adm prune images --keep-younger-than=<timeframe> --confirm --loglevel=8

Actual results:
The level 8 error output:
I0711 10:43:58.188971 2817 prune.go:832] Using registry: 172.22.77.97:5000
I0711 10:43:58.188985 2817 prune.go:193] Trying https for 172.22.77.97:5000
I0711 10:43:58.189026 2817 round_trippers.go:296] GET https://172.22.77.97:5000/healthz
I0711 10:43:58.189034 2817 round_trippers.go:303] Request Headers:
I0711 10:43:58.189041 2817 round_trippers.go:306] User-Agent: oc/v1.5.2+43a9be4 (darwin/amd64) kubernetes/43a9be4
I0711 10:43:58.189048 2817 round_trippers.go:306] Authorization: Basic dW51c2VkOnk3VzY1cldGRTdyVXhwdnNsV2twOUtlTXZWSW5aalJsY3F2UzVoU2ZhelU=
I0711 10:45:13.826431 2817 round_trippers.go:321] Response Status: in 75637 milliseconds
I0711 10:45:13.826454 2817 round_trippers.go:324] Response Headers:
I0711 10:45:13.826477 2817 prune.go:198] Error with https for 172.22.77.97:5000: Get https://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out
I0711 10:45:13.826496 2817 prune.go:193] Trying http for 172.22.77.97:5000
I0711 10:45:13.826522 2817 round_trippers.go:296] GET http://172.22.77.97:5000/healthz
I0711 10:45:13.826531 2817 round_trippers.go:303] Request Headers:
I0711 10:45:13.826538 2817 round_trippers.go:306] User-Agent: oc/v1.5.2+43a9be4 (darwin/amd64) kubernetes/43a9be4
I0711 10:45:13.826546 2817 round_trippers.go:306] Authorization: Basic dW51c2VkOnk3VzY1cldGRTdyVXhwdnNsV2twOUtlTXZWSW5aalJsY3F2UzVoU2ZhelU=
I0711 10:46:30.104671 2817 round_trippers.go:321] Response Status: in 76278 milliseconds
I0711 10:46:30.104700 2817 round_trippers.go:324] Response Headers:
I0711 10:46:30.104718 2817 prune.go:198] Error with http for 172.22.77.97:5000: Get http://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out
F0711 10:46:30.106273 2817 helpers.go:116] error: error communicating with registry: Get http://172.22.77.97:5000/healthz: dial tcp 172.22.77.97:5000: getsockopt: operation timed out

(172.22.77.97 is the ip of the internal registry). The images are still not pruned.

Expected results:
The images are pruned

Additional info:
When running `oc adm prune images` on one of the nodes under same user - the command succeeds, images are pruned.
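The log above shows the client probing `/healthz` over https and then over http, with each attempt blocking for roughly 75 seconds before failing. A minimal Python sketch of that probe order follows. It is not the `oc` implementation: the names `probe_registry` and `http_get`, the injectable `get` parameter, and the 5-second timeout are all assumptions made for illustration.

```python
import urllib.request
from typing import Callable, List, Optional, Tuple


def http_get(url: str, timeout: float) -> int:
    """Issue a GET and return the HTTP status code (raises OSError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe_registry(
    host: str,
    get: Callable[[str, float], int] = http_get,  # hypothetical hook, eases testing
    timeout: float = 5.0,  # assumed value, far shorter than the ~75 s in the log
) -> Tuple[Optional[str], List[str]]:
    """Try https first, then http; return the working scheme and collected errors."""
    errors: List[str] = []
    for scheme in ("https", "http"):
        url = f"{scheme}://{host}/healthz"
        try:
            get(url, timeout)
            return scheme, errors
        except OSError as exc:  # covers timeouts, refused connections, TLS errors
            errors.append(f"Get {url}: {exc}")
    return None, errors
```

With an explicit short timeout, an unreachable address fails within seconds, and the caller receives both errors rather than only the last one.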
Anton, for this purpose we offer the `--registry-url` option of the `oadm prune images` command. Could you please try it and report back?
`--registry-url` works, but I still think this is a bug. It is inconsistent with other uses of the `oc` command, where you don't need anything except the master API endpoint. Without `--registry-url`, the command works just fine when `--confirm` is omitted, which suggests that it should work with `--confirm` as well. The error you get is "operation timed out", which doesn't make it easy to understand that you need `--registry-url`. There is nothing in the documentation to support that.
So if "oc adm prune images" works from inside the cluster, doesn't that mean it is accessing the registry via the internal service name? If so, why not always use the external service name if it exists, and only use the internal service name when it does not exist? Then you would not need --registry-url to distinguish internal from external access, right?
The `--registry-url` flag is covered in PR [1]. But I see that the section about `--registry-url` will need to be back-ported to earlier versions. I'll take care of it.

[1] https://github.com/openshift/openshift-docs/pull/4471

> The error you get is "operation timed out" which doesn't make it easy to understand that you need `--registry-url`. There is nothing in the documentation to support that.

This really isn't a good user experience. It will be fixed by this bz.

> If you don't have `--registry-url` the command without `--confirm` works just fine, that suggests that it should work with `--confirm` as well.

I'm not really sure about this point. The `--registry-url` isn't really needed for the dry-run. Would it be enough to just document this better in the command's help?

> If so, why not always use the external service name, if it exists, and only use the internal service name when it does not exist?

Unfortunately, it's pretty hard to determine the working external URL of the registry. We don't have a way to safely determine it. Recently, we started to allow the external registry name to propagate into image streams [2]. However, making use of it is still optional, which still makes the internal IP the safest option from inside the cluster.

[2] https://github.com/openshift/origin/pull/14882

For usage outside the cluster, I don't see a better option than `--registry-url`, other than making the URL discoverable from the master, which has been discussed several times already.
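The precedence described in this comment (an explicit `--registry-url` wins, otherwise the internally known registry address is used) can be sketched as below. This is only an illustration under stated assumptions: `choose_registry_host` is a hypothetical helper, and the normalisation of a value given with or without a scheme is assumed, not taken from the `oc` source.

```python
from typing import Optional
from urllib.parse import urlsplit


def choose_registry_host(registry_url: Optional[str], internal_host: str) -> str:
    """Return the host[:port] to contact (hypothetical helper, not oc code).

    registry_url  -- value of --registry-url, or None when the flag is absent
    internal_host -- address known from inside the cluster, e.g. 172.22.77.97:5000
    """
    if not registry_url:
        # No override: fall back to the internal address, which is only
        # reachable from hosts attached to the cluster network.
        return internal_host
    # Assumption: accept both "host:port" and "scheme://host:port/" forms.
    candidate = registry_url if "://" in registry_url else f"//{registry_url}"
    return urlsplit(candidate).netloc
```

For example, `choose_registry_host(None, "172.22.77.97:5000")` yields the internal address, while passing `"registry.example.com"` (a placeholder hostname) overrides it.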
>> The error you get is "operation timed out" which doesn't make it easy to understand that you need `--registry-url`. There is nothing in the documentation to support that.
> This really isn't a good user experience. It will be fixed by this bz.

@michal, did this ever get fixed?

> The `--registry-url` flag is covered in PR [1]. But I see that the section about `--registry-url` will need to be back-ported to earlier versions. I'll take care of it.

Did this documentation get backported?

> I'm not really sure about this point. The `--registry-url` isn't really needed for the dry-run. Would it be enough to just document this better in command's help?

Yes, that seems reasonable, let's do that.

I am lowering the severity of this bug, as all I see are:
1) better error/timeout logic
2) some better docs
3) some better help text
Certainly not a blocker.
I don't think this got fixed. The error is really not a good user experience; we should probably improve it to "operation timed out while contacting registry XYZ".
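One way to build such a message is sketched below in Python. The wording is modeled on the error quoted in the verification comment later in this report; the function name `format_ping_failure` and its signature are hypothetical and do not come from the `oc` source.

```python
from typing import Sequence


def format_ping_failure(host: str, errors: Sequence[str]) -> str:
    """Name the registry that was contacted and point the user at --registry-url."""
    details = ", ".join(errors)
    return (
        f"error: failed to ping registry {host}: [{details}]\n"
        "* Please provide a reachable route to the integrated registry "
        "using --registry-url."
    )
```

Including the registry address and the hint in the same message tells the user both what was unreachable and which flag addresses it.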
PR fixing the `oadm prune` help text and error reporting: https://github.com/openshift/origin/pull/16655
The PR has been merged. The only missing pieces are the documentation back-port PRs [1] and [2] for the --registry-url flag.

[1] https://github.com/openshift/openshift-docs/pull/5535 (OCP 3.5)
[2] https://github.com/openshift/openshift-docs/pull/5536 (OCP 3.4)
The documentation PRs have been merged as well.
Verified

# oc version
oc v3.7.0-0.147.0
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://:8443
openshift v3.7.0-0.143.2
kubernetes v1.7.0+80709908fd

Pruning images from outside the cluster without the '--registry-url' flag now reports an error like the one below:

# oadm prune images --keep-younger-than=0 --confirm
error: failed to ping registry docker-registry.default.svc:5000: [Get https://docker-registry.default.svc:5000/: dial tcp: lookup docker-registry.default.svc on 10.72.17.5:53: no such host, Get http://docker-registry.default.svc:5000/: dial tcp: lookup docker-registry.default.svc on 10.72.17.5:53: no such host]
* Please provide a reachable route to the integrated registry using --registry-url.

Docs and help text look good, so moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188