Bug 1911470 - ServiceAccount Registry Authfiles Do Not Contain Entries for Public Hostnames
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.z
Target Release: 4.8.0
Assignee: Ricardo Maraschini
QA Contact: Wenjing Zheng
Blocks: 1931856
Reported: 2020-12-29 16:06 UTC by Steve Kuznetsov
Modified: 2021-07-27 22:36 UTC

Doc Type: Bug Fix
Cause: The automatically created docker config secret did not include credentials for the routes of the integrated internal registry.
Consequence: Because no credentials were present for any of the registry's routes, pods that tried to reach the registry through a route failed for lack of authentication.
Fix: All configured registry routes are now included in the default docker credential secret.
Result: Pods can reach the integrated registry through any of its routes, because the credentials now contain an entry for each route.
Last Closed: 2021-07-27 22:35:38 UTC
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-openshift-controller-manager-operator pull 197 | 0 | None | open | Bug 1911470: Set registry routes in operand config | 2021-02-15 15:28:04 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:36:04 UTC

Description Steve Kuznetsov 2020-12-29 16:06:05 UTC
We have a relatively new cluster:

$ oc --context app.ci get clusterversion version
version   4.6.1     True        False         67d     Cluster version is 4.6.1

This cluster has the registry configured to use a custom route as well as a default one:

$ oc --context app.ci get configs.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.defaultRoute}'
$ oc --context app.ci get configs.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.routes[0].hostname}'

Both are live:

$ oc --context app.ci get image.config.openshift.io cluster -o jsonpath={.status.externalRegistryHostnames}
[default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com registry.ci.openshift.org]

However, no ServiceAccount for this cluster has any entry for the external hostnames in the authfile generated for the registry:

$ oc --context app.ci --namespace ocp extract secrets/default-dockercfg-x878r --to=- | jq 'keys'
# .dockercfg

On a 3.11 cluster, entries are automatically generated for the external hostnames.
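
The check described above can be scripted. What follows is a minimal sketch, assuming the secret's `.dockercfg` payload has already been extracted as JSON text (a JSON object keyed by registry hostname). The function name `missing_registry_hosts` and the placeholder credential values are hypothetical; the hostnames are the ones quoted in this report.

```python
import json


def missing_registry_hosts(dockercfg_json, expected_hosts):
    """Return the expected hostnames that have no entry in a .dockercfg payload."""
    # A legacy .dockercfg payload is a JSON object keyed by registry hostname.
    present = set(json.loads(dockercfg_json).keys())
    return sorted(host for host in expected_hosts if host not in present)


# Hypothetical payload that mirrors the reported symptom: internal name only.
sample = json.dumps({
    "image-registry.openshift-image-registry.svc:5000": {"auth": "placeholder"},
})

expected = [
    "image-registry.openshift-image-registry.svc:5000",
    "default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com",
    "registry.ci.openshift.org",
]

for host in missing_registry_hosts(sample, expected):
    print("missing entry for", host)
```

An empty result would mean that every expected hostname, internal and external, has an entry in the authfile.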

Comment 1 Clayton Coleman 2020-12-29 16:29:13 UTC
On an OSD cluster with 2 public routes (correctly configured in the cluster image configuration)

- apiVersion: config.openshift.io/v1
  kind: Image
  metadata:
    annotations:
      release.openshift.io/create-only: "true"
    creationTimestamp: "2020-04-16T19:11:37Z"
    generation: 2
    name: cluster
    resourceVersion: "211280865"
    selfLink: /apis/config.openshift.io/v1/images/cluster
    uid: e6d14209-b45e-40ac-bf51-74b870d7c0ad
  spec:
    externalRegistryHostnames:
    - registry.ci.openshift.org
  status:
    externalRegistryHostnames:
    - default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com
    - registry.ci.openshift.org
    internalRegistryHostname: image-registry.openshift-image-registry.svc:5

the openshiftcontrollermanagers config is only sending the internal address to be generated into the pull secret:

$ oc get openshiftcontrollermanagers.operator.openshift.io -o yaml

  spec:
    logLevel: ""
    managementState: Managed
    observedConfig:
      build:
        buildDefaults:
          resources: {}
        imageTemplateFormat:
          format: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2986a09ed686a571312bcb20d648baac46b422efa072f8b68eb41c7996e94610
      deployer:
        imageTemplateFormat:
          format: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f42509c18cf5e41201d64cf3a9c1994ffa5318f8d7cee5de45fa2da914e68bbc
      dockerPullSecret:
        internalRegistryHostname: image-registry.openshift-image-registry.svc:5000
      ingress:
        ingressIPNetworkCIDR: ""
    operatorLogLevel: ""
    unsupportedConfigOverrides: null

It should be sending all the public names as well as the internal registry name.  This allows someone to work with both public names and private names equally and pods still work.  The public names allow resiliency if someone sets up a proxy in front of one of those names.
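
As an illustration of the hostname set described here, the sketch below collects the internal name plus every public name from the status of the `image.config.openshift.io` resource. It is not the operator's actual code (the operator is not written in Python); the function name `registry_hostnames` is hypothetical, and the input is assumed to be the `status` object already parsed into a dictionary.

```python
def registry_hostnames(image_status):
    """Collect internal and external registry hostnames in order, without duplicates."""
    hosts = []
    internal = image_status.get("internalRegistryHostname")
    if internal:
        hosts.append(internal)
    # The field may be absent or null when no route is exposed.
    for host in image_status.get("externalRegistryHostnames") or []:
        if host and host not in hosts:
            hosts.append(host)
    return hosts


# Values taken from the cluster discussed in this report.
status = {
    "internalRegistryHostname": "image-registry.openshift-image-registry.svc:5000",
    "externalRegistryHostnames": [
        "default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com",
        "registry.ci.openshift.org",
    ],
}

for host in registry_hostnames(status):
    print(host)
```

Putting the internal name first and skipping duplicates keeps the result stable if the same route is listed more than once.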

I'm not positive this is a regression, but it absolutely broke a scenario that worked in 3.11 when we moved to 4.6, so I'm marking it as such (it could break others moving from 3 to 4). We'll need to assess whether this has been broken in all of 4.x or just in 4.6 to determine whether to backport (if it's broken in all 4.x, I think 4.6 only is acceptable).

Comment 3 Ricardo Maraschini 2021-01-04 12:12:04 UTC
I can confirm that this is the behavior in 4.1 as well: no extra entry for the external route is created.

Comment 4 Oleg Bulatov 2021-01-04 14:19:30 UTC
What you described works as designed; the behavior is intentional. If you want to use the external name, you need to create a secret manually. This will make your manifests transferable between clusters. If your manifests are supposed to be used only on one cluster, then there is no reason to use the external route. Using external routes may give you the feeling that you can easily transfer your manifest to another cluster, but you can't.

So I'd be very careful with adding external names to these secrets.

Steve, what is your use case? You haven't described why you need this. This looks more like an RFE.

> The public names allow resiliency if someone sets up a proxy in front of one of those names.

Clayton, can you elaborate on what kind of proxy you want in front of image-registry.openshift-image-registry.svc?

Comment 5 Hongkai Liu 2021-01-04 14:36:13 UTC
Just want to report a use case (not the same as the one reported for this bug by Steve) that came up while migrating the CI registry from a 3.11 cluster to 4.6.
The suggested workaround is to use the internal hostname of registry's svc.

Comment 6 Oleg Bulatov 2021-01-05 11:24:36 UTC
Migration is expected; you cannot use docker-registry.default.svc:5000 either. If you want to use an external name, you should treat it as an external registry.

Comment 7 Steve Kuznetsov 2021-01-05 15:52:39 UTC
How is this an RFE if it's a regression over previous behavior? I want to be able to use the internal or external hostname to refer to the registry. They're identical. We've built up an enormous amount of nonsense automation to re-write secrets including the external hostname to deal with this.
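
The automation mentioned here is not shown in this report. As a rough idea of what such a rewrite involves, the sketch below copies the credentials stored under the internal hostname to each external hostname in a decoded `.dockercfg` dictionary. The function name `add_external_hosts` and all credential values are hypothetical, and the sketch assumes the registry accepts the same credentials through its routes.

```python
import copy

INTERNAL = "image-registry.openshift-image-registry.svc:5000"


def add_external_hosts(dockercfg, internal_host, external_hosts):
    """Return a copy of dockercfg with the internal entry duplicated per external host."""
    if internal_host not in dockercfg:
        raise KeyError(f"no entry for {internal_host}")
    result = copy.deepcopy(dockercfg)
    for host in external_hosts:
        # Keep any entry that already exists for this hostname.
        if host not in result:
            result[host] = copy.deepcopy(dockercfg[internal_host])
    return result


# Placeholder credentials; a real secret holds a service account token.
cfg = {
    INTERNAL: {
        "username": "placeholder-user",
        "password": "placeholder-token",
        "auth": "placeholder",
    },
}

patched = add_external_hosts(cfg, INTERNAL, ["registry.ci.openshift.org"])
for host in sorted(patched):
    print(host)
```

The input dictionary is left untouched, so the original secret content can still be compared against the patched copy before anything is written back.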

Comment 8 Oleg Bulatov 2021-01-05 17:10:46 UTC
> They're identical.

Except that the traffic for the external hostname goes through the load balancer and the router.

Not every 3.11 feature is supported by 4.x. Storage quota is gone, Alibaba storage is gone, etc. These can return if somebody asks for them. But so far we have removed external hostnames to protect you from creating configurations that are hard to migrate from one cluster to another. It protects you from using external load balancers when they are not needed. I guess you have a reason why it's better to use external names in your case, but you haven't shared it yet.

We already have 5 major releases without this feature, so I don't consider it to be a regression. I'm OK with adding an option in 4.8 (or in a version that our PM selects) that will enable the behavior that you want.

Comment 9 Steve Kuznetsov 2021-01-08 18:16:35 UTC
Using a known, functional external hostname makes the migration easier, not harder. Since this works and is valid in 3.x and not functional in 4.x, this is a regression in the product that will break user workloads that expect it to continue working. We should reinstate this - if not by default, at least as opt-in.

Comment 10 Clayton Coleman 2021-01-08 18:26:42 UTC
This is a regression in the product.  The design of the image stream public field is that the public hostname can be pulled by pods.  It must be fixed.  It should be on by default.  There is no downside to having this on by default.

Comment 11 Wenjing Zheng 2021-02-07 09:09:53 UTC
It works with the open PR, with the results below:
$ oc extract secrets/default-dockercfg-dbn69 --to=- | jq 'keys'
# .dockercfg
zhengwenjings-MacBook-Pro:4.0 wzheng$ oc get routes
NAME            HOST/PORT                                                                                             PATH   SERVICES         PORT    TERMINATION   WILDCARD
default-route   default-route-openshift-image-registry.apps.ci-ln-2tc5v1t-f76d1.origin-ci-int-gce.dev.openshift.com          image-registry   <all>   reencrypt     None
myregistry      registry.ci.openshift.org                                                                                    image-registry   <all>   reencrypt     None

Comment 13 Wenjing Zheng 2021-03-02 08:06:36 UTC
$ oc extract secrets/default-dockercfg-82d4h --to=- | jq 'keys'
# .dockercfg
[wzheng@preserve-docker-slave 4.8]$ oc get routes
Unable to connect to the server: Service Unavailable
[wzheng@preserve-docker-slave 4.8]$ oc get routes
NAME            HOST/PORT                                                                                    PATH   SERVICES         PORT    TERMINATION   WILDCARD
default-route   default-route-openshift-image-registry.apps.wxj-c2s32.govcloudemu.devcluster.openshift.com          image-registry   <all>   reencrypt     None

Verified on 4.8.0-0.nightly-2021-03-01-143026.

Comment 14 Hongkai Liu 2021-04-13 20:14:43 UTC
Thanks for the fix.

oc get clusterversion
version   4.7.6     True        False         25m     Cluster version is 4.7.6

oc get secret default-dockercfg-ftz8j -o yaml | yq -r '.data.".dockercfg"' | base64 -d | jq -r '.|keys[]'
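
The same decoding can be done without `yq` and `jq`. This is a minimal sketch, assuming the Secret has been fetched as JSON (for example with `-o json`) and parsed into a dictionary; the function name `dockercfg_hosts` and the sample secret content are hypothetical.

```python
import base64
import json


def dockercfg_hosts(secret):
    """Return the sorted registry hostnames stored in a dockercfg Secret object."""
    # Secret data values are base64-encoded; the payload is a JSON object
    # keyed by registry hostname.
    encoded = secret["data"][".dockercfg"]
    payload = json.loads(base64.b64decode(encoded))
    return sorted(payload)


# Hypothetical secret built in place so the example runs without a cluster.
sample_cfg = {
    "registry.ci.openshift.org": {"auth": "placeholder"},
    "image-registry.openshift-image-registry.svc:5000": {"auth": "placeholder"},
}
secret = {
    "type": "kubernetes.io/dockercfg",
    "data": {
        ".dockercfg": base64.b64encode(json.dumps(sample_cfg).encode()).decode(),
    },
}

for host in dockercfg_hosts(secret):
    print(host)
```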

Comment 17 errata-xmlrpc 2021-07-27 22:35:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


