Bug 1911470
Summary: | ServiceAccount Registry Authfiles Do Not Contain Entries for Public Hostnames | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Steve Kuznetsov <skuznets> |
Component: | Image Registry | Assignee: | Ricardo Maraschini <rmarasch> |
Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.1.z | CC: | aaleman, aos-bugs, ccoleman, hongkliu, rmarasch |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause:
The automatically created docker config secret did not include credentials for the integrated registry's routes.
Consequence:
Because no credentials were present for accessing the registry through any of its routes, pods attempting to reach the registry via a route failed with authentication errors.
Fix:
All configured registry routes are now included in the default docker credential secret.
Result:
Pods can now reach the integrated registry through any of its routes, since the credentials contain an entry for each route.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:35:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1931856 |
Description
Steve Kuznetsov
2020-12-29 16:06:05 UTC
On an OSD cluster with 2 public routes (correctly configured in the cluster image configuration):

```yaml
apiVersion: config.openshift.io/v1
kind: Image
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  creationTimestamp: "2020-04-16T19:11:37Z"
  generation: 2
  name: cluster
  resourceVersion: "211280865"
  selfLink: /apis/config.openshift.io/v1/images/cluster
  uid: e6d14209-b45e-40ac-bf51-74b870d7c0ad
spec:
  externalRegistryHostnames:
  - registry.ci.openshift.org
status:
  externalRegistryHostnames:
  - default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com
  - registry.ci.openshift.org
  internalRegistryHostname: image-registry.openshift-image-registry.svc:5000
```

the openshiftcontrollermanagers config is only sending the internal address to be generated into the pull secret:

```yaml
$ oc get openshiftcontrollermanagers.operator.openshift.io -o yaml
spec:
  logLevel: ""
  managementState: Managed
  observedConfig:
    build:
      buildDefaults:
        resources: {}
      imageTemplateFormat:
        format: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2986a09ed686a571312bcb20d648baac46b422efa072f8b68eb41c7996e94610
    deployer:
      imageTemplateFormat:
        format: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f42509c18cf5e41201d64cf3a9c1994ffa5318f8d7cee5de45fa2da914e68bbc
    dockerPullSecret:
      internalRegistryHostname: image-registry.openshift-image-registry.svc:5000
    ingress:
      ingressIPNetworkCIDR: ""
  operatorLogLevel: ""
  unsupportedConfigOverrides: null
```

It should be sending all the public names as well as the internal registry name. This allows someone to work with both public and private names equally, and pods still work. The public names allow resiliency if someone sets up a proxy in front of one of those names.
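For illustration, the expected mapping from the image config to pull-secret entries can be sketched in a few lines of Python. This is not the controller's actual code, only a sketch of the desired behavior; the field names follow the YAML above:

```python
# Minimal sketch (not openshift-controller-manager code): given the cluster
# image config, compute every registry hostname that a generated
# service-account pull secret should carry a credential entry for.

def expected_dockercfg_hosts(image_config):
    status = image_config.get("status", {})
    hosts = set(status.get("externalRegistryHostnames", []))
    internal = status.get("internalRegistryHostname")
    if internal:
        hosts.add(internal)
    return sorted(hosts)

# Data shaped like the image config shown above.
image_config = {
    "status": {
        "externalRegistryHostnames": [
            "default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com",
            "registry.ci.openshift.org",
        ],
        "internalRegistryHostname": "image-registry.openshift-image-registry.svc:5000",
    },
}

print(expected_dockercfg_hosts(image_config))
```

The bug is that only the `internalRegistryHostname` entry was being generated, while the `status.externalRegistryHostnames` entries were dropped.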
I'm not positive this is a regression, but this absolutely broke a scenario that worked in 3.11 when we moved to 4.6, so I'm marking it as such (it could break others moving from 3 to 4). We'll need to assess whether this has been broken throughout 4.x or just in 4.6 when deciding whether to backport (if it's broken in all of 4.x, I think fixing 4.6 only is acceptable).

I can confirm that this is the behavior in 4.1 as well: no extra entry for the external route is created.

What you described works as designed and is intentional. If you want to use the external name, you need to create a secret manually. This keeps your manifests transferable between clusters. If your manifests are only supposed to be used on one cluster, then there is no reason to use the external route. Using external routes may give you the feeling that you can easily transfer your manifests to another cluster, but you can't.
So I'd be very careful with adding external names to these secrets.
Steve, what is your use case? You haven't described why you need this. This looks more like an RFE.
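The "create a secret manually" workaround mentioned above can be sketched as follows. This is a hypothetical illustration, not a documented procedure: the secret name, username, and token are placeholders, and in practice the token would come from a service account with access to the registry.

```python
import base64
import json

# Placeholders -- substitute real service-account credentials.
external_host = "registry.ci.openshift.org"
username = "serviceaccount"
token = "<service-account-token>"

# A .dockercfg payload keyed by the external hostname.
dockercfg = {
    external_host: {
        "username": username,
        "password": token,
        "auth": base64.b64encode(f"{username}:{token}".encode()).decode(),
    }
}

# A Secret manifest of type kubernetes.io/dockercfg carrying that payload.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "external-registry-pull"},  # hypothetical name
    "type": "kubernetes.io/dockercfg",
    "data": {
        ".dockercfg": base64.b64encode(json.dumps(dockercfg).encode()).decode()
    },
}

print(json.dumps(secret, indent=2))
```

The resulting manifest could then be applied and linked to a service account for pulls (e.g. `oc secrets link default external-registry-pull --for=pull`), assuming the token is valid for the registry.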
> The public names allow resiliency if someone sets up a proxy in front of one of those names.
Clayton, can you elaborate on what kind of proxy you want in front of image-registry.openshift-image-registry.svc?
Just want to report a use case (not the same as the one reported for this bug by Steve) while migrating the CI registry from a 3.11 cluster to 4.6: https://github.com/openshift/release/pull/14522/files#r548190491 The suggested workaround is to use the internal hostname of the registry's svc.

Migration work is expected; you cannot use docker-registry.default.svc:5000 either. If you want to use an external name, you should treat it as an external registry.

How is this an RFE if it's a regression over previous behavior? I want to be able to use the internal or external hostname to refer to the registry. They're identical. We've built up an enormous amount of nonsense automation to re-write secrets to include the external hostname to deal with this.

> They're identical.
Except that the traffic for the external hostname goes through the load balancer and the router.
Not every 3.11 feature is supported in 4.x. Storage quota is gone, Alibaba storage is gone, etc. Features can return if somebody asks for them. But so far we have removed external hostnames to protect you from creating configurations that are hard to migrate from one cluster to another, and from using external load balancers when they are not needed. I guess you have a reason why it's better to use external names in your case, but you haven't stated it yet.
As we already have 5 major releases without this feature, I don't consider it to be a regression. I'm OK with adding an option in 4.8 (or in a version that our PM selects) that enables the behavior you want.
Using a known, functional external hostname makes the migration easier, not harder. Since this works and is valid in 3.x and is not functional in 4.x, this is a regression in the product that will break user workloads that expect it to continue working. We should reinstate this, if not by default, at least as opt-in.

This is a regression in the product. The design of the image stream public field is that the public hostname can be pulled by pods. It must be fixed. It should be on by default. There is no downside to having this on by default.

It works with the open PR, with the results below:

```
$ oc extract secrets/default-dockercfg-dbn69 --to=- | jq 'keys'
# .dockercfg
[
  "172.30.148.19:5000",
  "default-route-openshift-image-registry.apps.ci-ln-2tc5v1t-f76d1.origin-ci-int-gce.dev.openshift.com",
  "image-registry.openshift-image-registry.svc.cluster.local:5000",
  "image-registry.openshift-image-registry.svc:5000"
]

$ oc get routes
NAME            HOST/PORT                                                                                             PATH   SERVICES         PORT    TERMINATION   WILDCARD
default-route   default-route-openshift-image-registry.apps.ci-ln-2tc5v1t-f76d1.origin-ci-int-gce.dev.openshift.com          image-registry   <all>   reencrypt     None
myregistry      registry.ci.openshift.org                                                                                    image-registry   <all>   reencrypt     None

$ oc extract secrets/default-dockercfg-82d4h --to=- | jq 'keys'
# .dockercfg
[
  "172.30.67.124:5000",
  "default-route-openshift-image-registry.apps.wxj-c2s32.govcloudemu.devcluster.openshift.com",
  "image-registry.openshift-image-registry.svc.cluster.local:5000",
  "image-registry.openshift-image-registry.svc:5000"
]

$ oc get routes
Unable to connect to the server: Service Unavailable
$ oc get routes
NAME            HOST/PORT                                                                                    PATH   SERVICES         PORT    TERMINATION   WILDCARD
default-route   default-route-openshift-image-registry.apps.wxj-c2s32.govcloudemu.devcluster.openshift.com          image-registry   <all>   reencrypt     None
```

Verified on 4.8.0-0.nightly-2021-03-01-143026. Thanks for the fix.
```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.6     True        False         25m     Cluster version is 4.7.6

$ oc get secret default-dockercfg-ftz8j -o yaml | yq -r '.data.".dockercfg"' | base64 -d | jq -r '.|keys[]'
172.30.49.128:5000
default-route-openshift-image-registry.apps.ci.l2s4.p1.openshiftapps.com
image-registry.openshift-image-registry.svc.cluster.local:5000
image-registry.openshift-image-registry.svc:5000
registry.ci.openshift.org
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438