Bug 2008539 - Registry doesn't fall back to secondary ImageContentSourcePolicy Mirror
Summary: Registry doesn't fall back to secondary ImageContentSourcePolicy Mirror
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Oleg Bulatov
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks: 2029987
Reported: 2021-09-28 13:53 UTC by Manuel Dewald
Modified: 2022-10-26 00:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The registry proxied the response from the first available mirror registry.
Consequence: When that mirror was available but did not have the requested data, pull-through did not try the other mirrors, even if they had the data.
Fix: Pull-through tries the other mirrors if the first mirror replies with Not Found.
Result: Pull-through can find the data if it exists on any mirror registry.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:13:54 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift image-registry pull 296 | 0 | None | open | WIP Bug 2008539: Try another registry if blob is not found | 2021-10-18 08:41:29 UTC
Github | openshift library-go pull 1226 | 0 | None | open | Bug 2008539: Try another registry if blob is not found | 2021-11-16 09:12:24 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:14:14 UTC

Description Manuel Dewald 2021-09-28 13:53:51 UTC
Description of problem:
Depending on the order of mirrors in an ImageContentSourcePolicy, pulling an image either fails or succeeds.

Version-Release number of selected component (if applicable): 4.9.0-rc.3


How reproducible:


Steps to Reproduce:
1. Create a cluster with the following imageContentSources in install-config:

```
imageContentSources:
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  - quay.io/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  - quay.io/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  - quay.io/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```

2. Create an image stream:

```
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    openshift.io/image.dockerRepositoryCheck: "2021-09-28T07:52:13Z"
  name: cli
  namespace: openshift
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eaabf06b1df3c1eb92d43d9c16681061b4bfd5c03e80a4c5aa5d4adf287811a4
    generation: 2
    importPolicy:
      scheduled: true
    name: latest
    referencePolicy:
      type: Source
```

3. Create a deployment that uses the openshift/cli image:

```
oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
```

Actual results:

Pulling the image fails depending on the order of the mirrors.


Expected results:

Even if one of the mirrors doesn't work, the image gets pulled from the other mirror.


Additional info:

With the following imagecontentsource in install-config, it works:

```
imageContentSources:
- mirrors:
  - quay.io/openshift-release-dev/ocp-release
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator-registry
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```

* This is observed as flaky behavior in the OSD e2e pipeline on 4.9, as the order of mirrors is shuffled: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y
* At this point we're not sure why pulling from pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release fails; it seems to work sometimes. However, even if it fails, the registry should fall back to the second mirror.
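
To illustrate why the order matters, here is a minimal Go sketch of how an ordered list of pull locations might be derived from an imageContentSources entry. The `mirrorSet` type and the `candidates` function are hypothetical names, and appending the source after the mirrors is an assumption about typical mirror handling, not something taken from the registry's code.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorSet follows the shape of one imageContentSources entry
// (hypothetical type, for illustration only).
type mirrorSet struct {
	Source  string
	Mirrors []string
}

// candidates returns the repositories to try for repo, in order:
// the configured mirrors first, then the requested repository itself
// (assumed fallback), with duplicates removed.
func candidates(repo string, sets []mirrorSet) []string {
	var out []string
	seen := map[string]bool{}
	add := func(r string) {
		if !seen[r] {
			seen[r] = true
			out = append(out, r)
		}
	}
	for _, s := range sets {
		// Match the source exactly or as a path prefix.
		if repo == s.Source || strings.HasPrefix(repo, s.Source+"/") {
			suffix := strings.TrimPrefix(repo, s.Source)
			for _, m := range s.Mirrors {
				add(m + suffix)
			}
		}
	}
	add(repo)
	return out
}

func main() {
	sets := []mirrorSet{{
		Source: "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
		Mirrors: []string{
			"pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev",
			"quay.io/openshift-release-dev/ocp-v4.0-art-dev",
		},
	}}
	for i, c := range candidates("quay.io/openshift-release-dev/ocp-v4.0-art-dev", sets) {
		fmt.Println(i, c)
	}
}
```

With the failing configuration from the steps above, the pull.q1w2 mirror comes first in this list, so a client that stops at the first location that answers never reaches quay.io.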

Comment 1 Oleg Bulatov 2021-09-29 09:20:11 UTC
Can you attach must-gather or at least YAML for the image stream openshift/cli? If the image stream is successfully imported, then I also need registry logs.

Comment 2 Oleg Bulatov 2021-09-29 10:13:11 UTC
Example of a failed job with must-gather: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y/1443063209534689280

Comment 9 Oleg Bulatov 2021-10-15 14:18:29 UTC
This BZ will be a blocker for 4.10 and the fix should be backported to 4.9.z and 4.8.z.
An initial fix for library-go's client is proposed at https://github.com/openshift/library-go/pull/1226.
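
For readers following along, the intended behavior (try another registry if the blob is not found) can be sketched roughly as below. This is not the code from the library-go pull request: `ErrNotFound`, `fetchFunc`, and `fetchFromMirrors` are hypothetical names, and the fake `fetch` in `main` stands in for a real registry client. The sketch assumes Go 1.20 or later for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel meaning "the registry answered,
// but it does not have the requested data" (an HTTP 404 in practice).
var ErrNotFound = errors.New("not found")

// fetchFunc is a hypothetical stand-in for a real registry client call.
type fetchFunc func(repository, digest string) ([]byte, error)

// fetchFromMirrors tries each location in order and returns the first
// successful response together with the location that served it.
// Any error, including ErrNotFound, moves on to the next location.
func fetchFromMirrors(locations []string, digest string, fetch fetchFunc) ([]byte, string, error) {
	var errs []error
	for _, loc := range locations {
		data, err := fetch(loc, digest)
		if err == nil {
			return data, loc, nil
		}
		// Record why this location failed and keep going.
		errs = append(errs, fmt.Errorf("%s: %w", loc, err))
	}
	if len(errs) == 0 {
		return nil, "", errors.New("no locations configured")
	}
	return nil, "", errors.Join(errs...)
}

func main() {
	// Fake registries: the first mirror is reachable but lacks the image.
	fetch := func(repository, digest string) ([]byte, error) {
		if repository == "quay.io/openshift-release-dev/ocp-v4.0-art-dev" {
			return []byte("manifest"), nil
		}
		return nil, ErrNotFound
	}
	locations := []string{
		"pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev",
		"quay.io/openshift-release-dev/ocp-v4.0-art-dev",
	}
	digest := "sha256:eaabf06b1df3c1eb92d43d9c16681061b4bfd5c03e80a4c5aa5d4adf287811a4"
	_, from, err := fetchFromMirrors(locations, digest, fetch)
	fmt.Println("served by:", from, "error:", err)
}
```

The point of the sketch is that a Not Found reply is treated like any other failure, so the loop continues instead of returning the first mirror's response.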

Comment 12 XiuJuan Wang 2021-12-02 04:16:21 UTC
Verified on 4.10.0-0.nightly-2021-12-01-210213 with the following ImageContentSourcePolicy:

  spec:
    repositoryDigestMirrors:
    - mirrors:
      - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
      - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
      - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
      - quay.io/openshift-release-dev/ocp-release
      source: quay.io/openshift-release-dev/ocp-release
    - mirrors:
      - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
      - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
      - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
      - quay.io/openshift-release-dev/ocp-v4.0-art-dev
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

$ oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
deployment.apps/cli-test created

  Normal   Pulled          24s                kubelet            Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest" in 52.76675ms

Comment 17 errata-xmlrpc 2022-03-10 16:13:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

