Bug 2008539

Summary: Registry doesn't fall back to secondary ImageContentSourcePolicy Mirror
Product: OpenShift Container Platform
Reporter: Manuel Dewald <mdewald>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA
QA Contact: XiuJuan Wang <xiuwang>
Severity: high
Priority: high
Version: 4.9
CC: anaim, aos-bugs, calfonso, cblecker, dmurga, dofinn, jaharrin, nmalik, obulatov, pveiga, vlaad, wewang, wgordon, wking
Keywords: ServiceDeliveryBlocker, ServiceDeliveryImpact
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The registry proxied the response from the first available mirror registry.
Consequence: When that mirror registry was available but did not have the requested data, pull-through did not try the other mirrors, even if they had the needed data.
Fix: Pull-through now tries the other registries if the first mirror replies with Not Found.
Result: Pull-through can discover the data if it exists on any mirror registry.
Type: Bug
Last Closed: 2022-03-10 16:13:54 UTC
Bug Blocks: 2029987

Description Manuel Dewald 2021-09-28 13:53:51 UTC
Description of problem:
Depending on the order of mirrors in an ImageContentSourcePolicy, pulling an image either fails or succeeds.

Version-Release number of selected component (if applicable): 4.9.0-rc.3


How reproducible:


Steps to Reproduce:
1. Create a cluster with the following imageContentSources in install-config:

```
imageContentSources:
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  - quay.io/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  - quay.io/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  - quay.io/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```

2. Create an image stream:

```
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    openshift.io/image.dockerRepositoryCheck: "2021-09-28T07:52:13Z"
  name: cli
  namespace: openshift
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eaabf06b1df3c1eb92d43d9c16681061b4bfd5c03e80a4c5aa5d4adf287811a4
    generation: 2
    importPolicy:
      scheduled: true
    name: latest
    referencePolicy:
      type: Source
```

3. Create a pod that uses the openshift/cli image (via a deployment):

```
oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
```

Actual results:

Whether pulling the image succeeds depends on the order of the mirrors: with the pull.q1w2.quay.rhcloud.com mirrors listed first, the pull fails.


Expected results:

Even if one of the mirrors does not work, the image is pulled from the other mirror.


Additional info:

With the following imageContentSources in install-config, where the quay.io mirrors are listed first, the pull works:

```
imageContentSources:
- mirrors:
  - quay.io/openshift-release-dev/ocp-release
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator
  source: quay.io/app-sre/managed-upgrade-operator
- mirrors:
  - quay.io/app-sre/managed-upgrade-operator-registry
  - pull.q1w2.quay.rhcloud.com/app-sre/managed-upgrade-operator-registry
  source: quay.io/app-sre/managed-upgrade-operator-registry
```

* This shows up as flaky behavior in the OSD e2e pipeline on 4.9, because the order of mirrors is shuffled: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y
* At this point we're not sure why pulling from pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release fails; it seems to work sometimes. However, even if it fails, the registry should fall back to the second mirror.
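To make the order dependence concrete, the following Go sketch expands a digest pull spec into the ordered list of mirror references to try, using the ocp-v4.0-art-dev entry of the failing configuration. The `mirrorSet` type and `candidates` helper are hypothetical, and the prefix-matching rule (source followed by `@`, `:` or `/`) is an assumption; this is not the registry's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorSet is a hypothetical stand-in for one imageContentSources entry.
type mirrorSet struct {
	Source  string
	Mirrors []string
}

// candidates returns the pull specs to try for ref, in configured mirror
// order. A set matches when ref equals Source or continues with "@", ":" or
// "/" right after it (assumed matching rule). Unmatched refs are returned as-is.
func candidates(ref string, sets []mirrorSet) []string {
	for _, s := range sets {
		if !strings.HasPrefix(ref, s.Source) {
			continue
		}
		rest := ref[len(s.Source):]
		if rest != "" && !strings.ContainsAny(rest[:1], "@:/") {
			continue // e.g. "ocp-release-foo" must not match "ocp-release"
		}
		out := make([]string, 0, len(s.Mirrors))
		for _, m := range s.Mirrors {
			out = append(out, m+rest) // keep the digest suffix, swap the repository
		}
		return out
	}
	return []string{ref}
}

func main() {
	sets := []mirrorSet{{
		Source: "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
		Mirrors: []string{
			"pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev",
			"quay.io/openshift-release-dev/ocp-v4.0-art-dev",
		},
	}}
	ref := "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eaabf06b1df3c1eb92d43d9c16681061b4bfd5c03e80a4c5aa5d4adf287811a4"
	for i, c := range candidates(ref, sets) {
		fmt.Println(i, c)
	}
}
```

Whichever mirror is listed first is consulted first, which is why swapping the order in install-config changed the outcome when the first mirror could not serve the image.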

Comment 1 Oleg Bulatov 2021-09-29 09:20:11 UTC
Can you attach a must-gather, or at least the YAML for the image stream openshift/cli? If the image stream is successfully imported, then I also need the registry logs.

Comment 2 Oleg Bulatov 2021-09-29 10:13:11 UTC
Example of a failed job with must-gather: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/osde2e-stage-aws-e2e-next-y/1443063209534689280

Comment 9 Oleg Bulatov 2021-10-15 14:18:29 UTC
This BZ will be a blocker for 4.10 and the fix should be backported to 4.9.z and 4.8.z.
An initial fix for library-go's client is proposed at https://github.com/openshift/library-go/pull/1226.

Comment 12 XiuJuan Wang 2021-12-02 04:16:21 UTC
Verified this issue on 4.10.0-0.nightly-2021-12-01-210213 with the following ImageContentSourcePolicy spec:

```
spec:
  repositoryDigestMirrors:
  - mirrors:
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
    - quay.io/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-art-dev
    - pull.q1w3.quay.rhcloud.com/openshift-release-dev/ocp-release
    - pull.q1w4.quay.rhcloud.com/openshift-release-dev/ocp-release
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
```

```
$ oc create deploy --image image-registry.openshift-image-registry.svc:5000/openshift/cli:latest cli-test
deployment.apps/cli-test created
```

```
  Normal   Pulled          24s                kubelet            Successfully pulled image "image-registry.openshift-image-registry.svc:5000/openshift/cli:latest" in 52.76675ms
```

Comment 17 errata-xmlrpc 2022-03-10 16:13:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056