Bug 2069976 - Error with oc-mirror using an old outdated unsupported oc client in AI
Summary: Error with oc-mirror using an old outdated unsupported oc client in AI
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: rhacm-2.6
Assignee: Eran Cohen
QA Contact: Chad Crum
Docs Contact: Derek
URL:
Whiteboard:
Depends On: 1823143
Blocks: 2072879
Reported: 2022-03-30 07:55 UTC by Alberto Morgante Medina
Modified: 2022-10-03 20:22 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-03 20:22:34 UTC
Target Upstream Version:
Embargoed:
ercohen: Blocker+


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github openshift assisted-service | pull 3598 | 0 | None | open | Bug 2069976: Error with oc-mirror using an old outdated unsupported oc client in AI | 2022-03-30 17:30:36 UTC |
| Github openshift assisted-service | pull 3754 | 0 | None | open | Bug 2069976: oc client in AI failed to resolve MCO image when working with oc-mirror | 2022-05-03 11:41:03 UTC |
| Github stolostron backlog issues | 21277 | 0 | None | None | None | 2022-03-30 12:35:46 UTC |
| Red Hat Bugzilla | 1823143 | 1 | high | CLOSED | oc adm release extract --command, --tools doesn't pull from localregistry when given a localregistry/image | 2024-03-25 15:49:06 UTC |
| Red Hat Issue Tracker | MGMTBUGSM-256 | 0 | None | None | None | 2022-03-30 11:34:36 UTC |

Description Alberto Morgante Medina 2022-03-30 07:55:57 UTC
Description of problem:
When we use oc-mirror to mirror OLM and OCP releases, the mirroring succeeds and the auto-generated ICSP splits the mirrors across 2 repositories (release and release-images):
```
---
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: release-0
spec:
  repositoryDigestMirrors:
  - mirrors:
    - kubeframe-registry-kubeframe-registry.apps.test-ci.alklabs.com/olm/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - mirrors:
    - kubeframe-registry-kubeframe-registry.apps.test-ci.alklabs.com/olm/openshift/release-images
    source: quay.io/openshift-release-dev/ocp-release
```

During the Agent installation, the agent logs show this error:
```
Failed to fetch container images needed for installation from
      ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b,ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:eae57963e5746607da7ce8070275abd301ab420ce767706d5b36972f64a0e450'
```

From the agent node, this is the image that the validation did manage to pull:
```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64...
Getting image source signatures
Copying blob d2d8e7d185cc skipped: already exists  
Copying blob 47aa3ed2034c skipped: already exists  
Copying blob eac1b95df832 skipped: already exists  
Copying blob 24e8d2e218b0 [--------------------------------------] 0.0b / 0.0b
Copying blob f43eacb86576 [--------------------------------------] 0.0b / 0.0b
Copying blob c67baba1e9cb [--------------------------------------] 0.0b / 0.0b
Copying config ac96e0ee77 done  
Writing manifest to image destination
Storing signatures
ac96e0ee77a3534df5707c4e42453c3ded8ce83bb9ccb39e6c31705aed73739a
```

The image that failed the validation:
```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Error: Error initializing source docker://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b: Error reading manifest sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b in ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images: manifest unknown: manifest unknown
```

This is what we have on the node:
```
[root@localhost core]# cat /etc/containers/registries.conf
unqualified-search-registries = ["registry.access.redhat.com", "docker.io", "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com"]
[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-release"
  mirror-by-digest-only = true
  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images"
[[registry]]
  prefix = ""
  location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
  mirror-by-digest-only = true
  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release"
[[registry]]
  prefix = ""
  location = "quay.io/jparrill/registry"
  mirror-by-digest-only = false
  [[registry.mirror]]
    location = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/jparrill/registry"
```
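The registries.conf above is what decides where the agent actually pulls from. A minimal sketch of that lookup, assuming the containers-registries.conf semantics as we understand them: the longest matching prefix wins, a prefix must end at a `/`, `:` or `@` boundary, an empty `prefix` is treated as equal to `location`, and with `mirror-by-digest-only = true` the mirror is only tried for digest references (then the source). Blocked and insecure registries are ignored; `Registry` and `pull_candidates` are hypothetical names, not containers/image code.

```python
"""Simplified model of the registries.conf mirror lookup shown above."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    location: str
    mirrors: list[str] = field(default_factory=list)
    mirror_by_digest_only: bool = False
    prefix: str = ""  # empty prefix: assumed to mean "same as location"

    def match_prefix(self) -> str:
        return self.prefix or self.location


MIRROR_HOST = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com"
# The three [[registry]] entries from /etc/containers/registries.conf above.
REGISTRIES = [
    Registry("quay.io/openshift-release-dev/ocp-release",
             [f"{MIRROR_HOST}/olm/openshift/release-images"], True),
    Registry("quay.io/openshift-release-dev/ocp-v4.0-art-dev",
             [f"{MIRROR_HOST}/olm/openshift/release"], True),
    Registry("quay.io/jparrill/registry",
             [f"{MIRROR_HOST}/jparrill/registry"], False),
]


def _matches(ref: str, prefix: str) -> bool:
    # The prefix must end at a component boundary: end of string, "/", ":" or "@".
    return ref == prefix or (ref.startswith(prefix) and ref[len(prefix)] in "/:@")


def pull_candidates(ref: str, registries: list[Registry]) -> list[str]:
    """Ordered pull locations: applicable mirrors first, then the original source."""
    matching = [r for r in registries if _matches(ref, r.match_prefix())]
    if not matching:
        return [ref]
    reg = max(matching, key=lambda r: len(r.match_prefix()))
    if reg.mirror_by_digest_only and "@" not in ref:
        return [ref]  # tag reference: mirrors are skipped
    rest = ref[len(reg.match_prefix()):]
    return [mirror + rest for mirror in reg.mirrors] + [ref]
```

Under this model, a payload digest from `ocp-v4.0-art-dev` is redirected to `.../olm/openshift/release`, which is exactly the repository where the digest is found later in this report.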

This is the only tag in release-images:
```
[root@localhost core]# curl -uYYY:XXXXX -sk https://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/v2/olm/openshift/release-images/tags/list?n=1000 | jq 
{
  "name": "olm/openshift/release-images",
  "tags": [
    "4.9.13-x86_64"
  ]
}

```

It seems that all the release payload images are actually in the release repository:
```
curl -uYYY:XXX -sk https://ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/v2/olm/openshift/release/tags/list?n=1000 | jq .tags | wc -l
143
```
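The two curl calls above query the registry's v2 `tags/list` endpoint. Note that `jq .tags | wc -l` also counts the `[` and `]` lines of the pretty-printed array, so the 143 lines correspond to 141 tags (`jq '.tags | length'` gives the count directly). A minimal sketch of the same query, assuming the registry accepts HTTP Basic auth as it does with `curl -u` (registries that require a bearer-token exchange would need more); the helper names are hypothetical.

```python
"""Sketch: list and count tags via the registry v2 tags/list endpoint."""
from __future__ import annotations

import base64
import json
import ssl
import urllib.request


def tags_list_url(registry: str, repository: str, n: int = 1000) -> str:
    return f"https://{registry}/v2/{repository}/tags/list?n={n}"


def parse_tags(body: str | bytes) -> list[str]:
    # A repository without tags may return "tags": null.
    return json.loads(body).get("tags") or []


def fetch_tags(registry: str, repository: str, user: str, password: str,
               verify_tls: bool = True) -> list[str]:
    req = urllib.request.Request(tags_list_url(registry, repository))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")  # assumption: Basic auth accepted
    ctx = ssl.create_default_context()
    if not verify_tls:  # rough equivalent of curl -k
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return parse_tags(resp.read())
```

`len(fetch_tags(...))` on the `olm/openshift/release-images` repository would return 1 here, matching the single `4.9.13-x86_64` tag.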
So this works:
```
[root@localhost core]# podman pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Getting image source signatures
Copying blob 47aa3ed2034c skipped: already exists  
Copying blob 24e8d2e218b0 skipped: already exists  
Copying blob f43eacb86576 skipped: already exists  
Copying blob eac1b95df832 skipped: already exists  
Copying blob 55006bb1219d done  
Copying config 28ea52b98c done  
Writing manifest to image destination
Storing signatures
28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9
```

I guess the issue is with how the assisted-installer decides which images to use.
I don't understand why assisted tells the agent to pull the image from the mirror instead of passing what it gets from oc:
```
[root@localhost core]# oc adm release info --image-for=machine-config-operator ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64  
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
[root@localhost core]# podman pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b...
Getting image source signatures
Copying blob 55006bb1219d skipped: already exists  
Copying blob 47aa3ed2034c skipped: already exists  
Copying blob 24e8d2e218b0 skipped: already exists  
Copying blob f43eacb86576 skipped: already exists  
Copying blob eac1b95df832 [--------------------------------------] 0.0b / 0.0b
Copying config 28ea52b98c done  
Writing manifest to image destination
Storing signatures
28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9
```

The oc binary bundled in the assisted-service returns the wrong image:
```
bash-4.4$ oc adm release info --registry-config=/data/ps --image-for=machine-config-operator ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images:4.9.13-x86_64       
ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b
```
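To make the mismatch concrete, a minimal sketch of the two resolutions using the digest and repositories above. `guess_from_release_repo` reproduces the behavior inferred from the assisted-service output (reuse the release image's repository for the component digest); `resolve_via_icsp` maps the payload's source repository through the ICSP, as the node's container runtime does. Both helpers are hypothetical and simplified (exact repository match only), not oc code; the mirror hostname is taken from the node's registries.conf, since the ICSP excerpt in the description shows a different (kubeframe) hostname.

```python
"""Sketch: why release-images@<digest> fails while release@<digest> works."""
from __future__ import annotations

MIRROR = "ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com"
RELEASE_IMAGE = f"{MIRROR}/olm/openshift/release-images:4.9.13-x86_64"
# Payload reference reported by upstream oc for --image-for=machine-config-operator.
MCO_PAYLOAD_REF = ("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:"
                   "051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b")
# repositoryDigestMirrors of the oc-mirror generated ICSP: source -> mirrors.
ICSP = {
    "quay.io/openshift-release-dev/ocp-release": [f"{MIRROR}/olm/openshift/release-images"],
    "quay.io/openshift-release-dev/ocp-v4.0-art-dev": [f"{MIRROR}/olm/openshift/release"],
}


def repository_of(ref: str) -> str:
    """Strip an '@digest' or ':tag' suffix, keeping any registry port."""
    ref = ref.split("@", 1)[0]
    colon = ref.rfind(":")
    return ref[:colon] if colon > ref.rfind("/") else ref


def guess_from_release_repo(release_image: str, payload_ref: str) -> str:
    """Inferred custom-oc behavior: component digest under the release image's repo."""
    digest = payload_ref.split("@", 1)[1]
    return f"{repository_of(release_image)}@{digest}"


def resolve_via_icsp(payload_ref: str, icsp: dict[str, list[str]]) -> list[str]:
    """ICSP-aware: the component digest under its source repository's mirror(s)."""
    digest = payload_ref.split("@", 1)[1]
    return [f"{m}@{digest}" for m in icsp.get(repository_of(payload_ref), [])]


if __name__ == "__main__":
    print(guess_from_release_repo(RELEASE_IMAGE, MCO_PAYLOAD_REF))  # manifest unknown
    print(resolve_via_icsp(MCO_PAYLOAD_REF, ICSP))  # pullable with podman
```

The first helper yields the reference from the assisted-service transcript above; the second yields the `.../olm/openshift/release@sha256:051c...` reference that podman pulled successfully.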

Version-Release number of selected component (if applicable):

How reproducible:

Follow the same flow as the commands described in the problem description.

Steps to Reproduce:
1. Mirror an OCP release (any version) with the oc-mirror command, which automatically generates an ICSP split across 2 repositories (shown above)
2. Deploy ACM + AI 
3. Deploy spoke nodes using the BMH workflow
4. The agent shows the same error when trying to fetch the release images (if the ICSP is split in 2, as oc-mirror does)

Actual results:
```
Failed to fetch container images needed for installation from
      ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b,ztpfw-registry-ztpfw-registry.apps.test-ci.alklabs.com/olm/openshift/release-images@sha256:eae57963e5746607da7ce8070275abd301ab420ce767706d5b36972f64a0e450'
```
Expected results:

The agent should fetch the OCP release images and pass the validations required to start the spoke installation.

Additional info:

Comment 1 Eran Cohen 2022-03-30 08:12:42 UTC
In essence, the issue is this:
1. The assisted-service uses an unsupported version of oc due to https://bugzilla.redhat.com/show_bug.cgi?id=1823143 (oc extract isn't aware of ICSP).
2. The oc-mirror command recently changed, and the release-images are now mirrored to another namespace (different from the actual release image namespace, e.g. `olm/openshift/release-images` vs `olm/openshift/release`).
3. The oc binary in the assisted-service isn't aware of that change, so `oc adm release info --image-for` returns the wrong namespace.
So, once the actual fix for https://bugzilla.redhat.com/show_bug.cgi?id=1823143 gets merged, we can start using the official oc, which should resolve this new issue as well.

Comment 2 Nelson Jean 2022-03-30 21:56:59 UTC
Hi @mfilanov, the OCP bug looks like it's targeting 4.11, so shouldn't we fix this in an ACM 2.5.z stream instead of 2.6? Is there a workaround that should be documented for ACM 2.5?

Comment 3 Michael Filanov 2022-04-03 08:30:36 UTC
@njean Eran already opened a PR to fix it so it will be merged into 2.5

Comment 4 Eran Cohen 2022-04-06 11:12:03 UTC
While we wait for the actual fix in upstream oc (https://github.com/openshift/oc/pull/829) to merge,
I posted a PR with a temp fix: https://github.com/openshift/assisted-service/pull/3598
It gets the job done but is a pain to configure downstream.
If we can't get the upstream fix to merge in time for 2.5.1, we will deliver the temp fix.

As for what to document, we should say that disconnected AI deployments require creating the mirror registry using `oc` and not `oc-mirror` (until this issue is resolved).

Comment 6 Eran Cohen 2022-04-07 08:07:08 UTC
Yes, let's deliver the temp fix to 2.5

Comment 7 Eran Cohen 2022-04-18 11:54:16 UTC
UPDATE: the actual fix is merged to assisted-service master - https://github.com/openshift/assisted-service/pull/3657
Since there is no official oc release with this code, we are getting it from a nightly build - https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.11.0-0.nightly-2022-04-08-205307/openshift-client-linux.tar.gz
Since this is a change to the Dockerfile, and there is no oc RPM, getting it into the assisted-service downstream image requires updating the assisted-service image to use the upstream oc client in ocm-2.5 - https://github.com/stolostron/backlog/issues/21629

This issue shouldn't be on QA yet since it's not really backported (the change to the Dockerfile doesn't affect the downstream build).

Comment 9 Eran Cohen 2022-04-26 11:35:19 UTC
It seems that upstream oc doesn't have the fix, and the temp fix isn't working either.
This issue will not be resolved in 2.5; we need to add documentation on how to mitigate it.
The user should create the mirror registry using `oc adm release mirror`.
I'm not sure whether, or how, we should support mirrors created with oc-mirror.

Comment 10 Nelson Jean 2022-04-27 17:34:36 UTC
Hi @ercohen , the latest comment in https://bugzilla.redhat.com/show_bug.cgi?id=1823143 suggests that bug is now fixed.  In your comment back on 3/30, Bug 1823143 was essentially a pre-req to resolve this particular issue but do you now have some other dependency on the oc upstream (ref to your comment above on 4/26) that prevents this issue from being resolved in the RHACM 2.5 release?

Is this a regression from earlier releases or a new problem?  

Thx.

Comment 12 Eran Cohen 2022-04-28 05:58:34 UTC
njean, here is some background:
Installing a cluster using the assisted service (AKA infrastructure operator) in a disconnected environment requires creating a mirror registry with the release payload.
For the past year, the assisted service has been using a custom build of the oc client that enables installation in a disconnected environment.
Why does the assisted service need oc?
 1. The assisted-service uses oc to extract the installer binary, which generates the manifests and ignition configs (`oc adm release extract --command=openshift-install`).
 2. The assisted-service also uses oc to get the MCO image for the release (used for the image-availability validation and for applying the ignition config on the live ISO).
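The two oc invocations listed above can be sketched as argument lists for `subprocess`. A minimal sketch only: the helper names and the `--to` destination directory are illustrative assumptions, and the assisted-service itself is written in Go and may pass different flags.

```python
"""Sketch: the two oc calls the assisted-service depends on (hypothetical helpers)."""
from __future__ import annotations

import subprocess


def extract_installer_cmd(release_image: str, pull_secret: str, dest_dir: str) -> list[str]:
    """Extract the openshift-install binary from the release payload."""
    return ["oc", "adm", "release", "extract",
            f"--registry-config={pull_secret}",
            "--command=openshift-install", f"--to={dest_dir}", release_image]


def mco_image_cmd(release_image: str, pull_secret: str) -> list[str]:
    """Ask which image the payload uses for the machine-config-operator."""
    return ["oc", "adm", "release", "info",
            f"--registry-config={pull_secret}",
            "--image-for=machine-config-operator", release_image]


def run_oc(cmd: list[str]) -> str:
    """Run oc and return trimmed stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()
```

`run_oc(mco_image_cmd(release_image, "/data/ps"))` is the same query as the transcript in the description; the value it returns is the one this bug is about.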

The assisted-service uses a custom oc because upstream oc doesn't respect ICSP (Bug 1823143), so in a disconnected environment the assisted-service pod fails to get the installer binary and the installation can't start.
Note that this custom oc doesn't really enable ICSP; instead, it makes assumptions about where to look for the mirrored image (and so far that has worked for the assisted-service use case).

The bug we see in this BZ (Bug 2069976) is that the assisted-service oc is returning the wrong image for the MCO.
The issue relates to the fact that the mirror was created using the new oc-mirror tool, which splits the mirrored images into 2 repos: release (for the release payload) and release-images (for the release image itself) - https://github.com/openshift/oc-mirror/pull/343

About a week ago Bug 1823143 moved to QE and I updated the assisted-service to use upstream oc (that had the PR that enables ICSP). 
I tested it on amorgant's environment, and the issue described in this BZ seemed to be fixed.
But not long after, it turned out that the oc update actually caused a regression in a real disconnected environment, so the oc update was reverted and the assisted-service is back to the same old custom oc build.
Bug 1823143 also failed QE validation and requires more work; once it's verified, I'll update the assisted-service to use it (and test it better).
So, in short, this bug isn't a regression; it's a new problem that creates a limitation on how the user should create the mirror registry for the installation.

Comment 13 Nelson Jean 2022-04-29 14:22:00 UTC
@ercohen, thank you for the excellent background and summary. RHACM 2.5 code freezes May 12th, so that's the remaining window to fix anything. Do you think you'd be able to update your instance of oc to adapt to the new way images are mirrored, or is there some workaround that needs to be documented?

Comment 14 Eran Cohen 2022-05-03 12:05:04 UTC
I'll try to update the custom oc that the assisted-service uses to adapt to the new way images are mirrored. I'll post an update once I have something working (that doesn't cause a regression).

Comment 15 Eran Cohen 2022-05-04 09:50:26 UTC
I created a patch that should solve this issue, and it passes the disconnected regression tests.
The code is merged to master, and I created a merge request to update the downstream Dockerfile.
I'll move the BZ to QE once it's part of downstream ACM.

Comment 16 Nelson Jean 2022-05-04 15:45:21 UTC
Thanks Eran!

Comment 17 Chad Crum 2022-05-05 18:22:48 UTC
Validated fix using MCE ds snapshot 2.0.0-DOWNANDBACK-2022-05-04-15-49-55

Validated that both the `oc adm release mirror` and `oc-mirror` methods of mirroring OCP releases work (deployed a spoke node from each successfully).

############################
# oc-mirror validation steps
############################
## Mirrored latest stable 4.10 release using latest oc-mirror (built from master source)

(1) Create an imageset config for oc-mirror
  cat /root/oc-mirror_working/oc-mirror/imageset.yml
  apiVersion: mirror.openshift.io/v1alpha2
  kind: ImageSetConfiguration
  mirror:
    platform:
      channels:
        - name: stable-4.10

(2) Mirror the release to the local registry with oc-mirror
  ./bin/oc-mirror --dir=/var/tmp/build --config imageset.yml --dest-skip-tls docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest

(3) Apply the oc-mirror-generated ICSP (example below)
  cat imageContentSourcePolicy.yaml 
  ---
  apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    name: release-0
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images
      source: quay.io/openshift-release-dev/ocp-release
    - mirrors:
      - registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

(4) Update the 4.10 clusterimageset to use the mirrored release (the release URL is displayed in the oc-mirror output)
  registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-x86_64

(5) Remove the image-policy-0 ICSP if it exists, as it will conflict

(6) Edit the assisted-service mirror-registry configmap
  - Remove entries that match image-policy-0 
  - Add entries that match oc-mirror generated icsp

(7) Delete the assisted-service pod to recreate it (possibly not necessary)

(8) Deployed spoke cluster successfully using 4.10 clusterimageset (or whichever clusterimageset contains the oc-mirror mirrored release)


(9) Also confirmed:
  # The assisted-service's internal oc adm release info returns the correct release payload image
  [root@sealusa11 ~]# oc get clusterimageset
  NAME   RELEASE
  4.10   registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-

  [root@sealusa11 ~]# oc rsh assisted-service-66bfdf6785-9h5pv
    Defaulted container "assisted-service" out of: assisted-service, postgres
    sh-4.4$ oc adm release info --image-for=machine-config-operator registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release-images:4.10.11-x86_64

  registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b


  # The release payload image can be pulled with podman
  podman pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b
  Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest/openshift/release@sha256:8e98944cb0a7fe849f310964aadac4c06afb577dd87ce62d0b609102a32ff05b...
  Getting image source signatures
  Copying blob 39382676eb30 done
  Copying blob 9ef334919d82 [==================>-------------------] 8.4MiB / 16.5MiB
  Copying blob 237bfbffb5f2 [===>----------------------------------] 8.3MiB / 79.5MiB
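Step (6) amounts to replacing the mirror entries in the configmap's registries.conf with entries that match the oc-mirror ICSP from step (3). A minimal sketch of that rendering, assuming the configmap holds registries.conf content in the same TOML layout shown on the agent node in the description; `render_registries_conf` is a hypothetical helper, not how the assisted-service itself produces this file.

```python
"""Sketch: render ICSP repositoryDigestMirrors as registries.conf [[registry]] tables."""
from __future__ import annotations


def render_registries_conf(digest_mirrors: list[dict]) -> str:
    lines: list[str] = []
    for entry in digest_mirrors:
        # ICSP mirrors apply to digest pulls only, hence mirror-by-digest-only = true.
        lines += [
            "[[registry]]",
            '  prefix = ""',
            f'  location = "{entry["source"]}"',
            "  mirror-by-digest-only = true",
        ]
        for mirror in entry["mirrors"]:
            lines += ["  [[registry.mirror]]", f'    location = "{mirror}"']
    return "\n".join(lines) + "\n"


QE_REGISTRY = "registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/latest"
# spec.repositoryDigestMirrors of the oc-mirror generated ICSP from step (3).
OC_MIRROR_ICSP = [
    {"source": "quay.io/openshift-release-dev/ocp-release",
     "mirrors": [f"{QE_REGISTRY}/openshift/release-images"]},
    {"source": "quay.io/openshift-release-dev/ocp-v4.0-art-dev",
     "mirrors": [f"{QE_REGISTRY}/openshift/release"]},
]

if __name__ == "__main__":
    print(render_registries_conf(OC_MIRROR_ICSP), end="")
```

Keeping both ICSP entries matters: dropping the `ocp-v4.0-art-dev` mapping would leave the payload digests (such as the MCO image) without a mirror.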

