Bug 2083299
| Summary: | SRO does not fetch mirrored DTK images in disconnected clusters | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Quentin Barrand <quba> | |
| Component: | Special Resource Operator | Assignee: | Yoni Bettan <ybettan> | |
| Status: | CLOSED ERRATA | QA Contact: | Constantin Vultur <cvultur> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.10 | CC: | bblock, bthurber, bzvonar, grajaiya, keyoung, mamccoma, mlammon, ybettan, yliu1, yshnaidm | |
| Target Milestone: | --- | Keywords: | TestBlocker | |
| Target Release: | 4.11.0 | Flags: | ybettan: needinfo- | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2100039 | Environment: | |
| Last Closed: | 2022-08-10 11:10:43 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2100039 | |||
|
Description
Quentin Barrand
2022-05-09 15:59:10 UTC
During today's troubleshooting session, we understood why SRO was failing when trying to pull the DTK image [1] but not the OpenShift release image [0], which is pulled beforehand. The OpenShift release image is public and can be pulled from quay.io without authentication, while DTK requires authentication. It appears that the cluster is in fact not really disconnected and can reach quay.io, which makes SRO able to pull public images. However, it fails to pull DTK as no pull secret is configured for quay.io.

We asked the Veritas team to verify their setup and make the cluster actually disconnected while we work on code changes to make SRO aware of image mirroring.

[0] quay.io/openshift-release-dev/ocp-release@sha256:0696e249622b4d07d8f4501504b6c568ed6ba92416176a01a12b7f1882707117
[1] quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52038f11156a3eac9700be73c6aef7121839727d93a8775c81e17de0ebd15732

There are 3 things to take into account when dealing with a disconnected registry: the registry cert, the pull secret, and registries.conf, where image content source policies are reflected. Quentin, could you please confirm all of these will be resolved in this BZ? (We encountered a failure with the registry cert, and would rather not open a duplicate BZ if this is the place to address all of the above items.)

SRO already gets the pull secret. The fix for this BZ will mount the two remaining items you mentioned: registries.conf and custom CAs.

Thanks for confirming! In another environment we also encountered the following error in the SRO logs. My understanding is that this BZ will resolve it as well.
{"level":"error","ts":1652898031.5360885,"logger":"controller.specialresourcemodule","msg":"Reconciler error","reconciler group":"sro.openshift.io","reconciler kind":"SpecialResourceModule","name":"acm-ice","namespace":"","error":"failed to get OCP versions: could not get version info from image 'registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/openshift-release-dev/ocp-release@sha256:5f6a8321f8abfa06209f7abe22a5c1fc72e3d1269941bdbfbf12ec1791905c4e': failed to get manifest's last layer for image 'registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/openshift-release-dev/ocp-release@sha256:5f6a8321f8abfa06209f7abe22a5c1fc72e3d1269941bdbfbf12ec1791905c4e': failed to get layers digests of the image registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/openshift-release-dev/ocp-release@sha256:5f6a8321f8abfa06209f7abe22a5c1fc72e3d1269941bdbfbf12ec1791905c4e: failed to get manifest stream from image registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/openshift-release-dev/ocp-release@sha256:5f6a8321f8abfa06209f7abe22a5c1fc72e3d1269941bdbfbf12ec1791905c4e: failed to get crane manifest from image registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/openshift-release-dev/ocp-release@sha256:5f6a8321f8abfa06209f7abe22a5c1fc72e3d1269941bdbfbf12ec1791905c4e: Get \"https://registry.ocp-edge-cluster-rdu2-0.qe.lab.redhat.com:5000/v2/\": x509: certificate signed by unknown authority","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
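Separately from the certificate error in the log above, the image mirroring item from the earlier discussion can be illustrated with a small sketch. It shows how a mirror-aware client could turn a by-digest image reference into the list of locations to try, given source/mirror pairs of the kind an image content source policy reflects into registries.conf. This is not SRO's implementation: the function name `candidate_pull_refs`, the `MIRRORS` table, and the host `mirror.example.com:5000` are assumptions made up for illustration. Only the DTK reference [1] is taken from this report.

```python
from typing import List, Tuple

# DTK image reference [1] from this bug report.
DTK_IMAGE = (
    "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
    "@sha256:52038f11156a3eac9700be73c6aef7121839727d93a8775c81e17de0ebd15732"
)

# Hypothetical (source repository, mirror repository) pairs, in the spirit of
# the mirror entries that end up in registries.conf. The mirror host is made up.
MIRRORS: List[Tuple[str, str]] = [
    ("quay.io/openshift-release-dev/ocp-release",
     "mirror.example.com:5000/openshift-release-dev/ocp-release"),
    ("quay.io/openshift-release-dev/ocp-v4.0-art-dev",
     "mirror.example.com:5000/openshift-release-dev/ocp-v4.0-art-dev"),
]


def candidate_pull_refs(image: str,
                        mirrors: List[Tuple[str, str]] = MIRRORS) -> List[str]:
    """Return the references to try, in order: matching mirrors, then the source."""
    if "@" not in image:
        # This sketch only rewrites by-digest references; tags are left alone.
        return [image]

    repo, digest = image.split("@", 1)
    candidates = []
    for source, mirror in mirrors:
        # Match the repository itself or anything nested below it.
        if repo == source or repo.startswith(source + "/"):
            candidates.append(mirror + repo[len(source):] + "@" + digest)

    # The original location is kept as the last resort. In a truly
    # disconnected cluster it is unreachable, so only the mirrors can succeed.
    candidates.append(image)
    return candidates


if __name__ == "__main__":
    for ref in candidate_pull_refs(DTK_IMAGE):
        print(ref)
```

A client that skips this step always goes to quay.io directly, which matches the behaviour described in this report: public images are pulled when quay.io happens to be reachable, and the DTK pull fails for lack of a quay.io pull secret.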
Work ongoing in https://github.com/openshift/special-resource-operator/pull/212. Assigning @ybettan as I am going on PTO.

Verified it with a bundle build in a disconnected environment. Using custom examples of simple-kmod and acm-ice, the build/daemonset worked fine:

# oc get all -n simple-kmod -o wide
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE   NOMINATED NODE   READINESS GATES
pod/simple-kmod-driver-build-99249f257707c0c3-1-build   0/1   Completed   0   18h   fd01:0:0:1::495   master-0-2.ocp-edge-cluster-hub-0.qe.lab.redhat.com   <none>   <none>
pod/simple-kmod-driver-container-99249f257707c0c3-2ctkz   1/1   Running   0   18h   fd01:0:0:3::9a   master-0-0.ocp-edge-cluster-hub-0.qe.lab.redhat.com   <none>   <none>
pod/simple-kmod-driver-container-99249f257707c0c3-9c9w5   1/1   Running   0   18h   fd01:0:0:1::494   master-0-2.ocp-edge-cluster-hub-0.qe.lab.redhat.com   <none>   <none>
pod/simple-kmod-driver-container-99249f257707c0c3-mvjzh   1/1   Running   0   18h   fd01:0:0:2::87   master-0-1.ocp-edge-cluster-hub-0.qe.lab.redhat.com   <none>   <none>

NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS   IMAGES   SELECTOR
daemonset.apps/simple-kmod-driver-container-99249f257707c0c3   3   3   3   3   3   feature.node.kubernetes.io/kernel-version.full=4.18.0-305.49.1.el8_4.x86_64,node-role.kubernetes.io/worker=   18h   simple-kmod-driver-container   image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container:v4.18.0-305.49.1.el8_4.x86_64   app=simple-kmod-driver-container-99249f257707c0c3

NAME   TYPE   FROM   LATEST
buildconfig.build.openshift.io/simple-kmod-driver-build-99249f257707c0c3   Docker   Dockerfile   1

NAME   TYPE   FROM   STATUS   STARTED   DURATION
build.build.openshift.io/simple-kmod-driver-build-99249f257707c0c3-1   Docker   Dockerfile   Complete   19 hours ago   1m36s

NAME   IMAGE REPOSITORY   TAGS   UPDATED
imagestream.image.openshift.io/simple-kmod-driver-container   image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container   v4.18.0-305.49.1.el8_4.x86_64   19 hours ago

acm-ice example

# oc get all -n acm-ice -o wide
NAME   READY   STATUS   RESTARTS   AGE   IP   NODE   NOMINATED NODE   READINESS GATES
pod/acm-ice-4-8-2-1-build   0/1   Completed   0   124m   fd01:0:0:1::4de   master-0-2.ocp-edge-cluster-hub-0.qe.lab.redhat.com   <none>   <none>

NAME   TYPE   FROM   LATEST
buildconfig.build.openshift.io/acm-ice-4-8-2   Docker   Dockerfile   1

NAME   TYPE   FROM   STATUS   STARTED   DURATION
build.build.openshift.io/acm-ice-4-8-2-1   Docker   Dockerfile   Complete   2 hours ago   4m12s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days. |