Bug 2308446

Summary: Disconnected deployment is failing because rook-ceph-operator image is referenced via version tag instead of checksum
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Daniel Horák <dahorak>
Component: build Assignee: Nikhil Ladha <nladha>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.17 CC: kramdoss, muagarwa, nladha, odf-bz-bot
Target Milestone: --- Keywords: Regression
Target Release: ODF 4.17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.17.0-94 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-10-30 14:32:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel Horák 2024-08-29 06:31:33 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

  Disconnected deployment of the latest version is failing because the
  rook-ceph-operator-* pod is stuck in the ImagePullBackOff state.
  The reason seems to be that the rook-ceph-operator image is referenced
  via a version tag instead of a checksum (digest):

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  $ oc get deployment -n openshift-storage rook-ceph-operator -o yaml | grep image:
        image: docker.io/rook/ceph:v1.15.0
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
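
The distinction between a tag reference and a checksum reference can be checked mechanically. The following is a minimal sketch, not part of ODF or `oc`: `image_ref_kind` is a hypothetical helper name, and it assumes that a checksum reference always contains `@sha256:` (other digest algorithms are not handled).

```shell
#!/usr/bin/env bash
# Classify a container image reference as digest-pinned or tag-based.
# image_ref_kind is a hypothetical helper, introduced only for illustration.
# Assumption: a checksum (digest) reference contains "@sha256:".
image_ref_kind() {
  case "$1" in
    *@sha256:*) echo "digest" ;;  # e.g. <registry>/<repo>@sha256:<hex>
    *)          echo "tag" ;;     # anything else, including an implicit :latest
  esac
}

# The reference reported in this bug is tag-based.
image_ref_kind "docker.io/rook/ceph:v1.15.0"   # prints "tag"
```

A reference without any tag is also reported as `tag`, since it is not pinned to a checksum either.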


Version of all relevant components (if applicable):
  OCP Version:
    Client Version: 4.17.0-0.nightly-2024-08-28-171119
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: 4.17.0-0.nightly-2024-08-28-171119
    Kubernetes Version: v1.30.3

  ODF Version:
    ocs-registry:4.17.0-87


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
  ODF Deployment on disconnected cluster is not possible.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
  3


Is this issue reproducible?
  Yes


Can this issue be reproduced from the UI?
  N/A


If this is a regression, please provide more details to justify this:
  Yes, disconnected deployment was working in previous versions.


Steps to Reproduce:
1. Prepare environment for disconnected deployment (mirror required images, etc.)
2. Start ODF deployment on the disconnected environment.
3. Check status of rook-ceph-operator CSV and rook-ceph-operator pod.
4. Check the image definition for rook-ceph-operator.
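
Step 4 can be sketched as a small filter over YAML output. This is only an illustration under stated assumptions: `list_tagged_images` is a hypothetical helper name, it expects unquoted `image:` values, and it treats any value without `@sha256:` as not pinned by checksum.

```shell
#!/usr/bin/env bash
# list_tagged_images: hypothetical helper, introduced only for illustration.
# Reads YAML text on stdin and prints every "image:" value that is NOT
# pinned by a sha256 checksum. Assumes unquoted image values.
list_tagged_images() {
  # 1) keep only "image:" lines (optionally prefixed by a YAML list dash)
  # 2) strip everything up to and including "image:"
  # 3) drop values that are already referenced by checksum
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]' \
    | sed -E 's/^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]*//' \
    | grep -v '@sha256:' || true   # exit 0 even when nothing is printed
}
```

On a cluster, the filter could be fed from `oc get deployment -n openshift-storage rook-ceph-operator -o yaml | list_tagged_images`; any line it prints is an image that a disconnected cluster would probably fail to pull through a checksum-based mirror.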


Actual results:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  $ oc get csv -n openshift-storage rook-ceph-operator.v4.17.0-87.stable
  NAME                                   DISPLAY     VERSION            REPLACES   PHASE
  rook-ceph-operator.v4.17.0-87.stable   Rook-Ceph   4.17.0-87.stable              Failed

  $ oc get pod -n openshift-storage --selector app=rook-ceph-operator
  NAME                                  READY   STATUS             RESTARTS   AGE
  rook-ceph-operator-7b67976c46-d4j78   0/1     ImagePullBackOff   0          59m

  $ oc get deployment -n openshift-storage rook-ceph-operator -o yaml | grep image:
          image: docker.io/rook/ceph:v1.15.0
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expected results:
  The rook-ceph-operator CSV is in the Succeeded phase and all pods are running
  (the rook-ceph-operator image is referenced via checksum).

Additional info:

Comment 3 Nikhil Ladha 2024-09-04 07:33:09 UTC
Hi,

This started happening recently because of a change in the upstream Rook code that includes registry names in the image references [1]. That change surfaced a hidden bug in the csv-gen script [2] of the rook-ceph-operator bundle, which is responsible for updating the images correctly in the CSV for the downstream codebase.

I will send a PR soon to fix this issue. Once a new build with the fix is available, please use that build.

1. https://github.com/rook/rook/pull/14550
2. https://github.com/red-hat-storage/rook/blob/master/build/csv/csv-gen.sh

Thanks
Nikhil

Comment 7 errata-xmlrpc 2024-10-30 14:32:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676