Bug 2308446 - Disconnected deployment is failing because rook-ceph-operator image is referenced via version tag instead of checksum
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: build
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Nikhil Ladha
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-08-29 06:31 UTC by Daniel Horák
Modified: 2024-10-30 14:32 UTC
CC List: 4 users

Fixed In Version: 4.17.0-94
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:32:38 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage rook pull 719 | 0 | None | open | Bug 2308446:[release-4.17] Fix csv-gen script to correctly replace ROOK_IMAGE env var | 2024-09-04 10:07:36 UTC
Red Hat Issue Tracker | OCSBZM-8889 | 0 | None | None | None | 2024-08-29 06:33:59 UTC
Red Hat Product Errata | RHSA-2024:8676 | 0 | None | None | None | 2024-10-30 14:32:41 UTC

Description Daniel Horák 2024-08-29 06:31:33 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

  Disconnected deployment of the latest version is failing because the
  rook-ceph-operator-* pod is stuck in the ImagePullBackOff state.
  The reason seems to be that the rook-ceph-operator image is referenced
  via a version tag (an upstream docker.io image) instead of a checksum:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  $ oc get deployment -n openshift-storage rook-ceph-operator -o yaml | grep image:
        image: docker.io/rook/ceph:v1.15.0
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
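  The tag-versus-checksum distinction can be made concrete with a small
  sketch. The Python below is an illustration only, not part of ODF or `oc`;
  `split_image_ref` and `is_pinned_by_digest` are hypothetical helpers. It
  assumes the common `name[:tag][@digest]` reference form, where a
  checksum-pinned reference carries an `@sha256:` digest and the reference
  above carries only the `v1.15.0` tag.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    """Components of a container image reference (hypothetical helper)."""
    name: str              # registry/repository, e.g. "docker.io/rook/ceph"
    tag: Optional[str]     # e.g. "v1.15.0", or None if absent
    digest: Optional[str]  # e.g. "sha256:<hex>", or None if absent


def split_image_ref(ref: str) -> ImageRef:
    """Split a "name[:tag][@digest]" reference into its parts."""
    name, _, digest = ref.partition("@")
    tag = None
    # A ':' after the last '/' starts the tag; a ':' before it is a registry port.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    return ImageRef(name, tag, digest or None)


def is_pinned_by_digest(ref: str) -> bool:
    """True if the reference names an exact image digest, not only a tag."""
    return split_image_ref(ref).digest is not None


if __name__ == "__main__":
    ref = "docker.io/rook/ceph:v1.15.0"  # the reference seen in the deployment
    print(split_image_ref(ref))  # ImageRef(name='docker.io/rook/ceph', tag='v1.15.0', digest=None)
    print(is_pinned_by_digest(ref))  # False
```

  A tag such as v1.15.0 can be moved to point at different image content,
  whereas a digest identifies exactly one image, which is presumably why
  the report expects a checksum reference.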


Version of all relevant components (if applicable):
  OCP Version:
    Client Version: 4.17.0-0.nightly-2024-08-28-171119
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: 4.17.0-0.nightly-2024-08-28-171119
    Kubernetes Version: v1.30.3

  ODF Version:
    ocs-registry:4.17.0-87


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
  ODF deployment on a disconnected cluster is not possible.


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
  3


Is this issue reproducible?
  Yes


Can this issue be reproduced from the UI?
  N/A


If this is a regression, please provide more details to justify this:
  Yes, disconnected deployment was working in previous versions.


Steps to Reproduce:
1. Prepare the environment for disconnected deployment (mirror required images, etc.).
2. Start ODF deployment on the disconnected environment.
3. Check the status of the rook-ceph-operator CSV and the rook-ceph-operator pod.
4. Check the image definition for rook-ceph-operator.


Actual results:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  $ oc get csv -n openshift-storage rook-ceph-operator.v4.17.0-87.stable
  NAME                                   DISPLAY     VERSION            REPLACES   PHASE
  rook-ceph-operator.v4.17.0-87.stable   Rook-Ceph   4.17.0-87.stable              Failed

  $ oc get pod -n openshift-storage --selector app=rook-ceph-operator
  NAME                                  READY   STATUS             RESTARTS   AGE
  rook-ceph-operator-7b67976c46-d4j78   0/1     ImagePullBackOff   0          59m

  $ oc get deployment -n openshift-storage rook-ceph-operator -o yaml | grep image:
          image: docker.io/rook/ceph:v1.15.0
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expected results:
  The rook-ceph-operator CSV is in the Succeeded phase and all pods are running
  (the rook-ceph-operator image is referenced via checksum).
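  The expected state can also be checked mechanically against `oc get`
  output like that shown under Actual results. The Python below is a
  hypothetical sketch (`parse_columns` and `matches_expected` are not `oc`
  features). It assumes the table is left-aligned with header names that
  contain no spaces, so column boundaries can be taken from where each
  header word starts; this keeps an empty cell, such as REPLACES, in the
  right column.

```python
from typing import Dict, List


def parse_columns(table: str) -> List[Dict[str, str]]:
    """Parse left-aligned `oc get` table text into one dict per data row."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0]
    names = header.split()
    # Record the start offset of each header word; that is where its column begins.
    starts = []
    pos = 0
    for name in names:
        pos = header.index(name, pos)
        starts.append(pos)
        pos += len(name)
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [
        {n: line[s:e].strip() for n, s, e in zip(names, starts, ends)}
        for line in lines[1:]
    ]


def matches_expected(csv_table: str, pod_table: str, image: str) -> bool:
    """CSV phase Succeeded, every pod Running, and the image pinned by digest."""
    csv_rows = parse_columns(csv_table)
    pod_rows = parse_columns(pod_table)
    csv_ok = bool(csv_rows) and all(r["PHASE"] == "Succeeded" for r in csv_rows)
    pods_ok = bool(pod_rows) and all(r["STATUS"] == "Running" for r in pod_rows)
    pinned = "@sha256:" in image  # checksum reference, not just a tag
    return csv_ok and pods_ok and pinned
```

  Applied to the Actual results above, this check returns False on all three
  counts: the CSV phase is Failed, the pod is in ImagePullBackOff, and the
  image is tag-only.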

Additional info:

Comment 3 Nikhil Ladha 2024-09-04 07:33:09 UTC
Hi,

This started happening recently because of a change in the upstream Rook code that includes registry names in the image references [1]. That change surfaced a hidden bug in the csv-gen script [2] of the rook-ceph-operator bundle, which is responsible for correctly replacing the images in the CSV for the downstream codebase.

I will send a PR to fix this issue soon; once a new build with the fix is available, please use it.

1. https://github.com/rook/rook/pull/14550
2. https://github.com/red-hat-storage/rook/blob/master/build/csv/csv-gen.sh

Thanks
Nikhil
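Comment 3 attributes the failure to the csv-gen script not handling the registry prefix that upstream Rook now adds to image references. The real csv-gen.sh is a shell script whose logic is not reproduced here. The Python below is only a hypothetical illustration of that class of bug. A substitution keyed on the bare "rook/ceph:<tag>" form silently misses "docker.io/rook/ceph:v1.15.0", leaving the upstream tag in place. A pattern that allows an optional registry host matches both forms. The patterns, the sample YAML, and the downstream image value are all placeholders. The sample includes the ROOK_IMAGE env var named in the fix PR title.

```python
import re

# Placeholder for the downstream, digest-pinned image (hypothetical value).
DOWNSTREAM_IMAGE = "registry.example.com/odf/rook-ceph-operator@sha256:" + "0" * 64

# Old assumption (hypothetical): upstream references have no registry host.
NAIVE = re.compile(r"((?:image|value): )rook/ceph:v\S+")
# Fix (hypothetical): allow one optional registry host (and port) before "rook/ceph".
FIXED = re.compile(r"((?:image|value): )(?:[\w.-]+(?::\d+)?/)?rook/ceph:v\S+")


def replace_image(text: str, pattern: re.Pattern, new_image: str) -> str:
    """Replace each rook/ceph reference matched by `pattern`, keeping the YAML key."""
    return pattern.sub(lambda m: m.group(1) + new_image, text)


if __name__ == "__main__":
    snippet = (
        "image: docker.io/rook/ceph:v1.15.0\n"
        "env:\n"
        "- name: ROOK_IMAGE\n"
        "  value: docker.io/rook/ceph:v1.15.0\n"
    )
    # The naive pattern matches nothing, so the upstream tag survives.
    print(replace_image(snippet, NAIVE, DOWNSTREAM_IMAGE) == snippet)  # True
    # The fixed pattern rewrites both the container image and ROOK_IMAGE.
    print(replace_image(snippet, FIXED, DOWNSTREAM_IMAGE))
```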

Comment 7 errata-xmlrpc 2024-10-30 14:32:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

