Bug 2036018 - ROOK_CSI_* overrides missing from the CSV in 4.10
Summary: ROOK_CSI_* overrides missing from the CSV in 4.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: All
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Jose A. Rivera
QA Contact: Jilju Joy
URL:
Whiteboard:
Duplicates: 2037718
Depends On:
Blocks:
Reported: 2021-12-29 10:20 UTC by Aditi
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version: ocs-registry:4.10.0-70
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-13 18:50:46 UTC
Embargoed:


Links
System                   ID               Private   Priority   Status   Summary   Last Updated
Red Hat Product Errata   RHSA-2022:1372   0         None       None     None      2022-04-13 18:53:23 UTC

Description Aditi 2021-12-29 10:20:28 UTC
Description of problem: ODF 4.10 deployment fails, with all PVCs in the openshift-monitoring namespace stuck in Pending state.

[aditi@nx142 4.10-4]$ oc get pvc -n openshift-monitoring
NAME                                        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
my-alertmanager-claim-alertmanager-main-0   Pending                                      ocs-storagecluster-ceph-rbd   26m
my-alertmanager-claim-alertmanager-main-1   Pending                                      ocs-storagecluster-ceph-rbd   26m
my-prometheus-claim-prometheus-k8s-0        Pending                                      ocs-storagecluster-ceph-rbd   26m
my-prometheus-claim-prometheus-k8s-1        Pending                                      ocs-storagecluster-ceph-rbd   26m
[aditi@nx142 4.10-4]$

The rook-ceph-csi-detect-version pod is also stuck in Pending state because it cannot pull quay.io/cephcsi/cephcsi:v3.4.0 for ppc64le.

  Normal   Pulling         3m33s (x4 over 5m6s)  kubelet            Pulling image "quay.io/cephcsi/cephcsi:v3.4.0"
  Warning  Failed          3m30s (x4 over 5m3s)  kubelet            Failed to pull image "quay.io/cephcsi/cephcsi:v3.4.0": rpc error: code = Unknown desc = choosing image instance: no image found in manifest list for architecture ppc64le, variant "", OS linux
  Warning  Failed          3m30s (x4 over 5m3s)  kubelet            Error: ErrImagePull
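
The pull error above says the manifest list published for that tag has no ppc64le entry. A minimal sketch of that check in Python, assuming the manifest list has already been fetched as JSON (for example with something like `skopeo inspect --raw`). The helper name `supports_arch` and the sample data are hypothetical and are not the real contents of the cephcsi tag:

```python
import json


def supports_arch(manifest_list: dict, arch: str, os_name: str = "linux") -> bool:
    """Return True if the manifest list has an entry for the given platform."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return True
    return False


# Hypothetical manifest list that only carries amd64 (illustrative data only).
SAMPLE = json.loads("""
{
  "schemaVersion": 2,
  "manifests": [
    {"digest": "sha256:aaaa", "platform": {"architecture": "amd64", "os": "linux"}}
  ]
}
""")

print("amd64:", supports_arch(SAMPLE, "amd64"))
print("ppc64le:", supports_arch(SAMPLE, "ppc64le"))
```

With a manifest list shaped like the sample, the amd64 lookup succeeds and the ppc64le lookup does not, which matches what the kubelet reports.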





Comment 4 Boris Ranto 2022-01-04 07:17:33 UTC
I took a look here and this is actually an issue with the ocs-operator repo, which for some reason no longer generates the ROOK_CSI_* overrides in the CSV. As a result, we are using upstream images (which are built for amd64 only) in our downstream builds.

Just to reiterate -- this is not just cephcsi image missing the override, there is no override for any of the sidecar images in the ocs-operator-bundle CSV in ODF 4.10.
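
One way to confirm whether a given CSV carries the overrides is to list the ROOK_CSI_* env vars on its deployment containers. A rough sketch in Python, assuming the CSV has been dumped to JSON (for example via `oc get csv <name> -n <namespace> -o json`) and follows the usual OLM layout of spec.install.spec.deployments. The helper name `rook_csi_overrides`, the sample CSV and the image value are hypothetical:

```python
def rook_csi_overrides(csv: dict, prefix: str = "ROOK_CSI_") -> dict:
    """Collect env vars with the given prefix from every container in the CSV."""
    found = {}
    install_spec = csv.get("spec", {}).get("install", {}).get("spec", {})
    for dep in install_spec.get("deployments", []):
        pod_spec = dep.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            for env in container.get("env", []):
                name = env.get("name", "")
                if name.startswith(prefix):
                    found[name] = env.get("value", "")
    return found


# Hypothetical CSV fragment; the image reference is a placeholder, not a real build.
SAMPLE_CSV = {
    "spec": {"install": {"spec": {"deployments": [{
        "name": "rook-ceph-operator",
        "spec": {"template": {"spec": {"containers": [{
            "name": "rook-ceph-operator",
            "env": [
                {"name": "ROOK_LOG_LEVEL", "value": "INFO"},
                {"name": "ROOK_CSI_SNAPSHOTTER_IMAGE",
                 "value": "registry.example.com/csi-snapshotter:placeholder"},
            ],
        }]}}},
    }]}}}
}

overrides = rook_csi_overrides(SAMPLE_CSV)
print(overrides if overrides else "no ROOK_CSI_* overrides found")
```

An empty result for the operator deployment would correspond to the broken state described here, where no sidecar image is overridden.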

Retargeting.

Comment 7 Boris Ranto 2022-01-04 07:47:41 UTC
Looking at the git log, this has been the case at least since 4.10.0-32, which we built on December 7th. The build 4.10.0-29 still had the ROOK_CSI_* overrides in the CSV, so I suspect the regression occurred somewhere around that time.

Commit-wise, this is:

2667179611d88ec00bf136b74a9f424e9d76be1d -- working
bf0946e7b4d47b799d209c491f0fcfbd71a4295c -- broken

I can only see one commit in that range and it does not seem to be related to the CSV generation. I suspect this started occurring because of changes in rook around that time.

Comment 8 Subham Rai 2022-01-04 08:10:05 UTC
The changes in this commit, https://github.com/red-hat-storage/ocs-operator/commit/bf0946e7b4d47b799d209c491f0fcfbd71a4295c, were required for the SCC in OpenShift. I have no idea about CSV generation, but I don't think they should impact the CSV.

Comment 9 umanga 2022-01-04 08:38:10 UTC
@branto We probably stopped adding env vars like "ROOK_CSI_SNAPSHOTTER_IMAGE" etc. while generating the CSV.
I tried generating the CSV with the master and release-4.9 branches and it's working fine for me. Can you check if something changed in the downstream script?

Comment 10 umanga 2022-01-04 10:17:02 UTC
Testing the latest Rook image, it looks like the issue is due to a change of the CSV name in Rook, which is causing the environment variable injection to fail.
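
To illustrate how a rename can break injection without any visible error, here is a hypothetical sketch in Python. This is not the ocs-operator or Rook code; the function `inject_env`, the helper `make_csv` and the CSV names are made up. It assumes an injection step that selects the target CSV by an exact metadata.name match:

```python
def make_csv(name: str) -> dict:
    """Build a minimal, hypothetical CSV with one deployment and one container."""
    return {
        "metadata": {"name": name},
        "spec": {"install": {"spec": {"deployments": [{
            "spec": {"template": {"spec": {"containers": [{"name": "operator"}]}}},
        }]}}},
    }


def inject_env(csvs: list, target_name: str, overrides: dict, strict: bool = False) -> int:
    """Add overrides to each container of the CSV named target_name.

    Returns the number of CSVs patched. With strict=True, a missing
    target raises instead of silently doing nothing.
    """
    patched = 0
    for csv in csvs:
        if csv.get("metadata", {}).get("name") != target_name:
            continue  # name mismatch: this CSV is skipped without any error
        for dep in csv["spec"]["install"]["spec"]["deployments"]:
            for container in dep["spec"]["template"]["spec"]["containers"]:
                env = container.setdefault("env", [])
                env.extend({"name": k, "value": v} for k, v in overrides.items())
        patched += 1
    if strict and patched == 0:
        raise LookupError(f"no CSV named {target_name!r}; nothing injected")
    return patched


# The CSV was renamed upstream (hypothetical names), but the script still uses the old one.
csvs = [make_csv("new-name.v1")]
overrides = {"ROOK_CSI_SNAPSHOTTER_IMAGE": "registry.example.com/snapshotter:placeholder"}
print("patched:", inject_env(csvs, "old-name.v1", overrides))
```

In this sketch the non-strict call patches nothing and still returns normally, so the generated CSV simply lacks the overrides. Failing loudly when no target matches (strict=True here) would surface such a rename at build time.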

It's a quick fix. Providing devel_ack+ .

Comment 12 Sébastien Han 2022-01-06 17:10:41 UTC
I fixed the issue in Rook and resynced 4.10 with the latest Rook v1.8 yesterday in https://github.com/red-hat-storage/rook/pull/325.

Comment 13 umanga 2022-01-07 07:01:41 UTC
*** Bug 2037718 has been marked as a duplicate of this bug. ***

Comment 14 Aditi 2022-01-10 08:02:49 UTC
I am able to deploy ODF 4.10 (4.10.0-73) now. Thanks.

Comment 15 Jilju Joy 2022-01-11 08:13:35 UTC
Thanks Aditi for confirming.

Verified that the downstream CSI builds are present.
Verified in version:
ODF 4.10.0-79
OCP 4.10.0-0.nightly-2022-01-10-144202

Comment 20 errata-xmlrpc 2022-04-13 18:50:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

