Bug 2211592 - [ODF 4.12] [GSS] unknown parameter name "FORCE_OSD_REMOVAL"
Summary: [ODF 4.12] [GSS] unknown parameter name "FORCE_OSD_REMOVAL"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.12.5
Assignee: Malay Kumar parida
QA Contact: Elad
URL:
Whiteboard:
Depends On: 2143944 2211595
Blocks:
 
Reported: 2023-06-01 07:45 UTC by Malay Kumar parida
Modified: 2023-08-09 17:00 UTC
CC List: 9 users

Fixed In Version: 4.12.5-1
Doc Type: No Doc Update
Doc Text:
Clone Of: 2143944
Environment:
Last Closed: 2023-07-26 16:57:57 UTC
Embargoed:




Links:
- GitHub: red-hat-storage/ocs-operator pull 2069 (open) - Bug 2211592: [release-4.12] Always update the template parameters & objects to keep it up-to-date (last updated 2023-06-01 11:28:25 UTC)
- Red Hat Product Errata: RHSA-2023:4287 (last updated 2023-07-26 16:58:07 UTC)

Description Malay Kumar parida 2023-06-01 07:45:31 UTC
+++ This bug was initially created as a clone of Bug #2143944 +++

Description of problem (please be as detailed as possible and provide log
snippets):

When the customer tries to replace an OSD, the command gives this error:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} -p FORCE_OSD_REMOVAL=true |oc create -n openshift-storage -f -
error: unknown parameter name "FORCE_OSD_REMOVAL"
error: no objects passed to create
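
For reference, a quick way to confirm whether the template exposes the parameter at all is to list its parameters (a sketch, assuming the template lives in the default openshift-storage namespace):

$ oc process -n openshift-storage ocs-osd-removal --parameters
# On an affected cluster, FORCE_OSD_REMOVAL should be missing from this list.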


Version of all relevant components (if applicable):

ODF 4.9 and ODF 4.10

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No

Is there any workaround available to the best of your knowledge?

Yes, deleting the template ocs-osd-removal forces it to be reconciled, and the option appears.
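
A minimal sketch of that workaround (assuming the ocs-operator recreates the template once it is deleted):

$ oc delete template ocs-osd-removal -n openshift-storage
# Once the operator has recreated the template, re-check its parameters:
$ oc process -n openshift-storage ocs-osd-removal --parameters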


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

2


Is this issue reproducible?

Yes. Install an ODF version prior to 4.9.11 and upgrade it; the template is not updated, so the option is not added to it.

Can this issue be reproduced from the UI?

Steps to Reproduce:
1. Install ODF at a version prior to 4.9.11
2. Upgrade through the releases
3. Try to replace an OSD, or review the template, on a version above 4.9.11


Actual results:

The template doesn't have the option, so the command fails.


Expected results:

Command working


Additional info:

--- Additional comment from RHEL Program Management on 2022-11-18 12:58:56 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.12.0' to '?', and so is being proposed to be fixed at the ODF 4.12.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from amansan on 2022-11-18 13:19:31 UTC ---

Good afternoon,

This bug has been opened due to the email thread on ocs-tech-list:

[ODF 4.10][OSD Replacement][Sev 4][Case#03320642] Option FORCE_OSD_REMOVAL doesn't appear in ocs-osd-removal template

Both customers have solved the issue:

- The customer on case 03320642 deleted the template ocs-osd-removal, as we discussed on the thread, and it worked.
- The customer on case 03363378 is using a disconnected environment, so they did the following:

The pod cannot start as it still refers to an (old) OCS image that is no longer available, since it is no longer being synced when syncing images for the disconnected operator images.

The image the template was referring to was:
registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac

I changed it to the image of the deployment/rook-ceph-operator, and it worked fine.
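
For reference, a rough sketch of that manual fix (the container index and the use of oc edit here are assumptions, not a documented procedure):

# Look up the image currently used by the rook-ceph-operator deployment
$ oc get deployment rook-ceph-operator -n openshift-storage \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
# Then replace the stale image reference in the template's Job object with that value
$ oc edit template ocs-osd-removal -n openshift-storage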

Can you please correct the issue to make sure the template is reconciled during the upgrade?

Thanks and regards,

Alicia

--- Additional comment from Mudit Agarwal on 2022-11-18 13:30:07 UTC ---

Sure, we will work on it.
It is not a blocker for 4.12.0, moving it to 4.13. Will backport once the fix is available there.

--- Additional comment from amansan on 2022-11-22 07:26:44 UTC ---

Good morning,

This is regarding the case in the disconnected environment.

We are running ODF 4.10.7 -> but it happens when upgrading from OCS to ODF.

Will there also be a bug for the wrong image in the template?  <-- My understanding is that the root cause is the same, so it will be managed in this bug, but let me know if you think you need a different bug for this and I will open one for you.

=== this question comes from this comment 

The pod cannot start as it still refers to an (old) OCS image that is no longer available, since it is no longer being synced when syncing images for the disconnected operator images.

The image the template was referring to was:

registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac

I changed it to the image of the deployment/rook-ceph-operator, and it worked fine.

====

The image used in the template in ODF version 4.10.7:

registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac

I assume it should be odf4, since the ocs4 image is no longer available in a disconnected environment (we clean up our registry often):

registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:7a2ae2b9ed06b6f529e2fa72cf7221725f395849dd3fb657f41a376a06f3d1e7
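
A quick way to check which image the template currently references (a sketch; the objects[0] path assumes the Job is the first object in the template):

$ oc get template ocs-osd-removal -n openshift-storage \
    -o jsonpath='{.objects[0].spec.template.spec.containers[0].image}'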

Regards,

Alicia

--- Additional comment from Malay Kumar parida on 2022-11-22 13:09:43 UTC ---

I think we should keep this as one bug, as I also think the root cause is the same for both.

--- Additional comment from Red Hat Bugzilla on 2022-12-31 19:35:13 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 22:33:08 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 22:37:05 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2023-01-01 08:43:40 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Red Hat Bugzilla on 2023-01-31 23:38:11 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2023-01-31 23:40:26 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from amansan on 2023-02-06 07:20:53 UTC ---

Good morning,

Have you had time to take a look at this bug?

Regards,

Alicia

--- Additional comment from Malay Kumar parida on 2023-02-14 05:20:12 UTC ---

Hi Alicia, I am running a little busy due to the feature development cycle for 4.13, as just a couple of weeks are left. But I can assure you this bug is on my radar; I have already made some investigations into the root cause and expect to look at it more deeply after the feature freeze for 4.13, which is on Feb 28. If there is some customer dependency or anyone waiting on the issue, please do let me know and I can move things around to give this prioritized attention.

--- Additional comment from amansan on 2023-02-17 07:36:53 UTC ---

Hi Malay,

OK, thanks. I'll wait for your news.

Regards,

Alicia

--- Additional comment from RHEL Program Management on 2023-03-30 15:40:59 UTC ---

This BZ is being approved for the ODF 4.13.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.13.0'.

--- Additional comment from RHEL Program Management on 2023-03-30 15:40:59 UTC ---

Since this bug has been approved for the ODF 4.13.0 release, through release flag 'odf-4.13.0+', the Target Release is being set to 'ODF 4.13.0'.

--- Additional comment from errata-xmlrpc on 2023-04-04 12:59:27 UTC ---

This bug has been added to advisory RHBA-2023:108078 by Boris Ranto (branto)

--- Additional comment from errata-xmlrpc on 2023-04-04 13:00:11 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2023:108078-01
https://errata.devel.redhat.com/advisory/108078

--- Additional comment from Sunil Kumar Acharya on 2023-04-10 12:22:18 UTC ---

Please have the RDT(requires_doc_text) flag/text updated accordingly.

--- Additional comment from amansan on 2023-04-11 10:08:33 UTC ---

Hi Malay,

I was reading 

https://github.com/red-hat-storage/ocs-operator/pull/1959

My understanding is that the template has been updated for these versions? I'm doubtful because version 4.13 is marked; can you please confirm this for me?

Thanks, 

Alicia

--- Additional comment from Malay Kumar parida on 2023-04-17 04:56:56 UTC ---

Hi Alicia, Basically, earlier the template was created once and was never updated afterwards, which was causing the problem.
For example, if someone installs ODF 4.10, the template is created at that time with a rook-ceph image in the template's job spec. Later the customer upgrades ODF from 4.10 to 4.11, 4.11 to 4.12, and so on. But as the template was not reconciled, the rook-ceph image in the template's job spec remained the old one (the 4.10 one in this case) even though you are now on some newer version of ODF, such as 4.12.

With this fix the template gets reconciled, so the rook-ceph image in the template's job spec will always be the correct one.
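
For reference, one way to verify this after an upgrade is to compare the image in the template's job spec with the one used by the rook-ceph-operator deployment; the two should match (the jsonpath expressions below are assumptions about the object layout):

$ oc get template ocs-osd-removal -n openshift-storage \
    -o jsonpath='{.objects[0].spec.template.spec.containers[0].image}'
$ oc get deployment rook-ceph-operator -n openshift-storage \
    -o jsonpath='{.spec.template.spec.containers[0].image}'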

--- Additional comment from Red Hat Bugzilla on 2023-04-19 14:10:58 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from amansan on 2023-04-21 06:25:17 UTC ---

Hi Malay,

Thanks so much for the answer.

Regards,

Alicia

--- Additional comment from Itzhak on 2023-05-29 11:22:14 UTC ---

What should be the updated steps? Should we try to update from 4.12 to 4.13? Or just deploy a cluster with 4.13 and execute the command:
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} -p FORCE_OSD_REMOVAL=true |oc create -n openshift-storage -f -

Comment 8 Itzhak 2023-07-19 17:53:41 UTC
I tested the BZ with a vSphere cluster with OCP 4.10 and ODF 4.9.10 (lower than 4.9.11).

I performed the following steps:

1. Checked the ocs osd removal job command, which resulted in the expected error: 
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f -
error: unknown parameter name "FORCE_OSD_REMOVAL"
error: no objects passed to create

2. Upgraded ODF from 4.9 to 4.10.
3. Checked the ocs osd removal job command again, which shows the expected output:
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
$ oc get jobs ocs-osd-removal-job 
NAME                  COMPLETIONS   DURATION   AGE
ocs-osd-removal-job   1/1           32s        136m

4. Upgraded the OCP version from 4.10 to 4.11.
5. Upgraded ODF from 4.10 to 4.11.

6. Checked the ocs osd removal job command again, which shows the expected output:
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f -
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "operator" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "operator" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
job.batch/ocs-osd-removal-job created
$ oc get jobs ocs-osd-removal-job 
NAME                  COMPLETIONS   DURATION   AGE
ocs-osd-removal-job   1/1           7s         22s

7. Upgraded OCP from version 4.11 to 4.12.
8. Upgraded ODF from 4.11 to 4.12.

9. Checked the ocs osd removal job command again, which shows the expected output:
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
$ oc get jobs ocs-osd-removal-job 
NAME                  COMPLETIONS   DURATION   AGE
ocs-osd-removal-job   1/1           7s         12s

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.12.0-0.nightly-2023-07-15-021657
Kubernetes Version: v1.25.11+1485cc9

OCS version:
ocs-operator.v4.12.5-rhodf              OpenShift Container Storage   4.12.5-rhodf   ocs-operator.v4.11.9              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2023-07-15-021657   True        False         51m     Cluster version is 4.12.0-0.nightly-2023-07-15-021657

Rook version:
rook: v4.12.5-0.bc1e9806c3281090b58872e303e947ff5437c078
go: go1.18.10

Ceph version:
ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)

Comment 12 errata-xmlrpc 2023-07-26 16:57:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.12.5 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4287

