+++ This bug was initially created as a clone of Bug #2143944 +++

Description of problem (please be detailed as possible and provide log snippets):

When the customer tries to replace an OSD, the command gives this error:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} -p FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
error: unknown parameter name "FORCE_OSD_REMOVAL"
error: no objects passed to create

Version of all relevant components (if applicable):
ODF 4.9 and ODF 4.10

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
Yes. Deleting the ocs-osd-removal template forces it to be reconciled, and the option appears.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes. Install an ODF version earlier than 4.9.11 and upgrade it; the template is not updated, so the option is not added to it.

Can this issue be reproduced from the UI?

Steps to Reproduce:
1. Install ODF in a version earlier than 4.9.11
2. Upgrade through the releases
3. Try to replace an OSD, or review the template, on a version later than 4.9.11

Actual results:
The template does not have the option, so the command fails.

Expected results:
The command works.

Additional info:
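For reference, a minimal sketch of the workaround described above (deleting the stale template so that ocs-operator recreates it). This is not an official procedure, and the wait step is an assumption about the operator's reconcile timing:

# Delete the stale template; ocs-operator should recreate it on its next reconcile
$ oc delete template ocs-osd-removal -n openshift-storage

# Wait for the template to reappear (timing depends on the operator's reconcile loop)
$ oc get template ocs-osd-removal -n openshift-storage

# Re-run the removal command once the recreated template exposes FORCE_OSD_REMOVAL
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} -p FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -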
--- Additional comment from RHEL Program Management on 2022-11-18 12:58:56 UTC ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.12.0' to '?', and so is being proposed to be fixed in the ODF 4.12.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from amansan on 2022-11-18 13:19:31 UTC ---

Good afternoon,

This bug has been opened due to the email thread in ocs-tech-list "[ODF 4.10][OSD Replacement][Sev 4][Case#03320642] Option FORCE_OSD_REMOVAL doesn't appear in ocs-osd-removal template".

Both customers have resolved the issue:
- The customer on case 03320642 deleted the ocs-osd-removal template, as we discussed on the thread, and it worked.
- The customer on case 03363378 is using a disconnected environment, so they did the following:

  The pod cannot start as it still refers to an (old) OCS image that is no longer available, since it is not being synced when syncing images for the disconnected operators. The image the template was referring to was:
  registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac
  I changed it to the image of the deployment/rook-ceph-operator, and it worked fine.

Can you please correct the issue so that the template is reconciled during the upgrade?

Thanks and regards,
Alicia

--- Additional comment from Mudit Agarwal on 2022-11-18 13:30:07 UTC ---

Sure, we will work on it. It is not a blocker for 4.12.0, so I am moving it to 4.13. We will backport once the fix is available there.

--- Additional comment from amansan on 2022-11-22 07:26:44 UTC ---

Good morning,

This is regarding the case in the disconnected environment. We are running ODF 4.10.7, but it happens when upgrading from OCS to ODF. Will there also be a bug for the wrong image in the template? <-- My understanding is that the root cause is the same, so it will be managed in this bug, but let me know if you think you need a different bug for this and I will open one for you.

This question comes from this comment:

===
The pod cannot start as it still refers to an (old) OCS image that is no longer available, since it is not being synced when syncing images for the disconnected operators. The image the template was referring to was:
registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac
I changed it to the image of the deployment/rook-ceph-operator, and it worked fine.
====

The image used in the template in ODF version 4.10.7:
registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:d964d5675d70e7b4b0dae1cab78d717005bbf8cae25613d553d201749da4d5ac

I assume it should be odf4, since the ocs4 image is no longer available in a disconnected environment (we clean up our registry often):
registry.redhat.io/odf4/rook-ceph-rhel8-operator@sha256:7a2ae2b9ed06b6f529e2fa72cf7221725f395849dd3fb657f41a376a06f3d1e7

Regards,
Alicia

--- Additional comment from Malay Kumar parida on 2022-11-22 13:09:43 UTC ---

Let's keep this as a single bug, as I also think the root cause is the same for both.

--- Additional comment from Red Hat Bugzilla on 2022-12-31 19:35:13 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 22:33:08 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2022-12-31 22:37:05 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2023-01-01 08:43:40 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Red Hat Bugzilla on 2023-01-31 23:38:11 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2023-01-31 23:40:26 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from amansan on 2023-02-06 07:20:53 UTC ---

Good morning,

Have you had time to take a look at this bug?

Regards,
Alicia

--- Additional comment from Malay Kumar parida on 2023-02-14 05:20:12 UTC ---

Hi Alicia,

I am running a little busy due to the feature development cycle for 4.13, as only a couple of weeks are left. But I can assure you this bug is on my radar; I have already done some investigation into the root cause, and I expect to look at it more deeply after the feature freeze for 4.13, which is on Feb 28. If there is a customer dependency or someone waiting on the issue, please let me know and I can move things around, in that case, to give it prioritized attention.

--- Additional comment from amansan on 2023-02-17 07:36:53 UTC ---

Hi Malay,

Ok, thanks. I'll wait for your news.

Regards,
Alicia

--- Additional comment from RHEL Program Management on 2023-03-30 15:40:59 UTC ---

This BZ is being approved for the ODF 4.13.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.13.0'.

--- Additional comment from RHEL Program Management on 2023-03-30 15:40:59 UTC ---

Since this bug has been approved for the ODF 4.13.0 release, through release flag 'odf-4.13.0+', the Target Release is being set to 'ODF 4.13.0'.

--- Additional comment from errata-xmlrpc on 2023-04-04 12:59:27 UTC ---

This bug has been added to advisory RHBA-2023:108078 by Boris Ranto (branto).

--- Additional comment from errata-xmlrpc on 2023-04-04 13:00:11 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2023:108078-01:
https://errata.devel.redhat.com/advisory/108078
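For reference, a rough sketch of how the stale image reference discussed above (case 03363378) can be compared against the running rook-ceph-operator. The grep-based inspection of the template and the manual edit are assumptions about where the image field sits in the embedded Job spec, not a documented procedure:

# Image currently referenced by the ocs-osd-removal template (the field lives inside
# the Job object embedded in the template, so a simple grep is used here)
$ oc get template ocs-osd-removal -n openshift-storage -o yaml | grep 'image:'

# Image actually used by the running rook-ceph-operator deployment
$ oc get deployment rook-ceph-operator -n openshift-storage \
    -o jsonpath='{.spec.template.spec.containers[0].image}'

# If they differ, either delete the template so the operator recreates it,
# or edit the template and set the image to the one reported by the deployment
$ oc edit template ocs-osd-removal -n openshift-storage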
--- Additional comment from Sunil Kumar Acharya on 2023-04-10 12:22:18 UTC ---

Please have the RDT (requires_doc_text) flag/text updated accordingly.

--- Additional comment from amansan on 2023-04-11 10:08:33 UTC ---

Hi Malay,

I was reading https://github.com/red-hat-storage/ocs-operator/pull/1959. My understanding is that the template has been updated for these versions? I am doubting because version 4.13 is marked; can you please confirm it for me?

Thanks,
Alicia

--- Additional comment from Malay Kumar parida on 2023-04-17 04:56:56 UTC ---

Hi Alicia,

Previously, once the template was created it was never updated afterwards, which was causing the problem. For example, if someone installs ODF 4.10, the template is created at that time with a rook-ceph image in the template's job spec. Later the customer upgrades ODF from 4.10 to 4.11, 4.11 to 4.12, and so on. But because the template was not reconciled, the rook-ceph image in the template's job spec would remain the old one (the 4.10 one in this case), even though the cluster is now on a newer version of ODF such as 4.12. With this fix the template gets reconciled, so the rook-ceph image in the template's job spec will always be the correct one.

--- Additional comment from Red Hat Bugzilla on 2023-04-19 14:10:58 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from amansan on 2023-04-21 06:25:17 UTC ---

Hi Malay,

Thanks so much for the answer.

Regards,
Alicia

--- Additional comment from Itzhak on 2023-05-29 11:22:14 UTC ---

What should the updated verification steps be? Should we try to upgrade from 4.12 to 4.13, or just deploy a cluster with 4.13 and execute the command:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} -p FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -
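For reference, one possible way to inspect the reconciled template without actually removing an OSD. These checks are suggestions, not prescribed QE steps, and assume the usual Template fields for parameters and the embedded Job's image:

# List the parameters exposed by the template; FORCE_OSD_REMOVAL should now be present
$ oc get template ocs-osd-removal -n openshift-storage -o jsonpath='{.parameters[*].name}'

# Confirm the rook-ceph image in the template's job spec matches the current release
$ oc get template ocs-osd-removal -n openshift-storage -o yaml | grep 'image:'

# Render the job without creating it (client-side dry run)
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false \
    | oc create -n openshift-storage --dry-run=client -f -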
I tested the BZ with a vSphere cluster with OCP 4.10 and ODF 4.9.10 (lower than 4.9.11). I performed the following steps:

1. Checked the ocs-osd-removal job command, which resulted in the expected error:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
error: unknown parameter name "FORCE_OSD_REMOVAL"
error: no objects passed to create

2. Upgraded ODF from 4.9 to 4.10.

3. Checked the ocs-osd-removal job command again, which showed the expected output:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0 -p FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created

$ oc get jobs ocs-osd-removal-job
NAME                  COMPLETIONS   DURATION   AGE
ocs-osd-removal-job   1/1           32s        136m

Additional info:

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.10.0-0.nightly-2023-07-14-163924
Kubernetes Version: v1.23.17+16bcd69

OCS version:
ocs-operator.v4.10.14   OpenShift Container Storage   4.10.14   ocs-operator.v4.9.15   Succeeded

Cluster version:
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2023-07-14-163924   True        False         14h     Cluster version is 4.10.0-0.nightly-2023-07-14-163924

Rook version:
rook: v4.10.14-0.e37f8ca9f2a5aa1576a1b75d888322f4f948b27d
go: go1.16.12

Ceph version:
ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/27056/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.10.14 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:4241