Bug 2319102 - [RDR] After testing Brownfield scenario for OSD migration, cephcluster is reporting ReconcileFailed error
Summary: [RDR] After testing Brownfield scenario for OSD migration, cephcluster is reporting ReconcileFailed error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Santosh Pillai
QA Contact: Pratik Surve
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-10-16 08:47 UTC by Pratik Surve
Modified: 2024-10-30 14:36 UTC
CC: 5 users

Fixed In Version: 4.17.0-126
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:36:24 UTC
Embargoed:




Links:
  GitHub red-hat-storage/rook pull 756 (open): Bug 2319102: core: fix deletion of the osd-replace-config cm (last updated 2024-10-16 13:55:02 UTC)
  GitHub rook/rook pull 14862 (open): core: fix deletion of the osd-replace-config cm (last updated 2024-10-16 10:52:19 UTC)
  Red Hat Issue Tracker OCSBZM-9389 (last updated 2024-10-16 08:48:10 UTC)
  Red Hat Product Errata RHSA-2024:8676 (last updated 2024-10-30 14:36:27 UTC)

Description Pratik Surve 2024-10-16 08:47:58 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[RDR] After testing Brownfield scenario for osd migration, the cephcluster is reporting ReconcileFailed error

Version of all relevant components (if applicable):

OCP version:- 4.17.0-0.nightly-2024-10-15-061952
ODF version:- 4.17.0-124
CEPH version:- ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
ACM version:- 2.12.0
SUBMARINER version:- v0.19.0
VOLSYNC version:-
OADP version:- 1.4.1
VOLSYNC method:- destinationCopyMethod: Direct

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy ODF with bluestore OSDs.
2. Migrate the OSDs to bluestore-rdr from the UI.
3. Check the CephCluster status.


Actual results:

Events:
  Type     Reason           Age                  From                          Message
  ----     ------           ----                 ----                          -------
  Warning  ReconcileFailed  108m (x3 over 110m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: reconcile operator to replace OSDs that are pending migration
  Warning  ReconcileFailed  90m (x15 over 108m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: failed to delete the "osd-replace-config" configmap: failed to delete ConfigMap osd-replace-config; it does not exist. configmaps "osd-replace-config" not found
  Warning  ReconcileFailed  15m (x21 over 82m)   rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: failed to delete the "osd-replace-config" configmap: failed to delete ConfigMap osd-replace-config; it does not exist. configmaps "osd-replace-config" not found
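
The repeated warnings above suggest the OSD reconcile fails because the operator attempts to delete the osd-replace-config ConfigMap after it has already been removed, and treats the resulting NotFound error as fatal. Going by the titles of the linked pull requests ("core: fix deletion of the osd-replace-config cm"), a minimal sketch of a NotFound-tolerant deletion with client-go could look like the following (function name and package layout are illustrative, not Rook's actual code):

package osd

import (
	"context"
	"fmt"

	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteOSDReplaceConfig removes the osd-replace-config ConfigMap once OSD
// migration has finished. A NotFound error is treated as success so that a
// ConfigMap which was already cleaned up does not fail the whole CephCluster
// reconcile (the symptom reported in the events above).
// NOTE: illustrative sketch only, not the actual Rook implementation.
func deleteOSDReplaceConfig(ctx context.Context, clientset kubernetes.Interface, namespace string) error {
	err := clientset.CoreV1().ConfigMaps(namespace).Delete(ctx, "osd-replace-config", metav1.DeleteOptions{})
	if err != nil && !kerrors.IsNotFound(err) {
		return fmt.Errorf("failed to delete the %q configmap: %w", "osd-replace-config", err)
	}
	return nil
}

With IsNotFound treated as success, an already-deleted ConfigMap would no longer block the CephCluster reconcile.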

Expected results:

There should not be any error or warning message.


Additional info:

Migration was successful and the Ceph status is healthy:

$ ceph status
  cluster:
    id:     b2d68682-deff-49a1-a3cd-b469ebf3d808
    health: HEALTH_OK

  services:
    mon:        3 daemons, quorum d,e,f (age 14h)
    mgr:        b(active, since 10m), standbys: a
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 2h), 3 in (since 14h)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 515 objects, 237 MiB
    usage:   809 MiB used, 6.0 TiB / 6 TiB avail
    pgs:     169 active+clean

  io:
    client:   4.3 KiB/s rd, 1.7 KiB/s wr, 5 op/s rd, 0 op/s wr


$ oc get pods -l app=rook-ceph-osd
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-0-66958f9ffb-5httn   2/2     Running   0          124m
rook-ceph-osd-1-5684564fcc-dgrm4   2/2     Running   0          123m
rook-ceph-osd-2-5748b89dbc-mkrxw   2/2     Running   0          122m

Comment 11 errata-xmlrpc 2024-10-30 14:36:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

