Bug 2319102

Summary: [RDR] After testing the Brownfield scenario for OSD migration, the CephCluster reports a ReconcileFailed error
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: rook
Assignee: Santosh Pillai <sapillai>
Status: CLOSED ERRATA
QA Contact: Pratik Surve <prsurve>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.17
CC: nberry, odf-bz-bot, sapillai, sheggodu, tnielsen
Target Milestone: ---
Target Release: ODF 4.17.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.17.0-126
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-10-30 14:36:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---

Description Pratik Surve 2024-10-16 08:47:58 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[RDR] After testing the Brownfield scenario for OSD migration, the CephCluster reports a ReconcileFailed error.

Version of all relevant components (if applicable):

OCP version:- 4.17.0-0.nightly-2024-10-15-061952
ODF version:- 4.17.0-124
CEPH version:- ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
ACM version:- 2.12.0
SUBMARINER version:- v0.19.0
VOLSYNC version:-
OADP version:- 1.4.1
VOLSYNC method:- destinationCopyMethod: Direct

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy ODF with bluestore OSDs.
2. Migrate the OSDs to bluestore-rdr from the UI.
3. Check the CephCluster status and events (see the commands after this list).
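
A quick way to perform step 3 and surface the reconcile events (plain oc commands; the namespace and CR name match this cluster):

# CephCluster phase at a glance
$ oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage

# Full status and events, including any ReconcileFailed warnings
$ oc describe cephcluster ocs-storagecluster-cephcluster -n openshift-storage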


Actual results:

Events:
  Type     Reason           Age                  From                          Message
  ----     ------           ----                 ----                          -------
  Warning  ReconcileFailed  108m (x3 over 110m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: reconcile operator to replace OSDs that are pending migration
  Warning  ReconcileFailed  90m (x15 over 108m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: failed to delete the "osd-replace-config" configmap: failed to delete ConfigMap osd-replace-config; it does not exist. configmaps "osd-replace-config" not found
  Warning  ReconcileFailed  15m (x21 over 82m)   rook-ceph-cluster-controller  failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: failed to delete the "osd-replace-config" configmap: failed to delete ConfigMap osd-replace-config; it does not exist. configmaps "osd-replace-config" not found
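
The later events reference the "osd-replace-config" ConfigMap, which the operator appears to use to track OSDs pending replacement and then delete once the migration completes; whether it is still present can be checked directly (name and namespace taken from the events above):

# Check whether the ConfigMap referenced in the events still exists
$ oc get configmap osd-replace-config -n openshift-storage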

Expected results:

There should not be any error or warning message.


Additional info:

The migration itself was successful and the Ceph status is healthy:

$ ceph status
  cluster:
    id:     b2d68682-deff-49a1-a3cd-b469ebf3d808
    health: HEALTH_OK

  services:
    mon:        3 daemons, quorum d,e,f (age 14h)
    mgr:        b(active, since 10m), standbys: a
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 2h), 3 in (since 14h)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 515 objects, 237 MiB
    usage:   809 MiB used, 6.0 TiB / 6 TiB avail
    pgs:     169 active+clean

  io:
    client:   4.3 KiB/s rd, 1.7 KiB/s wr, 5 op/s rd, 0 op/s wr
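
(For reference, a common way to collect this output on ODF is through the rook-ceph toolbox pod, assuming the ceph-tools deployment is enabled; a minimal sketch:)

# Find the toolbox pod and run ceph status inside it
$ TOOLS_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name | head -n1)
$ oc rsh -n openshift-storage "$TOOLS_POD" ceph status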


$ oc get pods -l app=rook-ceph-osd
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-0-66958f9ffb-5httn   2/2     Running   0          124m
rook-ceph-osd-1-5684564fcc-dgrm4   2/2     Running   0          123m
rook-ceph-osd-2-5748b89dbc-mkrxw   2/2     Running   0          122m
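
To double-check which backend store the CephCluster is requesting after the migration, the store settings can be read from the CR; the jsonpath below assumes the Rook CephCluster spec.storage.store field (a sketch, not verified against this exact build):

# Show the requested OSD backend store (expected to be bluestore-rdr after migration)
$ oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage \
    -o jsonpath='{.spec.storage.store}{"\n"}'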

Comment 11 errata-xmlrpc 2024-10-30 14:36:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676