Bug 2309444 - [RDR] When cluster was upgraded from 4.16 to 4.17, OSDs migrated to new Ceph OSD IDs
Summary: [RDR] When cluster was upgraded from 4.16 to 4.17, OSDs migrated to new Ceph OSD IDs
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Guillaume Abrioux
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks: 2309719
 
Reported: 2024-09-03 13:45 UTC by Pratik Surve
Modified: 2024-10-30 21:42 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2309719 (view as bug list)
Environment:
Last Closed:
Embargoed:




Links:
- GitHub: ceph/ceph pull 59604 (Merged): "ceph-volume: pass self.osd_id to create_id() call" (last updated 2025-04-28 14:29:45 UTC)
- Red Hat Issue Tracker: OCSBZM-8931 (last updated 2024-09-03 13:46:28 UTC)

Description Pratik Surve 2024-09-03 13:45:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[RDR] When the cluster was upgraded from 4.16 to 4.17, OSDs were migrated to new Ceph OSD IDs.


Version of all relevant components (if applicable):

OCP version:- 4.17.0-0.nightly-2024-09-02-044025
ODF version:- 4.17.0-90
CEPH version:- ceph version 19.1.0-42.el9cp (03ae7f7ffec5e7796d2808064c4766b35c4b5ffb) squid (rc)
ACM version:- 2.11.2
SUBMARINER version:- v0.18.0
VOLSYNC version:- volsync-product.v0.10.0
OADP version:- 1.4.0
VOLSYNC method:- destinationCopyMethod: Direct
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?

yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR 4.16 cluster.
2. Upgrade the RDR cluster to 4.17.
3. Check Ceph status (see the command sketch after these steps).
4. OSD migration starts automatically.
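
A minimal sketch of how step 3 can be checked, assuming the rook-ceph-tools (toolbox) deployment is enabled in the openshift-storage namespace and the default Rook pod labels are in place:

# Check cluster health and the OSD ID layout from the toolbox pod
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph status
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd tree

# List the OSD pods; the numeric suffix in rook-ceph-osd-<id> is the Ceph OSD ID
oc -n openshift-storage get pods -l app=rook-ceph-osd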


Actual results:
After the upgrade, the OSD pods come up with new Ceph OSD IDs:

rook-ceph-osd-3-ff888c5f5-wqjfx                                   2/2     Running     0             89m
rook-ceph-osd-4-8c7886d68-n9mbj                                   2/2     Running     0             64m
rook-ceph-osd-5-696975c56-sl4mj                                   2/2     Running     0             35m


Expected results:
OSD IDs should not change across the upgrade; the original OSD pods should remain running, for example (a verification sketch follows this listing):
openshift-storage                                  rook-ceph-osd-1-7f59c986c7-hv55p                                  2/2     Running     0               6h42m
openshift-storage                                  rook-ceph-osd-2-866d4787bf-trnqp                                  2/2     Running     0               6h48m
openshift-storage                                  rook-ceph-osd-3-ff888c5f5-wqjfx                                   2/2     Running     0               11m
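
A hedged sketch of one way to verify that OSD IDs are preserved across the upgrade, assuming the rook-ceph-tools deployment is available; the output file names are illustrative only:

# Record the OSD ID to host mapping before the upgrade
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd tree > osd-tree-before.txt

# ... perform the ODF 4.16 -> 4.17 upgrade ...

# Record the mapping again after the upgrade and compare; the OSD IDs should be unchanged
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd tree > osd-tree-after.txt
diff osd-tree-before.txt osd-tree-after.txt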

Additional info:

ceph status:
  cluster:
    id:     d378191d-7fe9-469c-9385-7cb128679367
    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 79462/564801 objects degraded (14.069%), 73 pgs degraded, 80 pgs undersized

  services:
    mon:        3 daemons, quorum d,e,f (age 92m)
    mgr:        a(active, since 9m), standbys: b
    mds:        1/1 daemons up, 1 hot standby
    osd:        6 osds: 3 up (since 36m), 4 in (since 37m)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 188.27k objects, 67 GiB
    usage:   163 GiB used, 5.8 TiB / 6 TiB avail
    pgs:     79462/564801 objects degraded (14.069%)
             89 active+clean
             73 active+undersized+degraded
             7  active+undersized

  io:
    client:   20 MiB/s rd, 6.1 MiB/s wr, 1.73k op/s rd, 28 op/s wr

