Bug 2300310

Summary: [cee/sd] mclock_scheduler slow backfill
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RADOS
Version: 6.1
Target Release: 6.2
Target Milestone: ---
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Reporter: Tomas Petr <tpetr>
Assignee: Sridhar Seshasayee <sseshasa>
QA Contact: skanta
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, kelwhite, mcaldeir, nojha, pdhiran, rsachere, sseshasa, trchakra, vumrao
Flags: sseshasa: needinfo? (tpetr), sseshasa: needinfo?, sseshasa: needinfo? (tpetr), kelwhite: needinfo? (tpetr)
Last Closed: 2025-03-28 07:26:15 UTC
Type: Bug

Description Tomas Petr 2024-07-29 08:43:00 UTC
Description of problem:
There is an issue with slow backfill when the mclock_scheduler is in use. Setting osd_mclock_override_recovery_settings=true together with osd_max_backfills=8 does not appear to have any effect on the number of PGs in the backfilling state:
---
data:
    volumes: 1/1 healthy
    pools:   42 pools, 10488 pgs
    objects: 710.17M objects, 2.1 PiB
    usage:   5.4 PiB used, 16 PiB / 21 PiB avail
    pgs:     21327554/4573948158 objects misplaced (0.466%)
             10341 active+clean
             68    active+remapped+backfill_wait
             61    active+clean+scrubbing+deep
             14    active+clean+scrubbing
             4     active+remapped+backfilling
 
  io:
    client:   551 MiB/s rd, 23 MiB/s wr, 501 op/s rd, 611 op/s wr
    recovery: 21 MiB/s, 6 objects/s
---
ceph_config_dump: osd  advanced  osd_max_backfills                      8
ceph_config_dump: osd  advanced  osd_mclock_override_recovery_settings  true
---
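
For reference, a minimal sketch of how these overrides are normally applied and then verified against a running OSD daemon (osd.0 below is just an example ID, not taken from this cluster); under mclock_scheduler, osd_max_backfills is only honored once osd_mclock_override_recovery_settings is true:
---
# Apply the override centrally (matches the config dump above):
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 8

# Verify the value the daemon is actually running with,
# not just what is stored in the monitor config database:
ceph config show osd.0 osd_max_backfills
ceph tell osd.0 config get osd_max_backfills
---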

The data being moved is in the CephFS data pool (EC profile 8+3); the reason for the data movement is to migrate data from the old HDD OSDs to the new HDD_ECC OSDs (both with block.db on SSD).
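
One additional diagnostic angle (an assumption on my part, not something captured in this report): with mclock_scheduler the active profile also limits how much of each OSD's IOPS budget backfill may consume, so it is worth checking which profile is in effect and, if the client workload allows it during the migration, whether high_recovery_ops speeds up the move:
---
# Which mClock profile is currently active:
ceph config get osd osd_mclock_profile

# Optionally favor recovery/backfill over client I/O for the migration:
ceph config set osd osd_mclock_profile high_recovery_ops
---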

Version-Release number of selected component (if applicable):
RHCS 6.1z6
ceph version 17.2.6-216.el9cp

How reproducible:
Always in this environment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Storage PM bot 2024-07-29 08:43:13 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 kelwhite 2024-08-21 22:10:35 UTC
Putting needinfo on Tomas for c#9.