Bug 2300310

Summary: [cee/sd] mclock_scheduler slow backfill
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RADOS
Version: 6.1
Target Release: 6.2
Target Milestone: ---
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Reporter: Tomas Petr <tpetr>
Assignee: Sridhar Seshasayee <sseshasa>
QA Contact: skanta
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, kelwhite, mcaldeir, nojha, pdhiran, rsachere, sseshasa, trchakra, vumrao
Flags: sseshasa: needinfo? (tpetr), sseshasa: needinfo?, sseshasa: needinfo? (tpetr), kelwhite: needinfo? (tpetr)
Last Closed: 2025-03-28 07:26:15 UTC
Type: Bug

Description Tomas Petr 2024-07-29 08:43:00 UTC
Description of problem:
There is an issue with slow backfill when the mclock_scheduler is in use. Setting osd_mclock_override_recovery_settings=true together with osd_max_backfills=8 does not appear to have any effect on the number of PGs in the backfilling state:
---
data:
    volumes: 1/1 healthy
    pools:   42 pools, 10488 pgs
    objects: 710.17M objects, 2.1 PiB
    usage:   5.4 PiB used, 16 PiB / 21 PiB avail
    pgs:     21327554/4573948158 objects misplaced (0.466%)
             10341 active+clean
             68    active+remapped+backfill_wait
             61    active+clean+scrubbing+deep
             14    active+clean+scrubbing
             4     active+remapped+backfilling
 
  io:
    client:   551 MiB/s rd, 23 MiB/s wr, 501 op/s rd, 611 op/s wr
    recovery: 21 MiB/s, 6 objects/s
---
ceph_config_dump: osd  advanced  osd_max_backfills                      8
ceph_config_dump: osd  advanced  osd_mclock_override_recovery_settings  true
---
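
For reference, a minimal sketch of how these overrides are normally applied and then verified against a running OSD daemon (osd.0 below is just an example ID, not taken from this cluster); under mclock_scheduler, osd_max_backfills is only honored once osd_mclock_override_recovery_settings is true:
---
# Apply the override centrally (matches the config dump above):
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 8

# Verify the value the daemon is actually running with,
# not just what is stored in the monitor config database:
ceph config show osd.0 osd_max_backfills
ceph tell osd.0 config get osd_max_backfills
---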

The data being moved is in the CephFS data pool (EC profile 8+3); the reason for the data movement is to migrate data from the old HDD OSDs to the new HDD_ECC OSDs (both with block.db on SSD).
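
One additional diagnostic angle (an assumption on my part, not something captured in this report): with mclock_scheduler the active profile also limits how much of each OSD's IOPS budget backfill may consume, so it is worth checking which profile is in effect and, if the client workload allows it during the migration, whether high_recovery_ops speeds up the move:
---
# Which mClock profile is currently active:
ceph config get osd osd_mclock_profile

# Optionally favor recovery/backfill over client I/O for the migration:
ceph config set osd osd_mclock_profile high_recovery_ops
---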

Version-Release number of selected component (if applicable):
RHCS 6.1z6
ceph version 17.2.6-216.el9cp

How reproducible:
Always in this environment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Storage PM bot 2024-07-29 08:43:13 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 kelwhite 2024-08-21 22:10:35 UTC
Putting needinfo on Tomas for c#9.