Bug 2300310 - [cee/sd] mclock_scheduler slow backfill [NEEDINFO]
Summary: [cee/sd] mclock_scheduler slow backfill
Keywords:
Status: CLOSED DUPLICATE of bug 2299482
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 6.2
Assignee: Sridhar Seshasayee
QA Contact: skanta
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-07-29 08:43 UTC by Tomas Petr
Modified: 2025-04-01 04:24 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2025-03-28 07:26:15 UTC
Embargoed:
sseshasa: needinfo? (tpetr)
sseshasa: needinfo?
sseshasa: needinfo? (tpetr)
kelwhite: needinfo? (tpetr)


Links
Ceph Project Bug Tracker 68224 (last updated 2024-11-07 05:28:43 UTC)
Red Hat Issue Tracker RHCEPH-9425 (last updated 2024-07-29 08:45:57 UTC)
Red Hat Knowledge Base (Solution) 7092973 (last updated 2024-10-25 16:52:45 UTC)

Description Tomas Petr 2024-07-29 08:43:00 UTC
Description of problem:
There is an issue with slow backfill when the mclock_scheduler is used. Setting osd_mclock_override_recovery_settings=true together with osd_max_backfills=8 does not seem to have any effect on the number of PGs in the backfilling state:
---
data:
    volumes: 1/1 healthy
    pools:   42 pools, 10488 pgs
    objects: 710.17M objects, 2.1 PiB
    usage:   5.4 PiB used, 16 PiB / 21 PiB avail
    pgs:     21327554/4573948158 objects misplaced (0.466%)
             10341 active+clean
             68    active+remapped+backfill_wait
             61    active+clean+scrubbing+deep
             14    active+clean+scrubbing
             4     active+remapped+backfilling
 
  io:
    client:   551 MiB/s rd, 23 MiB/s wr, 501 op/s rd, 611 op/s wr
    recovery: 21 MiB/s, 6 objects/s
---
ceph_config_dump:osd  advanced  osd_max_backfills                      8
ceph_config_dump:osd  advanced  osd_mclock_override_recovery_settings  true
---
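
For reference, the override and backfill values shown in the config dump above are normally applied cluster-wide with commands along the following lines. This is only a sketch of what was presumably run; the exact commands used in this environment are not recorded in the report:
---
# Allow osd_max_backfills to take effect while the mclock scheduler is active
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 8

# Alternatively, the built-in mclock profile can be switched to favour
# recovery/backfill over client I/O instead of overriding individual options
ceph config set osd osd_mclock_profile high_recovery_ops
---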

The data being moved is in a CephFS data pool with an EC profile of 8+3. The reason for the data movement is migrating data from the old HDD OSDs to new HDD_ECC OSDs (both with block.db on SSD).
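
One way to check whether the override actually reached the running OSD daemons, and not just the monitor config database, is to query an individual OSD directly (osd.0 below is only a placeholder for any affected OSD):
---
# Value stored in the central config database
ceph config get osd osd_max_backfills

# Values currently in effect inside a running OSD daemon
ceph tell osd.0 config get osd_max_backfills
ceph tell osd.0 config get osd_mclock_override_recovery_settings
ceph tell osd.0 config get osd_mclock_profile
---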

Version-Release number of selected component (if applicable):
RHCS 6.1z6
ceph version 17.2.6-216.el9cp

How reproducible:
Always in this environment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Storage PM bot 2024-07-29 08:43:13 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 kelwhite 2024-08-21 22:10:35 UTC
Putting a needinfo on Tomas for c#9.

