Bug 1473028 - [RFE] Increase priority for inactive PGs' backfill/recovery compared to active+<some other states> PGs
Status: NEW
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.1
Assigned To: Josh Durgin
Keywords: FutureFeature
Depends On:
Reported: 2017-07-19 17:45 EDT by Vikhyat Umrao
Modified: 2017-10-21 12:56 EDT (History)
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Vikhyat Umrao 2017-07-19 17:45:09 EDT
Description of problem:
[RFE] Increase priority for inactive PGs' backfill/recovery compared to active+<some other state> PGs

For example:

1 active+clean+scrubbing, 
1 undersized+degraded+remapped+wait_backfill+peered, 
33 active+degraded+remapped+backfilling, 
24 active+recovery_wait+degraded+remapped, 
11 active+remapped+backfill_toofull, 
53954 active+clean

If we check the above example, we have 33 PGs being backfilled that are in active+<some other state>, but we have one PG that is inactive (undersized+degraded+remapped+wait_backfill+peered) and is blocking client IO because it is inactive.

Ceph should prioritize this PG over the other 33 PGs that are in active+<some other state>.
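
As an illustration (my addition, not from the original report), the stuck-PG commands below are one way to find the PGs that are actually blocking client IO; the PG ID 2.1f is hypothetical:

# List PGs stuck in an inactive state; these block client IO and are
# the ones whose backfill/recovery should be prioritized.
$ ceph pg dump_stuck inactive

# Query one of the stuck PGs (PG ID 2.1f is hypothetical) to see what
# its recovery/backfill is waiting on.
$ ceph pg 2.1f query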

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.3

There is a lot of work going on for this feature upstream, in Luminous and in the next point releases of Luminous.

This RFE bug is used to track all that work and then backport it to Red Hat Ceph Storage 3.y.

We do have a backport [1] available in jewel to address this issue, but that fix is not a complete resolution; more work is needed to fully achieve this feature.

I had a discussion with Josh about this current fix and about future work.

This current fix:

- it's just adjusting the priority of the backfill ops
- it's not a perfect fix, but it does improve things

- with OSDs choosing independently, one OSD could finish all of its high-priority backfills and start some lower-priority ones before the high-priority backfills on other OSDs have finished. That means some OSDs could have a bunch of inactive PGs to backfill while other OSDs have none, which can produce the state we see in the example above.

- so when the OSDs with no inactive PGs are primary and start backfill, they'll have to start low-priority backfills

- it's possible we could improve this in Luminous since we can cancel backfills now, e.g. when a higher-priority backfill needs the reservation, cancel a much lower-priority backfill to get it. This might increase total recovery time, since some backfills need to be restarted, but it may help availability.
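
For reference (an added sketch, not part of the discussion with Josh), on a Luminous cluster you can get a rough view of how backfills are spread across primaries; osd.3 is hypothetical, and note the state name is wait_backfill in jewel but backfill_wait in luminous:

# PGs currently backfilling anywhere in the cluster
$ ceph pg ls backfilling

# PGs backfilling whose primary is a given OSD (osd.3 is hypothetical)
$ ceph pg ls-by-primary osd.3 backfilling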

There is another PR in progress against the master branch:

In a large, live cluster, it may be desirable to have particular PGs recovered before others. An actual example is recovery after a rack failure, where a lot of PGs must be recovered and some of them host data for live VMs with a higher SLA than other VMs; in that case we'd like the high-SLA VMs restored to full health and performance as fast as possible.

This PR adds four new commands:

1. ceph pg force-recovery
2. ceph pg force-backfill
3. ceph pg cancel-force-recovery
4. ceph pg cancel-force-backfill

which mark one or more specified PGs as "forced", thus maximizing their recovery or backfill priority. This PR also alters the default recovery priorities (reducing the maximum priority to 254) so that no other PG gets in the way. The user can restore the default priorities at any time with the "cancel-force-*" commands.
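
A short usage sketch of the commands above (the PG IDs 2.1f and 3.a are hypothetical):

# Push specific PGs to the front of the recovery/backfill queues
$ ceph pg force-recovery 2.1f
$ ceph pg force-backfill 2.1f 3.a

# Clear the "forced" flag and return the PGs to default priorities
$ ceph pg cancel-force-recovery 2.1f
$ ceph pg cancel-force-backfill 2.1f 3.a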

[1] https://github.com/ceph/ceph/pull/13232/commits/2f2032814189a4ecbf8dc01b59bebfae8ab3f524

$ git tag --contains 2f2032814189a4ecbf8dc01b59bebfae8ab3f524
