Bug 2111224 - [RFE] ceph orch upgrade should set noout, nodeep-scrub, and noscrub, and unset them when the upgrade completes
Keywords:
Status: NEW
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Adam King
QA Contact: Manasa
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-26 19:06 UTC by Vikhyat Umrao
Modified: 2023-08-15 14:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 56670 0 None None None 2022-07-26 19:06:48 UTC
Red Hat Issue Tracker RHCEPH-4934 0 None None None 2022-07-26 19:07:31 UTC

Description Vikhyat Umrao 2022-07-26 19:06:49 UTC
Description of problem:
[RFE] ceph orch upgrade should set noout, nodeep-scrub, and noscrub, and unset them when the upgrade completes

Version-Release number of selected component (if applicable):
RHCS 5 and above

Upstream tracker - https://tracker.ceph.com/issues/56670

- This was the case when we used ceph-ansible
- This is a matter of feature parity between ceph-ansible and cephadm

- This feature can be designed as optional, with a default of True, so users/admins who do not want it can set it to false.

Benefits:

1. Less load from scrubbing during the upgrade when we expect to have recovery in the cluster
2. noout keeps an OSD from being marked out if it takes longer than usual to boot, whether because of a known issue [1] or simply a slow boot

[1] For example, the PG dups issue (https://tracker.ceph.com/issues/53729): an NVMe OSD with 50M dups takes approximately 7 to 8 minutes to boot, and hybrid HDD OSDs take approximately 12 to 15 minutes.
If an OSD stays down for more than 10 minutes, the Monitor marks it out, which triggers backfill/recovery in the cluster while the upgrade is running, and we do not want that.
There can be other examples, hence running the upgrade with the following flags is recommended:

noscrub
nodeep-scrub
noout
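
Until cephadm handles this itself, the manual equivalent of the request can be sketched as below. This is only an illustration of the workflow, not how cephadm does or would implement it. The names UPGRADE_FLAGS, flag_commands, start_upgrade, finish_upgrade, manage_flags and run are hypothetical; the only real interfaces used are the existing `ceph osd set`, `ceph osd unset` and `ceph orch upgrade start --image` commands. The manage_flags parameter mirrors the "optional, default True" idea from this report.

```python
import subprocess
from typing import Callable, List, Sequence

# The three flags named in this report.
UPGRADE_FLAGS = ("noout", "noscrub", "nodeep-scrub")

# Hypothetical hook: anything that accepts one command (a list of strings).
Runner = Callable[[Sequence[str]], None]


def run_ceph(cmd: Sequence[str]) -> None:
    """Run one ceph CLI command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def flag_commands(action: str) -> List[List[str]]:
    """Build `ceph osd set|unset <flag>` for every flag in UPGRADE_FLAGS."""
    if action not in ("set", "unset"):
        raise ValueError(f"unsupported action: {action}")
    return [["ceph", "osd", action, flag] for flag in UPGRADE_FLAGS]


def start_upgrade(image: str, manage_flags: bool = True,
                  run: Runner = run_ceph) -> None:
    """Optionally set the flags, then ask the orchestrator to start the upgrade."""
    if manage_flags:
        for cmd in flag_commands("set"):
            run(cmd)
    run(["ceph", "orch", "upgrade", "start", "--image", image])


def finish_upgrade(manage_flags: bool = True, run: Runner = run_ceph) -> None:
    """Clear the flags. Call this only after the upgrade has completed."""
    if manage_flags:
        for cmd in flag_commands("unset"):
            run(cmd)
```

Passing run=print gives a dry run that only prints the commands, for example start_upgrade("<image>", run=print). Because `ceph orch upgrade start` returns while the upgrade continues in the background, the sketch keeps the unset step separate: finish_upgrade() should be called only once `ceph orch upgrade status` no longer reports an upgrade in progress.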

Comment 2 Vikhyat Umrao 2022-08-17 18:44:44 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1982056 - another RFE along the same lines, to take care of pg_autoscaler and balancer during upgrade!

