a) Description of problem:
The rolling upgrade tasks in rolling_update.yml set and unset the cluster flags (noout, noscrub, nodeep-scrub) around each OSD upgrade. This causes problems when PGs are about to be scrubbed or a scrub is already in progress.
If a scrub is running or is about to start, setting the cluster flags does not stop it immediately; the scrub continues until it finishes the chunk it has locked. Once one OSD has been upgraded, the flags are removed, which triggers the pending scrub. The next OSD upgrade needs the flags set again, but because the PGs are being scrubbed at that point, the upgrade cannot continue until the scrub finishes.
In this situation the upgrade can take considerably longer to finish. Setting the cluster flags once, upgrading all the OSDs, and then removing the flags should be less intrusive to the upgrade process.
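To illustrate the difference in ordering, here is a minimal Python sketch that only builds the two command sequences and runs nothing. It is not the ceph-ansible implementation: the function names and the `<upgrade osd.N>` placeholder are hypothetical, and only the `ceph osd set` / `ceph osd unset` commands and the three flag names come from the description above.

```python
# Illustrative sketch only: builds command orderings, does not execute them.
# Function names and the "<upgrade osd.N>" placeholder are hypothetical.
FLAGS = ["noout", "noscrub", "nodeep-scrub"]


def plan_per_osd(osd_ids):
    """Ordering reported as problematic: flags toggled around every OSD."""
    plan = []
    for osd in osd_ids:
        plan += [f"ceph osd set {flag}" for flag in FLAGS]
        plan.append(f"<upgrade osd.{osd}>")  # placeholder, not a real command
        plan += [f"ceph osd unset {flag}" for flag in FLAGS]
    return plan


def plan_set_once(osd_ids):
    """Proposed ordering: set flags once, upgrade all OSDs, unset once."""
    plan = [f"ceph osd set {flag}" for flag in FLAGS]
    plan += [f"<upgrade osd.{osd}>" for osd in osd_ids]
    plan += [f"ceph osd unset {flag}" for flag in FLAGS]
    return plan


def count_flag_changes(plan):
    """Count how many set/unset commands a plan contains."""
    prefixes = ("ceph osd set ", "ceph osd unset ")
    return sum(1 for step in plan if step.startswith(prefixes))


if __name__ == "__main__":
    for step in plan_set_once([0, 1, 2]):
        print(step)
```

With three OSDs, the per-OSD ordering issues 18 flag changes and reopens a window for a pending scrub after every OSD, while the set-once ordering issues 6 and keeps the flags in place for the whole OSD phase.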
Upstream commit at https://github.com/ceph/ceph-ansible/pull/1517.
b) Version-Release number of selected component (if applicable):
c) How reproducible:
Reproducible when the upgrade is done amidst scrubbing.
Thanks, PR here: https://github.com/ceph/ceph-ansible/pull/1517
Work in progress
Created attachment 1335163
rolling update log
Parikshith, the flags are set at the end of the mon upgrade and unset at the end of the OSD upgrade, once the last OSD finishes.
So the behaviour is correct; please move this to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.