1450754 – RHCS ceph-ansible rolling upgrade sets/unsets cluster flags between every OSD upgrade

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	3.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	3.0
Assignee:	seb
QA Contact:	Parikshith
Docs Contact:	Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks:	1494421

Reported:	2017-05-15 06:33 UTC by Vimal Kumar
Modified:	2021-03-11 15:12 UTC
CC List:	14 users
Fixed In Version:	RHEL: ceph-ansible-3.0.0-0.1.rc4.el7cp Ubuntu: ceph-ansible_3.0.0~rc4-2redhat1
Doc Type:	Bug Fix
Doc Text:	.`rolling_update` no longer sets and unsets flags in between each OSD upgrade
	The `rolling_update` playbook of the `ceph-ansible` utility set and unset the `noout`, `noscrub`, and `nodeep-scrub` flags in between each OSD upgrade. If a scrubbing process was scheduled to start shortly or was in progress, setting these flags did not stop scrubbing immediately, and `rolling_update` waited until scrubbing was finished. This process was repeated on each OSD with scheduled scrubbing or scrubbing in progress. This behavior caused the upgrade process to take considerable time to finish. This update ensures that the flags are set before upgrading all OSDs, and are unset after all OSDs are upgraded.
Clone Of:
Environment:
Last Closed:	2017-12-05 23:33:43 UTC
Embargoed:
Dependent Products:

Attachments
rolling update log (1.09 MB, text/plain) 2017-10-06 08:34 UTC, Parikshith	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible pull 1517	0	None	None	None	2017-05-15 13:18:38 UTC
Red Hat Product Errata	RHBA-2017:3387	0	normal	SHIPPED_LIVE	Red Hat Ceph Storage 3.0 bug fix and enhancement update	2017-12-06 03:03:45 UTC

Description Vimal Kumar 2017-05-15 06:33:47 UTC

a) Description of problem:

The rolling upgrade tasks in rolling_update.yml set and unset the cluster flags (noout, noscrub, nodeep-scrub) between each OSD upgrade. This causes problems when PGs are about to be scrubbed or are being scrubbed.

If a scrub is in progress or about to start, setting the cluster flags does not stop it immediately; the scrub continues until it finishes on the locked chunk. Once the upgrade of one OSD is finished, the flags are removed, which triggers the pending scrub. The next OSD upgrade has to set the flags again, but because the PGs are being scrubbed at that point, the upgrade cannot continue until the scrub finishes.

In this situation the upgrade process can take a considerable time to finish. Setting the cluster flags once, upgrading all the OSDs, and then removing the flags should be less disruptive to the upgrade process.
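The difference between the two orderings can be sketched in a few lines of Python. This is only an illustration of the proposal above, not the ceph-ansible implementation: the `ceph osd set` / `ceph osd unset` commands are the standard CLI calls for these flags, while `upgrade_step`, the `upgrade-osd` string it returns, and the OSD names are hypothetical placeholders for whatever the playbook actually does per OSD.

```python
from typing import List

# Cluster flags named in this report.
FLAGS = ["noout", "noscrub", "nodeep-scrub"]


def flag_commands(action: str) -> List[str]:
    """Return the ceph CLI commands that set or unset every flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [f"ceph osd {action} {flag}" for flag in FLAGS]


def upgrade_step(osd: str) -> str:
    """Hypothetical placeholder for the real per-OSD upgrade work."""
    return f"upgrade-osd {osd}"


def plan_per_osd(osds: List[str]) -> List[str]:
    """Reported behaviour: flags are toggled around every single OSD."""
    plan: List[str] = []
    for osd in osds:
        plan += flag_commands("set")
        plan.append(upgrade_step(osd))
        plan += flag_commands("unset")  # a pending scrub can start here
    return plan


def plan_once(osds: List[str]) -> List[str]:
    """Proposed behaviour: set flags once, upgrade all OSDs, unset once."""
    plan = flag_commands("set")
    plan += [upgrade_step(osd) for osd in osds]
    plan += flag_commands("unset")
    return plan


if __name__ == "__main__":
    # Hypothetical OSD names, used only to show the resulting order.
    for command in plan_once(["osd.0", "osd.1", "osd.2"]):
        print(command)
```

With the per-OSD ordering, every unset opens a window in which a pending scrub can start and hold up the next OSD; with the single set/unset pair, that window only appears after the last OSD is done.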

Upstream pull request: https://github.com/ceph/ceph-ansible/pull/1517.

b) Version-Release number of selected component (if applicable):

RHCS 2.x

c) How reproducible:

Reproducible when the upgrade is run while a scrub is in progress or about to start.

Comment 3 seb 2017-05-15 16:20:33 UTC

Thanks, PR here: https://github.com/ceph/ceph-ansible/pull/1517
Work in progress

Comment 8 Parikshith 2017-10-06 08:34:25 UTC

Created attachment 1335163
rolling update log

Comment 9 Sébastien Han 2017-10-06 20:54:30 UTC

Parikshith, the flags are set at the end of the mon upgrade and unset at the end of the OSDs upgrade, when the last one finishes.

So the behaviour is correct, please move this to VERIFIED.
Thanks.

Comment 11 Sébastien Han 2017-10-24 12:48:56 UTC

lgtm!

Comment 14 errata-xmlrpc 2017-12-05 23:33:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387
