1632157 – OSD restart should attempt the same amount of time for each OSD restart

Summary: OSD restart should attempt the same amount of time for each OSD restart

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Ceph-Ansible
Sub Component:
Version:	3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	z1
Target Release:	3.1
Assignee:	Guillaume Abrioux
QA Contact:	subhash
Docs Contact:	Bara Ancincova
URL:
Whiteboard:
Duplicates (1):	1632160
Depends On:
Blocks:	1584264

Reported:	2018-09-24 09:02 UTC by Guillaume Abrioux
Modified:	2018-11-09 01:01 UTC
CC List:	9 users
Fixed In Version:	RHEL: ceph-ansible-3.1.7-1.el7cp Ubuntu: ceph-ansible_3.1.7-2redhat1
Doc Type:	Bug Fix
Doc Text:	.The restarting script now tries the same amount of time for every OSD restart
	The `RETRIES` counter in the `restart_osd_daemon.sh` script was set once at the start of the script and never reset between calls of the `check_pgs()` function. Consequently, the counter, which is set to 40 by default, was shared by all OSD restarts on a node, and the script made at most 40 attempts in total for all OSDs on that node. With this update, the counter is reset on each call of the `check_pgs()` function, and the script retries the same number of times for every OSD restart.
Clone Of:
Environment:
Last Closed:	2018-11-09 01:00:34 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	ceph ceph-ansible pull 3159	0	None	closed	Automatic backport of pull request #3155	2020-07-22 13:05:51 UTC
Red Hat Product Errata	RHBA-2018:3530	0	None	None	None	2018-11-09 01:01:29 UTC

Description Guillaume Abrioux 2018-09-24 09:02:57 UTC

Description of problem:

The 'RETRIES' counter is not reset after each call of check_pgs in the restart_osd_daemon.sh script.
It defaults to 40 attempts, so the script waits for up to 40 intervals of 30s (20 minutes in total) across *all* the OSDs on a host, instead of up to 40 intervals per OSD.
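
To illustrate the effect, here is a minimal simulation of a counter shared across calls. It is a sketch, not the actual restart_osd_daemon.sh: the function name check_pgs_shared, the three OSDs, and the assumption that each OSD needs 15 polls before its PGs settle are hypothetical. Only the default of 40 retries and the 30s delay come from this report, and the sleep is left out so the sketch runs instantly.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the reported behaviour.
RETRIES=40   # default number of attempts, from the report
DELAY=30     # seconds between attempts, from the report (not slept on here)

# Stand-in for check_pgs(): pretends the PGs need "$1" polls to settle and
# consumes that many attempts from the *global* RETRIES counter.
check_pgs_shared() {
  local needed=$1 used=0
  while [ "$RETRIES" -ne 0 ] && [ "$used" -lt "$needed" ]; do
    RETRIES=$((RETRIES - 1))   # the global budget shrinks and is never restored
    used=$((used + 1))
  done
  [ "$used" -eq "$needed" ]    # success only if the budget was large enough
}

budget_log=""
for osd in 0 1 2; do
  budget_log="$budget_log$RETRIES "          # attempts left before this restart
  check_pgs_shared 15 || echo "osd.$osd: retries exhausted"
done
echo "attempts available before each restart: $budget_log"
```

Under these assumptions the first OSD starts with 40 attempts, the second with 25, and the third with only 10, so the third check runs out of retries even though each OSD behaves identically.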


How reproducible:
100%


Steps to Reproduce:
1. Deploy a cluster.
2. Make a change so the 'restart osds daemon' handler is triggered.
3. Relaunch the playbook.

Actual results:
The playbook will retry up to 40 times across all the OSDs on a node.

Expected results:
We should retry for the same amount of time after each OSD restart.
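
One way to get this behaviour is to copy the configured value into a per-call variable, so every call starts from the full budget. The sketch below only illustrates that idea and is not taken from the merged fix: check_pgs_reset, the local variable attempts, and the assumption that each OSD needs 15 polls are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-call retry budget.
RETRIES=40   # configured default, from the report; never modified below

# Stand-in for check_pgs(): pretends the PGs need "$1" polls to settle.
check_pgs_reset() {
  local needed=$1 used=0
  local attempts=$RETRIES      # fresh budget on every call
  while [ "$attempts" -ne 0 ] && [ "$used" -lt "$needed" ]; do
    attempts=$((attempts - 1)) # only the local copy is decremented
    used=$((used + 1))
  done
  [ "$used" -eq "$needed" ]    # success only if the budget was large enough
}

failed=0
for osd in 0 1 2; do
  check_pgs_reset 15 || failed=$((failed + 1))
done
echo "failed checks: $failed, RETRIES still $RETRIES"
```

With a local copy, the order in which OSDs are restarted no longer affects how long each one is given to recover.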

Comment 3 Guillaume Abrioux 2018-09-24 09:46:41 UTC

*** Bug 1632160 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2018-11-09 01:00:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3530
