Bug 1632160

Summary: OSD restart should retry for the same amount of time for each OSD restart
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Guillaume Abrioux <gabrioux>
Component: Ceph-Ansible Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED DUPLICATE QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.0 CC: aschoen, ceph-eng-bugs, gmeno, nthomas, sankarshan
Target Milestone: z5   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The 'RETRIES' counter in restart_osd_daemon.sh was set once at the start of the script and never reset between calls of the check_pgs() function. Consequence: The counter, which defaults to 40, was shared across all OSD restarts, meaning the script had at most 40 attempts for all the OSDs on a node combined. Fix: The counter is now reset before each call of the `check_pgs()` function. Result: The script retries for the same amount of time for every OSD restart.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-24 09:46:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Guillaume Abrioux 2018-09-24 09:09:32 UTC
Description of problem:

The 'RETRIES' counter is not reset after each call of check_pgs() in the restart_osd_daemon.sh script.
It defaults to 40 attempts, which means the script waits for at most 40 periods of 30 s (about 20 minutes) across *all* the OSDs on a host, rather than per OSD. A minimal sketch of the retry logic follows below.
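
The sketch below illustrates the buggy pattern and the fix (resetting the counter per check_pgs() call). It is a simplified, assumed version of the logic, not the actual ceph-ansible script: the PG health check, DELAY value, and OSD unit discovery are illustrative placeholders.

#!/bin/bash
# Sketch only; restart_osd_daemon.sh in ceph-ansible is more involved.

RETRIES=40   # buggy version: set once here, shared by every OSD restart
DELAY=30     # illustrative wait between attempts, in seconds

check_pgs() {
  RETRIES=40   # fix: reset the retry budget for each call, i.e. per OSD
  while [ "$RETRIES" -gt 0 ]; do
    # Illustrative health check: consider the cluster healthy once PGs
    # report active+clean.
    if ceph pg stat | grep -q 'active+clean'; then
      return 0
    fi
    RETRIES=$((RETRIES - 1))
    sleep "$DELAY"
  done
  echo "PGs did not become active+clean in time" >&2
  return 1
}

# Restart each OSD unit on the host and wait for PGs to recover after each one.
osd_units=$(systemctl list-units 'ceph-osd@*' --plain --no-legend | awk '{print $1}')
for osd in $osd_units; do
  systemctl restart "$osd"
  check_pgs || exit 1
done

Without the reset inside check_pgs(), every failed attempt on one OSD permanently consumes part of the 40-attempt budget, so later OSDs on the same host get fewer (possibly zero) attempts.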


How reproducible:
100%


Steps to Reproduce:
1. Deploy a cluster.
2. Make a change so the 'restart osds daemon' handler is triggered.
3. Relaunch the playbook.

Actual results:
The playbook will retry up to 40 times across all the OSDs on a node.

Expected results:
We should retry for the same amount of time after each OSD restart.

Comment 5 Guillaume Abrioux 2018-09-24 09:46:41 UTC

*** This bug has been marked as a duplicate of bug 1632157 ***