Bug 1262974 - upstart: make config less generous about restarts
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.2.4
Assigned To: Ken Dreyer (Red Hat)
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
Reported: 2015-09-14 15:24 EDT by Samuel Just
Modified: 2017-07-30 11:07 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Prior to this update, the upstart init system would restart Ceph's daemons too frequently, up to five times in 30 seconds. This could lead to startup respawn loops that mask other issues, such as disk state problems. This update adjusts Ceph's upstart settings to restart daemons less aggressively, three times in 30 minutes.
Story Points: ---
Clone Of:
Clones: 1262976
Environment:
Last Closed: 2015-10-01 17:01:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments:


External Trackers
Tracker: Ceph Project Bug Tracker
ID: 11798
Priority: None
Status: None
Summary: None
Last Updated: Never

Description Samuel Just 2015-09-14 15:24:35 EDT
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Samuel Just 2015-09-14 15:27:54 EDT
Description of problem:

upstart is too generous about restarting broken daemons

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. start cluster
2. remount the /var/log/ceph directory read-only on a mon node (a command sketch is under Additional info below)
3. watch ceph-mon repeatedly restart

Actual results:

ceph-mon repeatedly restarts

Expected results:

ceph-mon repeatedly restarts for a bit, and then remains dead

Additional info:
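A minimal reproduction sketch for step 2, assuming /var/log/ceph is a separate mount on the mon node (an assumption, not stated in this report):

    # make the monitor's log directory unwritable so ceph-mon exits shortly after starting
    sudo mount -o remount,ro /var/log/ceph
    # watch upstart repeatedly respawn it (the PID keeps changing)
    watch 'sudo initctl list | grep ceph-mon'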
Comment 4 Ken Dreyer (Red Hat) 2015-09-15 17:52:33 EDT
Fix will be in non-RHEL Ceph v0.80.8.5
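For context, the fix adjusts the respawn stanza in Ceph's upstart job files; a sketch using the limits described in the Doc Text above (the exact contents of the shipped job files may differ):

    # /etc/init/ceph-mon.conf, /etc/init/ceph-osd.conf, etc. -- sketch only
    respawn
    # old, overly generous limit: respawn limit 5 30
    respawn limit 3 1800    # at most 3 respawns per 1800 seconds (30 minutes)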
Comment 6 Tamil 2015-09-18 17:45:54 EDT
Verified; the fix works fine.

1. sudo pkill -9 -f 'ceph -i 0'  (kills osd.0)
2. wait for 30 seconds
3. look for upstart restarting the daemon

Repeat the above steps two more times; after the third kill, upstart stops restarting the daemon.

Later, to bring osd.0 back up, use "sudo start ceph-osd id=0".


Upstart should not restart a daemon that has died more than 3 times within a 30-minute window (the installed limit can be confirmed as sketched below).
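A quick way to confirm the configured limit on an installed node (a sketch; the path and expected values assume the stock ceph upstart jobs and the limits described in the Doc Text):

    # print the respawn limit from the OSD upstart job(s)
    grep -H 'respawn limit' /etc/init/ceph-osd*.conf
    # with the fix, expect something like:  respawn limit 3 1800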
Comment 7 Tamil 2015-09-18 17:46:55 EDT
If the fix does not appear to work after upgrading from RH Ceph 1.2.3 to 1.2.3-2 (or from 1.2.3 to 1.2.3-1 to 1.2.3-2), reboot the cluster once and retry.
Comment 9 errata-xmlrpc 2015-10-01 17:01:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1572
