Bug 1262977 - [RFE] verify that systemd config prevents repeatedly restarting daemons
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.0
Assigned To: Boris Ranto
QA Contact: Rachana Patel
Docs Contact: Bara Ancincova
Keywords: FutureFeature
Depends On:
Blocks: 1322504
Reported: 2015-09-14 15:31 EDT by Samuel Just
Modified: 2017-07-30 11:09 EDT (History)
CC: 8 users

See Also:
Fixed In Version: ceph-10.2.1-12.el7cp.x86_64
Doc Type: Enhancement
Doc Text:
.`systemd` now restarts failed Ceph services
When a Ceph service, such as `ceph-mon` or `ceph-osd`, fails to start, the `systemd` daemon now attempts to restart the service. Prior to this update, Ceph services remained in the failed state.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 15:27:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Samuel Just 2015-09-14 15:31:47 EDT
Description of problem:

The linked bug describes the situation for Upstart; verify that systemd behaves properly.

Comment 3 Ken Dreyer (Red Hat) 2015-09-14 15:40:54 EDT
systemd support will land in Infernalis/Jewel -> re-targeting
Comment 4 Ken Dreyer (Red Hat) 2016-02-29 14:34:05 EST
Boris, I think we need to add the following to the systemd unit files:

  Restart=on-failure
  StartLimitBurst=3
  StartLimitInterval=1800

to match Upstart's "respawn" and "respawn limit" settings. Would you please submit a PR for that?
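For reference, those settings could be carried in a drop-in override like the sketch below (the drop-in path is illustrative; the eventual fix patched the packaged unit files themselves). On the systemd shipped in RHEL 7 these directives live in `[Service]`; newer systemd releases (v230+) move the start limits to `[Unit]` and rename the interval to `StartLimitIntervalSec=`:

```ini
# Illustrative drop-in path: /etc/systemd/system/ceph-osd@.service.d/restart.conf
[Service]
# Restart the daemon whenever it exits uncleanly
Restart=on-failure
# Allow at most 3 failed starts within 1800 seconds (30 minutes),
# mirroring Upstart's "respawn limit 3 1800"
StartLimitBurst=3
StartLimitInterval=1800
```

After adding a drop-in, run `systemctl daemon-reload` so the changes take effect.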
Comment 5 Boris Ranto 2016-03-15 17:17:18 EDT
Ken, we should definitely add the

Restart=on-failure

line if we want systemd to actually attempt to restart the services on failure. However, I'm not sure we want to override the defaults for the restarts -- systemd has its own defaults for how often a process can fail within a period of time before it gives up.
Comment 6 Greg Farnum 2016-03-15 17:42:16 EDT
Those restart limits were chosen reasonably carefully based on the characteristics of Ceph OSDs as IO-consuming beasts, of Ceph clusters as a whole, and the interaction between the two. The systemd default process limits are unlikely to be useful in that regard, and we have these custom limits because of issues customers have run into with different rules. ;)
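The trade-off being discussed is easier to see with a toy model of the burst/interval rate limit (a simplification for illustration, not systemd's actual accounting):

```python
# Toy model of systemd-style start rate limiting: a unit may be
# (re)started at most `burst` times within any `interval`-second
# window; beyond that, the manager gives up on the unit.

def restart_allowed(failure_times, now, burst=3, interval=1800):
    """Return True if another restart is permitted at time `now`.

    failure_times: timestamps (seconds) of previous failed starts.
    """
    # Count only failures that fall inside the sliding window.
    recent = [t for t in failure_times if now - t < interval]
    return len(recent) < burst

# With the proposed Ceph limits (3 per 30 minutes), a fourth crash
# inside the window is not restarted, but once the window passes
# the unit may be started again.
crashes = [0, 60, 120]
print(restart_allowed(crashes, now=180))    # → False (3 failures in window)
print(restart_allowed(crashes, now=2000))   # → True (window has elapsed)
```

The systemd defaults Boris mentions use a much shorter window, which is why a custom burst/interval pair was needed to reproduce Upstart's behavior.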
Comment 7 Boris Ranto 2016-03-17 13:58:03 EDT
OK, that makes sense. The upstream PR:

https://github.com/ceph/ceph/pull/8188
Comment 12 Rachana Patel 2016-06-13 19:19:11 EDT
Verified with version ceph-10.2.1-12.el7cp.x86_64.



[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 336964 24056 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       35108  0.2  0.0 877708 25644 ?        Ssl  22:19   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40539  0.0  0.0 112648   976 pts/1    S+   22:24   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 35108

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25248 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40620  1.7  0.0 862240 28204 ?        Ssl  22:26   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40786  0.0  0.0 112648   972 pts/1    S+   22:26   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 40620

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25920 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40841  4.6  0.0 862260 25584 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40983  0.0  0.0 112648   976 pts/1    S+   22:27   0:00 grep --color=auto ceph


[root@magna084 ubuntu]# kill -9 40841
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 25572 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       41038  3.3  0.0 869244 20764 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       41184  0.0  0.0 112648   972 pts/1    S+   22:27   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 41038
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 26512 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
root       41193  0.0  0.0 112648   976 pts/1    S+   22:28   0:00 grep --color=auto ceph


The OSD was restarted after each of the first three kills; after the fourth kill within the limit window, systemd gave up and the ceph-osd process stayed down, as expected. Hence moving to VERIFIED.
Comment 14 errata-xmlrpc 2016-08-23 15:27:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1755
