Bug 1262977 - [RFE] verify that systemd config prevents repeatedly restarting daemons
Summary: [RFE] verify that systemd config prevents repeatedly restarting daemons
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 2.0
Assignee: Boris Ranto
QA Contact: Rachana Patel
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1322504
 
Reported: 2015-09-14 19:31 UTC by Samuel Just
Modified: 2022-02-21 18:39 UTC
CC: 8 users

Fixed In Version: ceph-10.2.1-12.el7cp.x86_64
Doc Type: Enhancement
Doc Text:
.`systemd` now restarts failed Ceph services
When a Ceph service, such as `ceph-mon` or `ceph-osd`, fails, the `systemd` daemon now attempts to restart the service. Prior to this update, Ceph services that failed remained in the failed state.
Clone Of:
Environment:
Last Closed: 2016-08-23 19:27:08 UTC
Embargoed:




Links
System                    ID              Status        Summary                                                    Last Updated
Red Hat Bugzilla          1262974         CLOSED        upstart: make config less generous about restarts         2022-02-21 18:55:13 UTC
Red Hat Issue Tracker     RHCEPH-3509     None          None                                                       2022-02-21 18:39:38 UTC
Red Hat Product Errata    RHBA-2016:1755  SHIPPED_LIVE  Red Hat Ceph Storage 2.0 bug fix and enhancement update    2016-08-23 23:23:52 UTC

Internal Links: 1262974

Description Samuel Just 2015-09-14 19:31:47 UTC
Description of problem:

The linked bug (1262974) describes the situation for Upstart; verify that systemd behaves properly as well.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
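
(For reference, the verification eventually performed in comment 12 below boils down to the following loop; a minimal sketch, assuming a single OSD with instance id 3 as on the test host:)

  # repeatedly SIGKILL the OSD and confirm that systemd respawns it
  for i in 1 2 3 4; do
      kill -9 "$(pgrep -f '/usr/bin/ceph-osd -f --cluster ceph --id 3')"
      sleep 10                        # give systemd time to respawn the daemon
      ps auxww | grep '[c]eph-osd'    # non-empty output means it came back
  done
  # once the start limit is exceeded, the last check should print nothing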

Comment 3 Ken Dreyer (Red Hat) 2015-09-14 19:40:54 UTC
systemd support will land in Infernalis/Jewel -> re-targeting

Comment 4 Ken Dreyer (Red Hat) 2016-02-29 19:34:05 UTC
Boris, I think we need to add the following to the systemd unit files:

  Restart=on-failure
  StartLimitBurst=3
  StartLimitInterval=1800

to match Upstart's "respawn" and "respawn limit" settings. Would you please submit a PR for that?
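
(For readers unfamiliar with these directives, the proposed values read as follows; this is an annotation of the suggestion above, not necessarily the final shipped configuration:)

  Restart=on-failure        -> restart after a crash or non-zero exit, not after a clean stop
  StartLimitBurst=3         -> permit at most 3 start attempts ...
  StartLimitInterval=1800   -> ... within any 1800 s (30 min) window; a further
                               failure inside the window leaves the unit "failed"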

Comment 5 Boris Ranto 2016-03-15 21:17:18 UTC
Ken, we should definitely add the

Restart=on-failure

line if we want systemd to actually attempt to restart the services on failure. However, I'm not sure we want to override the defaults for the restart limits -- systemd has its own defaults for how often a process may fail within a given period before it gives up.
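
(For context, systemd's stock limits are far tighter than the values proposed above: roughly five start attempts within any ten-second window. A sketch of the commented-out defaults as shipped in /etc/systemd/system.conf; note that older systemd releases, including the one in RHEL 7, spell the first option DefaultStartLimitInterval, without the "Sec" suffix:)

  # /etc/systemd/system.conf (shipped defaults, commented out)
  #DefaultStartLimitIntervalSec=10s
  #DefaultStartLimitBurst=5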

Comment 6 Greg Farnum 2016-03-15 21:42:16 UTC
Those restart limits were chosen reasonably carefully based on the characteristics of Ceph OSDs as I/O-consuming beasts, of Ceph clusters as a whole, and the interaction between the two. The systemd default limits are unlikely to be useful in that regard, and we chose those custom limits based on issues customers have run into under different rules. ;)

Comment 7 Boris Ranto 2016-03-17 17:58:03 UTC
OK, that makes sense. The upstream PR:

https://github.com/ceph/ceph/pull/8188
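
(The PR presumably lands the directives from comment 4 in the [Service] section of the unit templates, roughly as sketched below; see the PR itself for the exact change. Note that later systemd releases moved the StartLimit* settings to the [Unit] section, as StartLimitIntervalSec and StartLimitBurst.)

  # ceph-osd@.service (excerpt) -- a sketch, not the literal upstream diff
  [Service]
  Restart=on-failure
  StartLimitBurst=3
  StartLimitInterval=1800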

Comment 12 Rachana Patel 2016-06-13 23:19:11 UTC
Verified with ceph-10.2.1-12.el7cp.x86_64.



[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 336964 24056 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       35108  0.2  0.0 877708 25644 ?        Ssl  22:19   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40539  0.0  0.0 112648   976 pts/1    S+   22:24   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 35108

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25248 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40620  1.7  0.0 862240 28204 ?        Ssl  22:26   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40786  0.0  0.0 112648   972 pts/1    S+   22:26   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 40620

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25920 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40841  4.6  0.0 862260 25584 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40983  0.0  0.0 112648   976 pts/1    S+   22:27   0:00 grep --color=auto ceph


[root@magna084 ubuntu]# kill -9 40841
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 25572 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       41038  3.3  0.0 869244 20764 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       41184  0.0  0.0 112648   972 pts/1    S+   22:27   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 41038
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 26512 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
root       41193  0.0  0.0 112648   976 pts/1    S+   22:28   0:00 grep --color=auto ceph


The OSD was respawned after each of the first three kills; after the fourth kill inside the limit window it stayed down, as expected. Hence moving to VERIFIED.
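
(Operational note: once the start limit trips, the unit stays in the failed state and further automatic restarts are refused until the counter is cleared. These are standard systemctl commands; the instance name is taken from the transcript above:)

  systemctl reset-failed ceph-osd@3   # clear the failure state and rate-limit counter
  systemctl start ceph-osd@3          # then start the OSD again by hand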

Comment 14 errata-xmlrpc 2016-08-23 19:27:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1755

