Bug 1262977 - [RFE] verify that systemd config prevents repeatedly restarting daemons
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.0
Assigned To: Boris Ranto
QA Contact: Rachana Patel
Docs Contact: Bara Ancincova
Keywords: FutureFeature
Depends On:
Blocks: 1322504
Reported: 2015-09-14 15:31 EDT by Samuel Just
Modified: 2017-07-30 11:09 EDT (History)
CC: 8 users

See Also:
Fixed In Version: ceph-10.2.1-12.el7cp.x86_64
Doc Type: Enhancement
Doc Text:
.`systemd` now restarts failed Ceph services
When a Ceph service, such as `ceph-mon` or `ceph-osd`, fails to start, the `systemd` daemon now attempts to restart the service. Prior to this update, Ceph services remained in the failed state.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 15:27:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Samuel Just 2015-09-14 15:31:47 EDT
Description of problem:

The linked bug describes the situation for Upstart; verify that systemd behaves properly.

Comment 3 Ken Dreyer (Red Hat) 2015-09-14 15:40:54 EDT
systemd support will land in Infernalis/Jewel -> re-targeting
Comment 4 Ken Dreyer (Red Hat) 2016-02-29 14:34:05 EST
Boris, I think we need to add the following to the systemd unit files:

  Restart=on-failure
  StartLimitBurst=3
  StartLimitInterval=1800

to match Upstart's "respawn" and "respawn limit" settings. Would you please submit a PR for that?
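For reference, those settings could be carried in a drop-in override like the sketch below (the drop-in path is illustrative; the eventual fix patched the packaged unit files themselves). On the systemd shipped in RHEL 7 these directives live in `[Service]`; newer systemd releases (v230+) move the start limits to `[Unit]` and rename the interval to `StartLimitIntervalSec=`:

```ini
# Illustrative drop-in path: /etc/systemd/system/ceph-osd@.service.d/restart.conf
[Service]
# Restart the daemon whenever it exits uncleanly
Restart=on-failure
# Allow at most 3 failed starts within 1800 seconds (30 minutes),
# mirroring Upstart's "respawn limit 3 1800"
StartLimitBurst=3
StartLimitInterval=1800
```

After adding a drop-in, run `systemctl daemon-reload` so the changes take effect.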
Comment 5 Boris Ranto 2016-03-15 17:17:18 EDT
Ken, we should definitely add the

Restart=on-failure

line if we want systemd to actually attempt to restart the services on failure. However, I'm not sure we want to override the defaults for the restarts -- systemd has its own defaults for how often a process can fail within a period of time before it gives up.
Comment 6 Greg Farnum 2016-03-15 17:42:16 EDT
Those restart limits were chosen reasonably carefully based on the characteristics of Ceph OSDs as IO-consuming beasts, of Ceph clusters as a whole, and the interaction between the two. The systemd default process limits are unlikely to be useful in that regard, and we have these custom limits because of issues customers have run into with different rules. ;)
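The trade-off being discussed is easier to see with a toy model of the burst/interval rate limit (a simplification for illustration, not systemd's actual accounting):

```python
# Toy model of systemd-style start rate limiting: a unit may be
# (re)started at most `burst` times within any `interval`-second
# window; beyond that, the manager gives up on the unit.

def restart_allowed(failure_times, now, burst=3, interval=1800):
    """Return True if another restart is permitted at time `now`.

    failure_times: timestamps (seconds) of previous failed starts.
    """
    # Count only failures that fall inside the sliding window.
    recent = [t for t in failure_times if now - t < interval]
    return len(recent) < burst

# With the proposed Ceph limits (3 per 30 minutes), a fourth crash
# inside the window is not restarted, but once the window passes
# the unit may be started again.
crashes = [0, 60, 120]
print(restart_allowed(crashes, now=180))    # → False (3 failures in window)
print(restart_allowed(crashes, now=2000))   # → True (window has elapsed)
```

The systemd defaults Boris mentions use a much shorter window, which is why a custom burst/interval pair was needed to reproduce Upstart's behavior.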
Comment 7 Boris Ranto 2016-03-17 13:58:03 EDT
OK, that makes sense. The upstream PR:

https://github.com/ceph/ceph/pull/8188
Comment 12 Rachana Patel 2016-06-13 19:19:11 EDT
Verified with version ceph-10.2.1-12.el7cp.x86_64.



[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 336964 24056 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       35108  0.2  0.0 877708 25644 ?        Ssl  22:19   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40539  0.0  0.0 112648   976 pts/1    S+   22:24   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 35108

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25248 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40620  1.7  0.0 862240 28204 ?        Ssl  22:26   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40786  0.0  0.0 112648   972 pts/1    S+   22:26   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 40620

[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 339012 25920 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       40841  4.6  0.0 862260 25584 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       40983  0.0  0.0 112648   976 pts/1    S+   22:27   0:00 grep --color=auto ceph


[root@magna084 ubuntu]# kill -9 40841
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 25572 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph       41038  3.3  0.0 869244 20764 ?        Ssl  22:27   0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root       41184  0.0  0.0 112648   972 pts/1    S+   22:27   0:00 grep --color=auto ceph

[root@magna084 ubuntu]# kill -9 41038
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph       34998  0.0  0.0 340036 26512 ?        Ssl  22:19   0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
root       41193  0.0  0.0 112648   976 pts/1    S+   22:28   0:00 grep --color=auto ceph


The OSD was restarted after each of the first three kills; after the fourth kill within the limit window, systemd gave up and the ceph-osd process stayed down, as expected. Hence moving to VERIFIED.
Comment 14 errata-xmlrpc 2016-08-23 15:27:08 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1755
