Bug 1262977
Summary: | [RFE] verify that systemd config prevents repeatedly restarting daemons | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Samuel Just <sjust> |
Component: | RADOS | Assignee: | Boris Ranto <branto> |
Status: | CLOSED ERRATA | QA Contact: | Rachana Patel <racpatel> |
Severity: | medium | Docs Contact: | Bara Ancincova <bancinco> |
Priority: | unspecified | ||
Version: | 1.2.3 | CC: | branto, ceph-eng-bugs, dzafman, gfarnum, hnallurv, kchai, kdreyer, nlevine |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | 2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | ceph-10.2.1-12.el7cp.x86_64 | Doc Type: | Enhancement |
Doc Text: |
.`systemd` now restarts failed Ceph services
When a Ceph service, such as `ceph-mon` or `ceph-osd`, fails to start, the `systemd` daemon now attempts to restart the service. Prior to this update, Ceph services remained in the failed state.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-08-23 19:27:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1322504 |
Description
Samuel Just
2015-09-14 19:31:47 UTC
systemd support will land in Infernalis/Jewel -> re-targeting

Boris, I think we need to add the following to the systemd unit files, to match Upstart's "respawn" and "respawn limit" settings:

```
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=1800
```

Would you please submit a PR for that?

Ken, we should definitely add the `Restart=on-failure` line if we want systemd to actually attempt to restart the services on failure. However, I'm not sure we want to override the defaults for the restarts -- systemd has its own defaults for how often a process may fail within a period of time before it gives up.

Those restart limits were chosen reasonably carefully based on the characteristics of Ceph OSDs as IO-consuming beasts, of Ceph clusters as a whole, and the interaction between the two. The systemd default process limits are unlikely to be useful in that regard, and we have those custom limits based on issues customers have run into with different rules. ;)

OK, that makes sense.

The upstream PR: https://github.com/ceph/ceph/pull/8188

Verified with version 10.2.1-12.el7cp.x86_64. Killing the `ceph-osd` process repeatedly shows systemd respawning it with a new PID each time; after the final kill the OSD is no longer restarted, consistent with the configured restart limit:

```
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph  34998 0.0 0.0 336964 24056 ?     Ssl 22:19 0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph  35108 0.2 0.0 877708 25644 ?     Ssl 22:19 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root  40539 0.0 0.0 112648   976 pts/1 S+  22:24 0:00 grep --color=auto ceph
[root@magna084 ubuntu]# kill -9 35108
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph  34998 0.0 0.0 339012 25248 ?     Ssl 22:19 0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph  40620 1.7 0.0 862240 28204 ?     Ssl 22:26 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root  40786 0.0 0.0 112648   972 pts/1 S+  22:26 0:00 grep --color=auto ceph
[root@magna084 ubuntu]# kill -9 40620
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph  34998 0.0 0.0 339012 25920 ?     Ssl 22:19 0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph  40841 4.6 0.0 862260 25584 ?     Ssl 22:27 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root  40983 0.0 0.0 112648   976 pts/1 S+  22:27 0:00 grep --color=auto ceph
[root@magna084 ubuntu]# kill -9 40841
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph  34998 0.0 0.0 340036 25572 ?     Ssl 22:19 0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
ceph  41038 3.3 0.0 869244 20764 ?     Ssl 22:27 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
root  41184 0.0 0.0 112648   972 pts/1 S+  22:27 0:00 grep --color=auto ceph
[root@magna084 ubuntu]# kill -9 41038
[root@magna084 ubuntu]# ps auxww | grep ceph
ceph  34998 0.0 0.0 340036 26512 ?     Ssl 22:19 0:00 /usr/bin/ceph-mon -f --cluster ceph --id magna084 --setuser ceph --setgroup ceph
root  41193 0.0 0.0 112648   976 pts/1 S+  22:28 0:00 grep --color=auto ceph
```

Hence moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1755
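For reference, the settings discussed above could also be applied locally as a systemd drop-in, without editing the packaged unit file. This is only a sketch: the drop-in path and section placement are assumptions (the actual fix in ceph-10.2.1-12 ships the settings inside the unit files themselves):

```ini
# /etc/systemd/system/ceph-osd@.service.d/restart.conf  (hypothetical drop-in)
[Service]
# Respawn the daemon when it exits uncleanly ...
Restart=on-failure
# ... but give up after 3 failed starts within a 1800-second window,
# matching Upstart's "respawn limit".
StartLimitBurst=3
StartLimitInterval=1800
```

After adding a drop-in, `systemctl daemon-reload` is required before the new limits take effect.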
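The verification transcript above matches the `StartLimitBurst`/`StartLimitInterval` semantics: at most N restarts are permitted within a sliding time window, after which the unit is left in the failed state. A minimal illustrative sketch of that rate-limiting logic (not systemd's actual implementation):

```python
from collections import deque

class RestartLimiter:
    """Approximates StartLimitBurst/StartLimitInterval: allow at most
    `burst` start attempts within any `interval`-second window."""

    def __init__(self, burst=3, interval=1800):
        self.burst = burst
        self.interval = interval
        self.starts = deque()  # timestamps of recent permitted starts

    def may_restart(self, now):
        # Expire start timestamps that have fallen out of the window.
        while self.starts and now - self.starts[0] >= self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            return False  # limit hit: leave the unit in the failed state
        self.starts.append(now)
        return True

limiter = RestartLimiter(burst=3, interval=1800)
print([limiter.may_restart(t) for t in (0, 10, 20, 30)])
# the fourth attempt inside the window is refused: [True, True, True, False]
```

This mirrors the transcript: three kills of `ceph-osd` each trigger a respawn, and the fourth leaves the daemon down until the window expires or the unit is reset.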