Bug 1262974
| Summary: | upstart: make config less generous about restarts |
|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage |
| Component: | RADOS |
| Version: | 1.2.3 |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | Samuel Just <sjust> |
| Assignee: | Ken Dreyer (Red Hat) <kdreyer> |
| QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| CC: | ceph-eng-bugs, dzafman, kchai, kdreyer, nlevine, tmuthami |
| Target Milestone: | rc |
| Target Release: | 1.2.4 |
| Hardware: | Unspecified |
| OS: | Linux |
| Doc Type: | Bug Fix |
| | 1262976 |
| Last Closed: | 2015-10-01 21:01:06 UTC |
| Type: | Bug |

Doc Text:
Prior to this update, the upstart init system would restart Ceph's daemons too frequently, up to five times in 30 seconds. This could lead to startup respawn loops that mask other issues, such as disk state problems. This update adjusts Ceph's upstart settings to restart daemons less aggressively, at most three times in 30 minutes.
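To see why the old setting let a crash loop run indefinitely, here is a minimal Python model of upstart's `respawn limit COUNT INTERVAL` rule, under which a job respawned more than COUNT times within INTERVAL seconds is stopped. This is a hypothetical sketch, not upstart's or Ceph's code: it assumes the change described in the Doc Text is expressed through that stanza, it reads the rule as a sliding window (upstart's own bookkeeping may differ in detail), and `RespawnLimiter` and `respawns_before_stop` are illustrative names. It compares the old limit (5 in 30 seconds) with the new one (3 in 30 minutes, i.e. 1800 seconds).

```python
from collections import deque


class RespawnLimiter:
    """Hypothetical model of an upstart-style `respawn limit COUNT INTERVAL` rule.

    A crash triggers a respawn unless that would mean more than `count`
    respawns within the last `interval` seconds (sliding-window reading).
    """

    def __init__(self, count, interval):
        self.count = count
        self.interval = interval
        self._respawns = deque()  # timestamps of recent respawns, oldest first

    def on_crash(self, now):
        """Return True if the daemon is respawned, False if the job is stopped."""
        # Drop respawns that have fallen out of the window.
        while self._respawns and now - self._respawns[0] >= self.interval:
            self._respawns.popleft()
        if len(self._respawns) + 1 > self.count:
            return False  # limit exceeded: the job stays down
        self._respawns.append(now)
        return True


def respawns_before_stop(count, interval, crash_period, max_crashes=1000):
    """Simulate a daemon that crashes every `crash_period` seconds.

    Returns how many respawns happened before the limiter stopped the job,
    or None if it was still respawning after `max_crashes` crashes.
    """
    limiter = RespawnLimiter(count, interval)
    for i in range(max_crashes):
        if not limiter.on_crash(i * crash_period):
            return i
    return None


if __name__ == "__main__":
    print("old limit, 5 in 30 s:", respawns_before_stop(5, 30, 10))
    print("new limit, 3 in 1800 s:", respawns_before_stop(3, 1800, 10))
```

In this model a daemon that crashes every 10 seconds never has more than three respawns inside any 30-second window, so the old limit never trips and the loop runs forever (`None`); the new limit stops it at the fourth crash after three respawns. The old limit only catches loops in which the daemon crashes more often than once every 6 seconds.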
Description
Samuel Just
2015-09-14 19:24:35 UTC
Description of problem:
Upstart is too generous about restarting broken daemons.

Steps to Reproduce:
1. Start the cluster.
2. Mount the /var/log/ceph directory read-only on a mon node.
3. Watch ceph-mon restart repeatedly.

Actual results:
ceph-mon restarts repeatedly.

Expected results:
ceph-mon restarts repeatedly for a short while, and then stays down.

Additional info:
The fix will be in non-RHEL Ceph v0.80.8.5.

Verified; the fix works fine.

1. sudo pkill -9 -f 'ceph -i 0' (kills osd.0)
2. Wait 30 seconds.
3. Check that upstart restarts the daemon.

Repeat the above steps 2 more times; upstart then stops restarting the daemon. To bring osd.0 back up afterwards, use "sudo start ceph-osd id=0".

Upstart should not restart a daemon that is killed more than 3 times within a 30-minute window.

If the fix doesn't work after upgrading from RH Ceph 1.2.3 to 1.2.3-2 (or from 1.2.3 to 1.2.3-1 to 1.2.3-2), reboot the cluster once and retry.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1572
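The manual verification steps in this report can be scripted. The following is a hedged Python sketch, not a tested QA tool: the kill and start commands are the ones quoted in the report, while the check uses upstart's `status` command and assumes (this is not stated in the report) that its output contains `start/running` while the daemon is up. It also assumes the script runs as root, so `sudo` is dropped; that keeps a `sudo` wrapper process, whose command line would contain the `ceph -i 0` pattern, out of `pkill -f`'s reach. The helper names are hypothetical, and the command runner and sleep function are injectable so the loop can be exercised without a live cluster.

```python
import subprocess
import time

# Commands from the report, run as root. STATUS_CMD (upstart's `status`)
# is an assumption; the report does not show how the daemon was checked.
KILL_CMD = ["pkill", "-9", "-f", "ceph -i 0"]  # kill osd.0
STATUS_CMD = ["status", "ceph-osd", "id=0"]
START_CMD = ["start", "ceph-osd", "id=0"]  # bring osd.0 back manually


def run_command(cmd):
    """Run cmd and return its stdout; a non-zero exit is not treated as an error."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def is_running(status_output):
    # Assumption: upstart reports a live job instance as "start/running".
    return "start/running" in status_output


def kills_until_respawn_stops(run=run_command, sleep=time.sleep, max_kills=5, wait=30):
    """Kill osd.0, wait, check its status; repeat until upstart stops respawning it.

    Returns the number of kills after which the daemon stayed down, or None if
    it was still being respawned after `max_kills` kills.
    """
    for kill in range(1, max_kills + 1):
        run(KILL_CMD)
        sleep(wait)  # the report waits 30 seconds before checking
        if not is_running(run(STATUS_CMD)):
            return kill
    return None
```

On a disposable test node, call `kills_until_respawn_stops()` and then `run_command(START_CMD)` to bring osd.0 back, mirroring the manual "start ceph-osd id=0" step.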