Bug 1262976
Summary: | upstart: make config less generous about restarts | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Samuel Just <sjust> |
Component: | RADOS | Assignee: | Ken Dreyer (Red Hat) <kdreyer> |
Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 1.2.3 | CC: | ceph-eng-bugs, ceph-qe-bugs, dzafman, flucifre, hnallurv, kchai, kdreyer, nlevine, shmohan, sjust, tmuthami |
Target Milestone: | rc | ||
Target Release: | 1.3.0 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Ceph v0.94.1.8 (Ubuntu) | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 1262974 | Environment: | |
Last Closed: | 2015-10-08 18:39:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1253803, 1262054 | ||
Comment 2
Ken Dreyer (Red Hat)
2015-09-18 03:09:06 UTC
This BZ does not apply to RHEL or CentOS, since the RHEL/CentOS packages still use the SysV init script. This BZ (for RHCS 1.3) only applies to Ubuntu Trusty.

Ken, can you please move this defect to ON_QA if this is fixed in 1.3.0?

Sure.

Hi Sam,

Based on the discussion with you, I ran the following script to kill ceph-mon at different kill intervals.

#!/bin/bash
# $interval is not set in the script itself; it was set per run (values below).
killn=1
restartn=1
while [ true ]
do
    echo "kill no ="
    echo $killn
    sudo pkill ceph-mon
    sleep 2
    # pgrep exits non-zero if upstart did not respawn the monitor
    pgrep ceph-mon
    if [ $? != "0" ]; then
        echo "Mon not running"
        exit
    else
        echo "restart no ="
        echo $restartn
        restartn=$(($restartn + 1))
        killn=$((killn + 1))
    fi
    sleep $interval
done

where $interval = [480, 420, 300, 45] seconds.

Here are the results I got:

8 minutes
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
23762
restart no =1
kill no =2
26024
restart no =2
kill no =3
28265
restart no =3
kill no =4
Mon not running

7 minutes
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
5944
restart no =1
kill no =2
7876
restart no =2
kill no =3
9836
restart no =3
kill no =4
11796
restart no =4
kill no =5
Mon not running

5 minutes
========
ubuntu@magna105:~$ sleep 1800; ./mon-generic.sh
kill no =1
23175
restart no =1
kill no =2
24800
restart no =2
kill no =3
26428
restart no =3
kill no =4
Mon not running

45 seconds
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
1393
restart no =1
kill no =2
1726
restart no =2
kill no =3
1945
restart no =3
kill no =4
Mon not running

I see a discrepancy in the 7-minute kill interval run: the monitor was restarted four times instead of three. Could you please confirm whether this is the right behaviour? Please note that there was a gap of at least 30 minutes between each category of runs.

I'm not worried about that, looks fine to me.

This bug has two parts:

1) As part of the release notes, Sam's comment 11 should probably be included, since it describes the actual change.

2) As per comment 6 of this bug, the number of restarts before upstart saturation is not consistent across kill intervals, so this point has to be included in the known issues so that users are aware of it. Hence I will be creating another defect for part 2.

If you have any concerns, please let me know.

(In reply to shylesh from comment #13)
> This bug has two parts
>
> 1) As part of release note probably sam's comment Comment11 has to be
> included which will talk about what is the actual change.
>
>
> 2) As per Comment6 of this bug the number of restarts before upstart
> saturation for different kill intervals is not consistent , so this point
> has to be included in known issues so that user is aware of this.Hence I
> will be creating another defect for the part 2 .
>
> Any concerns please let me know.

I have created tracker https://bugzilla.redhat.com/show_bug.cgi?id=1269048 to track part 2 as a known issue.

Moving this defect to the verified state based on comment 7. For the issue described in comment 6, we have already opened BZ 1269048.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1883

Upstart respawn limit changes: the following note will be added to our 1.3.1 release notes.

Release notes: "The upstart respawn limit has been changed from 5 restarts in 30 seconds to 3 restarts in 30 minutes for the OSD and MON daemons".
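
To compare the kill-interval test runs reported in this bug with the 30-minute window from the release note, the timing can be estimated with a small shell sketch. It rests on assumptions that are not stated in the bug: each loop iteration of the test script is taken to last the 2-second sleep plus the kill interval (monitor start-up time is ignored), and the new limit is taken to correspond to an upstart stanza of the form `respawn limit 3 1800`. The helper names `kill_offset` and `report` are hypothetical and are not part of Ceph or upstart.

```bash
#!/bin/bash
# Sketch only: estimate when the N-th pkill happens in the test loop,
# measured in seconds from the first kill.
# Assumption: one iteration = 2 s sleep + kill interval.
# Assumption: "3 restarts in 30 minutes" maps to "respawn limit 3 1800".

WINDOW=1800  # 30 minutes, in seconds

# kill_offset INTERVAL N -> seconds between the first kill and the N-th kill
kill_offset() {
    local interval=$1 n=$2
    echo $(( (n - 1) * (interval + 2) ))
}

# report INTERVAL FAILED_KILL -> say whether the kill that was not followed
# by a restart falls inside the window that started at the first kill
report() {
    local interval=$1 failed_kill=$2
    local offset
    offset=$(kill_offset "$interval" "$failed_kill")
    if [ "$offset" -lt "$WINDOW" ]; then
        echo "interval=${interval}s kill #${failed_kill} at +${offset}s: inside ${WINDOW}s window"
    else
        echo "interval=${interval}s kill #${failed_kill} at +${offset}s: outside ${WINDOW}s window"
    fi
}

# Interval and number of the kill after which "Mon not running" was printed
report 480 4
report 420 5
report 300 4
report 45 4
```

Under these assumptions, in every run the kill that was not followed by a restart lands within 30 minutes of the first kill. The sketch does not model upstart's internal counting, so it does not explain why the 7-minute run allowed four restarts rather than three.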