Bug 1262976 - upstart: make config less generous about restarts
Summary: upstart: make config less generous about restarts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.0
Assignee: Ken Dreyer (Red Hat)
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1253803 ceph131rn
Reported: 2015-09-14 19:30 UTC by Samuel Just
Modified: 2022-02-21 18:33 UTC

Fixed In Version: Ceph v0.94.1.8 (Ubuntu)
Doc Type: Bug Fix
Doc Text:
Clone Of: 1262974
Environment:
Last Closed: 2015-10-08 18:39:52 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 11798 | 0 | None | None | None | Never
Red Hat Issue Tracker | RHCEPH-3473 | 0 | None | None | None | 2022-02-21 18:33:40 UTC
Red Hat Product Errata | RHBA-2015:1883 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 1.3.0 bug fix and enhancement update | 2016-02-02 21:58:28 UTC

Comment 2 Ken Dreyer (Red Hat) 2015-09-18 03:09:06 UTC
We are fixing this in 1.2.3.2 on Ubuntu (bz 1262974), so it also needs to be in the 1.3.0 GA Ubuntu build to avoid regressions for customers.

Comment 3 Ken Dreyer (Red Hat) 2015-09-18 18:28:45 UTC
This BZ does not apply to RHEL or CentOS, since the RHEL/CentOS packages still use the SysV init script. This BZ (for RHCS 1.3) only applies to Ubuntu Trusty.

Comment 4 Harish NV Rao 2015-09-23 06:51:54 UTC
Ken, can you please move this defect to ON_QA if this is fixed in 1.3.0?

Comment 5 Ken Dreyer (Red Hat) 2015-09-23 13:07:15 UTC
Sure

Comment 6 shylesh 2015-10-01 12:07:37 UTC
Hi Sam,

Based on our discussion, I ran the following script to kill ceph-mon repeatedly at different intervals.


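#!/bin/bash
# Repeatedly kill ceph-mon and check whether upstart respawns it.

# Seconds to wait between kills; edit this value for each run.
interval=480

killn=1
restartn=1

while true
do
        echo "kill no =$killn"
        sudo pkill ceph-mon
        # Give upstart a moment to respawn the monitor.
        sleep 2
        # pgrep prints the PID of the respawned monitor and fails if none is running.
        if ! pgrep ceph-mon; then
                echo "Mon not running"
                exit
        else
                echo "restart no =$restartn"
                restartn=$((restartn + 1))
                killn=$((killn + 1))
        fi
        sleep "$interval"
done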


where $interval was set to 480, 420, 300, or 45 seconds for the respective runs.

Here are the results I got:

8 minutes
========

ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
23762
restart no =1

kill no =2
26024
restart no =2
kill no =3
28265
restart no =3
kill no =4
Mon not running


7 minutes
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
5944
restart no =1
kill no =2
7876
restart no =2
kill no =3
9836
restart no =3
kill no =4
11796
restart no =4

kill no =5
Mon not running



5 minutes
========
ubuntu@magna105:~$ sleep 1800; ./mon-generic.sh

kill no =1
23175
restart no =1
kill no =2
24800
restart no =2

kill no =3
26428
restart no =3
kill no =4
Mon not running


45 seconds
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
1393
restart no =1
kill no =2
1726
restart no =2
kill no =3
1945
restart no =3
kill no =4
Mon not running


I see a discrepancy in the 7-minute kill interval run. Could you please confirm whether this is the right behaviour? Please note that there was a gap of at least 30 minutes between each category of runs.
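
For comparison, the expected count can be sketched with a simplified sliding-window model. This is an assumption for illustration only and is not upstart's actual accounting: it takes the limit of 3 restarts in 30 minutes quoted in the release note in comment 18, and allows a respawn only if fewer than 3 respawns happened in the preceding 1800 seconds. The names LIMIT, WINDOW and simulate are hypothetical.

```shell
#!/bin/bash
# Simplified model (an assumption, not upstart's real accounting):
# a respawn is allowed only if fewer than LIMIT respawns happened
# in the preceding WINDOW seconds.
LIMIT=3
WINDOW=1800

# Print how many respawns the model allows when the daemon is killed
# every $1 seconds, starting at t=0. Capped at 20 so that periods too
# long to ever hit the limit still terminate.
simulate() {
    local period=$1 t=0 allowed=0 recent i
    local -a stamps=()
    while (( allowed < 20 )); do
        # Count earlier respawns that still fall inside the window.
        recent=0
        for i in "${stamps[@]}"; do
            if (( t - i < WINDOW )); then
                recent=$((recent + 1))
            fi
        done
        if (( recent >= LIMIT )); then
            break
        fi
        stamps+=("$t")
        allowed=$((allowed + 1))
        t=$((t + period))
    done
    echo "$allowed"
}

# Each iteration of the test script takes roughly interval + 2 seconds.
for interval in 480 420 300 45; do
    echo "interval=${interval}s -> $(simulate $((interval + 2))) respawns allowed"
done
```

Under this model every tested interval allows 3 respawns, which matches the 8-minute, 5-minute and 45-second runs; the 4 respawns in the 7-minute run are the outlier.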

Comment 7 Samuel Just 2015-10-02 16:38:05 UTC
I'm not worried about that, looks fine to me.

Comment 13 shylesh 2015-10-06 07:05:42 UTC
This bug has two parts:

1) As part of the release notes, Sam's comment 11 should probably be included, since it describes the actual change.


2) As per comment 6 of this bug, the number of restarts before upstart saturation is not consistent across kill intervals, so this point has to be included in the known issues so that users are aware of it. Hence I will be creating another defect for part 2.

If you have any concerns, please let me know.

Comment 14 shylesh 2015-10-06 07:07:44 UTC
(In reply to shylesh from comment #13)
> This bug has two parts 
> 
> 1) As part of release note probably sam's comment Comment11 has to be
> included which will talk about what is the actual change.
> 
> 
> 2) As per Comment6 of this bug the number of restarts before upstart
> saturation for different kill intervals is not consistent , so this point
> has to be included in known issues so that user is aware of this.Hence I
> will be creating another defect for the part 2 .
> 
> Any concerns please let me know.

I have created https://bugzilla.redhat.com/show_bug.cgi?id=1269048 to track documenting part 2 as a known issue.

Comment 15 Harish NV Rao 2015-10-08 17:27:34 UTC
Moving this defect to the verified state based on comment 7. For the issue described in comment 6, we have already opened BZ 1269048.

Comment 17 errata-xmlrpc 2015-10-08 18:39:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1883

Comment 18 Federico Lucifredi 2015-11-10 01:58:44 UTC
Upstart respawn limit changes: the following note will be added to our 1.3.1 release notes.

Release notes: "The upstart respawn limit has been changed from 5 restarts in 30 seconds to 3 restarts in 30 minutes for the OSD and MON daemons".
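
In upstart job syntax, a limit like this is written with the `respawn limit COUNT INTERVAL` stanza, where INTERVAL is in seconds. The fragment below is a sketch of what the change would look like; the file paths and the exact stanza values are assumptions derived from the release note text, not copied from the shipped packages.

```
# Assumed locations: /etc/init/ceph-mon.conf and /etc/init/ceph-osd.conf
respawn
# Old limit (5 restarts in 30 seconds):  respawn limit 5 30
# New limit (3 restarts in 30 minutes):
respawn limit 3 1800
```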

