Bug 1262976 - upstart: make config less generous about restarts
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.2.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.3.0
Assigned To: Ken Dreyer (Red Hat)
QA Contact: ceph-qe-bugs
Depends On:
Blocks: 1253803 ceph131rn
Reported: 2015-09-14 15:30 EDT by Samuel Just
Modified: 2017-07-30 11:09 EDT

See Also:
Fixed In Version: Ceph v0.94.1.8 (Ubuntu)
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1262974
Environment:
Last Closed: 2015-10-08 14:39:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Tracker: Ceph Project Bug Tracker, ID: 11798 (Priority: None, Status: None, Summary: None, Last Updated: Never)

Comment 2 Ken Dreyer (Red Hat) 2015-09-17 23:09:06 EDT
We are fixing this in 1.2.3.2 on Ubuntu (bz 1262974), so it also needs to be in the 1.3.0 GA Ubuntu build to avoid regressions for customers.
Comment 3 Ken Dreyer (Red Hat) 2015-09-18 14:28:45 EDT
This BZ does not apply to RHEL or CentOS, since the RHEL/CentOS packages still use the SysV init script. This BZ (for RHCS 1.3) only applies to Ubuntu Trusty.
Comment 4 Harish NV Rao 2015-09-23 02:51:54 EDT
Ken, can you please move this defect to ON_QA if this is fixed in 1.3.0?
Comment 5 Ken Dreyer (Red Hat) 2015-09-23 09:07:15 EDT
Sure
Comment 6 shylesh 2015-10-01 08:07:37 EDT
Hi Sam,

Based on our discussion, I ran the following script to kill ceph-mon at different kill intervals.


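#!/bin/bash
# Repeatedly kill ceph-mon and check whether upstart respawns it.

# Seconds to wait between kills; override via the environment
# (the runs below used 480, 420, 300 and 45).
interval=${interval:-480}
killn=1
restartn=1

while true
do
        echo "kill no =$killn"
        sudo pkill ceph-mon
        sleep 2
        # pgrep prints the new PID and returns 0 if the monitor was respawned
        if ! pgrep ceph-mon; then
                echo "Mon not running"
                exit
        else
                echo "restart no =$restartn"
                restartn=$((restartn + 1))
                killn=$((killn + 1))
        fi
        sleep "$interval"
done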


where $interval is one of 480, 420, 300 or 45 seconds.

Here are the results I got:

8 minutes
========

ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
23762
restart no =1

kill no =2
26024
restart no =2
kill no =3
28265
restart no =3
kill no =4
Mon not running


7 minutes
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
5944
restart no =1
kill no =2
7876
restart no =2
kill no =3
9836
restart no =3
kill no =4
11796
restart no =4

kill no =5
Mon not running



5 minutes
========
ubuntu@magna105:~$ sleep 1800; ./mon-generic.sh

kill no =1
23175
restart no =1
kill no =2
24800
restart no =2

kill no =3
26428
restart no =3
kill no =4
Mon not running


45 seconds
========
ubuntu@magna105:~$ ./mon-generic.sh
kill no =1
1393
restart no =1
kill no =2
1726
restart no =2
kill no =3
1945
restart no =3
kill no =4
Mon not running


I see a discrepancy in the 7-minute kill interval run. Could you please confirm whether this is the right behaviour? Please note that there was a gap of at least 30 minutes between each category of runs.
Comment 7 Samuel Just 2015-10-02 12:38:05 EDT
I'm not worried about that, looks fine to me.
Comment 13 shylesh 2015-10-06 03:05:42 EDT
This bug has two parts:

1) The release notes should probably include Sam's comment 11, which describes the actual change.


2) As per comment 6, the number of restarts before upstart saturation is not consistent across kill intervals, so this should be listed under known issues to make users aware of it. I will therefore create another defect for part 2.

Please let me know if you have any concerns.
Comment 14 shylesh 2015-10-06 03:07:44 EDT
(In reply to shylesh from comment #13)
> This bug has two parts:
> 
> 1) The release notes should probably include Sam's comment 11, which
> describes the actual change.
> 
> 
> 2) As per comment 6, the number of restarts before upstart saturation is
> not consistent across kill intervals, so this should be listed under known
> issues to make users aware of it. I will therefore create another defect
> for part 2.
> 
> Please let me know if you have any concerns.

I have created https://bugzilla.redhat.com/show_bug.cgi?id=1269048 to track part 2 as a known issue.
Comment 15 Harish NV Rao 2015-10-08 13:27:34 EDT
Moving this defect to the verified state based on comment 7. For the issue described in comment 6, we have already opened BZ 1269048.
Comment 17 errata-xmlrpc 2015-10-08 14:39:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1883
Comment 18 Federico Lucifredi 2015-11-09 20:58:44 EST
Upstart respawn limit changes: the following note will be added to our 1.3.1 release notes.

Release notes: "The upstart respawn limit has been changed from 5 restarts in 30 seconds to 3 restarts in 30 minutes for the OSD and MON daemons".
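
To illustrate what "3 restarts in 30 minutes" means for the kill-interval test in comment 6, here is a minimal sketch of a respawn-limit policy in the style of upstart's `respawn limit COUNT INTERVAL` stanza. It is a simplified model, not upstart's actual code: it assumes the supervisor opens a window at the first respawn, counts respawns inside that window, and gives up once the count exceeds the limit. The function name `simulate_respawns` and its arguments are hypothetical.

```shell
#!/bin/bash
# Simplified, assumed model of a "respawn limit COUNT INTERVAL" policy.
# Usage: simulate_respawns LIMIT WINDOW_SECONDS KILL_INTERVAL_SECONDS [MAX_KILLS]
# Prints how many kills were followed by a successful restart.
simulate_respawns() {
    local limit=$1 window=$2 kill_interval=$3 max_kills=${4:-20}
    local now=0 window_start=-1 count=0 restarts=0 k
    for ((k = 1; k <= max_kills; k++)); do
        if ((window_start >= 0 && now - window_start < window)); then
            # Still inside the current window: count this respawn.
            count=$((count + 1))
            if ((count > limit)); then
                break   # limit exceeded, the supervisor gives up
            fi
        else
            # Window expired (or first respawn): start a new window.
            window_start=$now
            count=1
        fi
        restarts=$((restarts + 1))
        now=$((now + kill_interval))
    done
    echo "$restarts"
}

simulate_respawns 3 1800 45    # new limit, kills 45 s apart: prints 3
simulate_respawns 3 1800 480   # new limit, kills 8 min apart: prints 3
simulate_respawns 5 30 45      # old limit: window expires between kills, prints 20 (MAX_KILLS)
```

Under this model, every interval tested in comment 6 (480, 420, 300 and 45 seconds) yields three restarts, with the fourth kill leaving the monitor down. The four restarts seen in the 7-minute run are not explained by the model.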
