Bug 1267035 - Multiple ceph-osd daemons get started
Status: CLOSED DUPLICATE of bug 1299409
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 1.3.3
Assigned To: Kefu Chai
Depends On:
Reported: 2015-09-28 17:21 EDT by Kyle Squizzato
Modified: 2017-07-30 11:12 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-02-01 00:15:22 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments

External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker           13238    None      None    None     Never
Ceph Project Bug Tracker           13422    None      None    None     2016-02-01 00:07 EST
Red Hat Knowledge Base (Solution)  1756313  None      None    None     Never
Red Hat Bugzilla                   1299409  None      None    None     2016-02-01 00:15 EST

Description Kyle Squizzato 2015-09-28 17:21:13 EDT
Description of problem:
This appears to be a race condition in the starting of the ceph-osd daemons.  From what I can see:

 * When we start an osd daemon, we create a pid file under /var/run/ceph.
 * The initscript uses the pid file to prevent a second start of the daemon, and it removes the pid file during the stop sequence.
 * The script then mounts the data store; during this phase, if the pid file was not created quickly enough in the step above, the daemon may actually get started a second time (see the sketch after this list).
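
To make the window concrete, here is a minimal standalone sketch (not the actual Ceph or init-script code; the helper name and the /tmp path are made up) of the pattern that would close it: create the pid file exclusively before any slow mount/start work, so a concurrent start attempt fails immediately instead of racing.

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <string>

// Create the pid file with O_CREAT|O_EXCL so a second start attempt fails
// right away instead of racing with the first one.
static bool write_pidfile_exclusive(const std::string &path, pid_t pid)
{
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (fd < 0) {
    std::perror("open pidfile");   // EEXIST: another instance already claimed it
    return false;
  }
  char buf[32];
  int len = std::snprintf(buf, sizeof(buf), "%d\n", (int)pid);
  bool ok = (::write(fd, buf, len) == len);
  ::close(fd);
  return ok;
}

int main()
{
  // Writing the pid file *before* any slow mount/start work closes the
  // window in which a second "service ceph start" sees nothing running.
  if (!write_pidfile_exclusive("/tmp/fake-osd.pid", ::getpid()))
    return 1;   // a duplicate start bails out here
  // ... daemon work would continue here ...
  return 0;
}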

Version-Release number of selected component (if applicable):
ceph 0.80.9 

How reproducible:

Steps to Reproduce:
I haven't been able to reproduce this, but http://tracker.ceph.com/issues/13238 states "this problem can be reproduced easily if we add some sleep, like a second or so, before pidfile_write in global_init_postfork_start."
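
As a rough illustration of that suggestion, the following standalone program (not Ceph code; the /tmp path is made up) simulates the window: the forked child delays before writing the pid file, so a status check performed in the meantime concludes that nothing is running.

#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static const char *pidfile = "/tmp/fake-osd.pid";

static bool looks_running()
{
  struct stat st;
  return ::stat(pidfile, &st) == 0;   // roughly what the init script's check amounts to
}

int main()
{
  ::unlink(pidfile);
  if (::fork() == 0) {
    ::sleep(1);                       // stands in for the delay before pidfile_write
    FILE *f = std::fopen(pidfile, "w");
    if (f) { std::fprintf(f, "%d\n", (int)::getpid()); std::fclose(f); }
    return 0;
  }
  // A second "start" issued right now sees no pid file and would happily
  // launch another daemon on top of the first one.
  std::printf("daemon appears %s\n", looks_running() ? "running" : "not running");
  ::wait(nullptr);
  ::unlink(pidfile);
  return 0;
}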

Actual results:
Duplicate ceph-osd daemons get started and result in "lock_fsid failed to lock" messages in logs

Expected results:
Only one ceph-osd should get started
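
For context on the "lock_fsid failed to lock" message above: the OSD takes an advisory lock on the fsid file in its data directory, so a second instance cannot acquire it and bails out with that error. A rough fcntl()-based illustration (not the actual Ceph implementation; the function name and the /tmp path are made up):

#include <cerrno>
#include <cstring>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Try to take an exclusive, non-blocking advisory lock on the whole file.
static bool lock_fsid_like(const char *path)
{
  int fd = ::open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0)
    return false;
  struct flock l;
  std::memset(&l, 0, sizeof(l));
  l.l_type = F_WRLCK;
  l.l_whence = SEEK_SET;              // l_start = l_len = 0: lock the entire file
  if (::fcntl(fd, F_SETLK, &l) < 0) {
    std::fprintf(stderr, "failed to lock %s: %s (is another daemon running?)\n",
                 path, std::strerror(errno));
    ::close(fd);
    return false;                     // the duplicate daemon ends up here
  }
  return true;                        // keep fd open; the lock lives as long as it does
}

int main()
{
  return lock_fsid_like("/tmp/fake-fsid") ? 0 : 1;
}

Running two copies concurrently (for example, by adding a sleep() after a successful lock) makes the second one print the failure, mirroring what the duplicate ceph-osd reports.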
Comment 1 Kyle Squizzato 2015-10-08 17:02:17 EDT
It appears that this behavior does not exist in Hammer and was fixed there.
Comment 2 Ken Dreyer (Red Hat) 2015-10-19 11:53:47 EDT
From a recent comment in the upstream ticket, this might not be fixed in Hammer after all. (It's not clear to me what a proper fix in /etc/init.d/ceph would look like.)
Comment 5 Kefu Chai 2016-02-01 00:15:22 EST
BZ#1299409 is a dup of this bug.

*** This bug has been marked as a duplicate of bug 1299409 ***
