Bug 1267035 - Multiple ceph-osd daemons get started
Status: CLOSED DUPLICATE of bug 1299409
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.2.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 1.3.3
Assigned To: Kefu Chai
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
 
Reported: 2015-09-28 17:21 EDT by Kyle Squizzato
Modified: 2017-07-30 11:12 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-01 00:15:22 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 1299409 None None None 2016-02-01 00:15 EST
Red Hat Knowledge Base (Solution) 1756313 None None None Never
Ceph Project Bug Tracker 13238 None None None Never
Ceph Project Bug Tracker 13422 None None None 2016-02-01 00:07 EST

Description Kyle Squizzato 2015-09-28 17:21:13 EDT
Description of problem:
This appears to be a race condition in how the ceph-osd daemons are started.  From what I can see:

 * When we start an OSD daemon, a pid file is created under /var/run/ceph.
 * The init script uses the pid file both to prevent a second start of the daemon and to find the process to kill during the stop sequence.
 * The script then mounts the OSD data store; during this phase, if the pid file from the first step has not yet been written, the daemon may actually get started a second time (see the sketch below).
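
A minimal sketch of that pattern, with hypothetical variable names and paths rather than the actual /etc/init.d/ceph code, assuming a pid-file-based "already running" check:

    #!/bin/sh
    # Hypothetical illustration of the race; not the shipped init script.
    id=0
    pidfile=/var/run/ceph/osd.$id.pid

    # The "already running" check relies entirely on the pid file.
    if [ -e "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "ceph-osd.$id already running"
        exit 0
    fi

    # The daemon forks and writes $pidfile only afterwards
    # (pidfile_write in global_init_postfork_start), so a second
    # "start" issued inside that window passes the check above and
    # launches a duplicate ceph-osd.
    ceph-osd -i "$id" --pid-file "$pidfile"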

Version-Release number of selected component (if applicable):
ceph 0.80.9 

How reproducible:
Rare

Steps to Reproduce:
I haven't been able to reproduce, but http://tracker.ceph.com/issues/13238 states "this problem can be reproduced easily if we add some sleep, like a second or so, before pidfile_write in global_init_postfork_start."

Actual results:
Duplicate ceph-osd daemons get started, resulting in "lock_fsid failed to lock" messages in the OSD logs (see the check sketched below).

Expected results:
Only one ceph-osd should get started
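
For reference, a quick way to check for the duplicate daemons on an affected node (the OSD id and the log path below are assumptions based on default locations):

    # More than one matching process for the same OSD id indicates the duplicate start.
    pgrep -a -f 'ceph-osd -i 0'
    # The duplicate also shows up as the lock failure in that OSD's log.
    grep 'lock_fsid failed to lock' /var/log/ceph/ceph-osd.0.log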
Comment 1 Kyle Squizzato 2015-10-08 17:02:17 EDT
It appears that this behavior does not occur in Hammer, where it was apparently fixed.
Comment 2 Ken Dreyer (Red Hat) 2015-10-19 11:53:47 EDT
From a recent comment in the upstream ticket, this might not be fixed in Hammer after all. (It's not clear to me what a proper fix in /etc/init.d/ceph would look like.)
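
One possible shape for such a fix, offered here only as a sketch (it assumes the same hypothetical pid-file check shown earlier and is not the change that was actually made), would be to make the check-and-start atomic with flock(1):

    # Hypothetical sketch only; not an actual upstream patch.
    (
        flock -n 9 || exit 0   # another start for this OSD is already in flight
        if [ -e "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            exit 0             # daemon already running
        fi
        ceph-osd -i "$id" --pid-file "$pidfile"
    ) 9>"/var/lock/ceph-osd.$id.start"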
Comment 5 Kefu Chai 2016-02-01 00:15:22 EST
BZ#1299409 is a dup of this bug.

*** This bug has been marked as a duplicate of bug 1299409 ***
