Description of problem:

This appears to be a race condition in the starting of the ceph-osd daemons. From what I can see:

* When we start an osd daemon, we create a pid file under /var/run/ceph
* The initscript uses the pid file to prevent second starts of the daemon and removes the pid file during the stop sequence
* The script then mounts the data store; during this phase, if the pid file was not created quickly enough in the step above, the daemon may actually get started a second time.

Version-Release number of selected component (if applicable):
ceph 0.80.9

How reproducible:
Rare

Steps to Reproduce:
I haven't been able to reproduce this, but http://tracker.ceph.com/issues/13238 states "this problem can be reproduced easily if we add some sleep, like a second or so, before pidfile_write in global_init_postfork_start."

Actual results:
Duplicate ceph-osd daemons get started, resulting in "lock_fsid failed to lock" messages in the logs.

Expected results:
Only one ceph-osd should get started.
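To make the suspected ordering issue concrete, here is a minimal, purely illustrative simulation in Python. It is not Ceph or initscript code: the file name /tmp/fake-osd.pid, the delay value, and the function names are made up. It only mimics the pattern described above (the daemon writes its pid file some time after it is launched, while the start routine treats a missing pid file as "not running"), so a quick second start slips past the check.

# Hypothetical stand-in for the suspected race; not Ceph code.
import os
import time
import multiprocessing

PIDFILE = "/tmp/fake-osd.pid"   # stand-in for the pid file under /var/run/ceph

def fake_daemon(delay):
    """Pretend daemon: does some startup work before writing its pid file."""
    time.sleep(delay)           # window in which no pid file exists yet
    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))
    time.sleep(5)               # "run"

def start(delay=1.0):
    """Init-script-like start: only refuses to start if a pid file exists."""
    if os.path.exists(PIDFILE):
        print("already running, not starting")
        return None
    p = multiprocessing.Process(target=fake_daemon, args=(delay,))
    p.start()
    print("started daemon pid %d" % p.pid)
    return p

if __name__ == "__main__":
    if os.path.exists(PIDFILE):
        os.unlink(PIDFILE)
    first = start()    # pid file not written yet...
    second = start()   # ...so a second "daemon" slips through the check
    for p in (first, second):
        if p is not None:
            p.join()
    os.unlink(PIDFILE)

Running this prints two "started daemon pid ..." lines, which mirrors the upstream reproduction hint of adding a short sleep before pidfile_write in global_init_postfork_start.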
It appears that this behavior does not exist in Hammer; it was fixed there.
Based on a recent comment in the upstream ticket, this might not be fixed in Hammer after all. (It's not clear to me what a proper fix in /etc/init.d/ceph would look like.)
Duplicate of BZ#1299409.

*** This bug has been marked as a duplicate of bug 1299409 ***