Description of problem:

This appears to be a race condition in the starting of the ceph-osd daemons. From what I can see:

* When we start an osd daemon, we create a pid file under /var/run/ceph
* The initscript uses the pid file to prevent second starts of the daemon and removes the pid file during the stop sequence
* The script then mounts the data store; during this phase, if the pid file was not created quickly enough in the step above, the daemon may actually get started a second time.

Version-Release number of selected component (if applicable):
ceph 0.80.9

How reproducible:
Rare

Steps to Reproduce:
I haven't been able to reproduce this, but http://tracker.ceph.com/issues/13238 states "this problem can be reproduced easily if we add some sleep, like a second or so, before pidfile_write in global_init_postfork_start."

Actual results:
Duplicate ceph-osd daemons get started, resulting in "lock_fsid failed to lock" messages in the logs.

Expected results:
Only one ceph-osd should get started.
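To make the suspected ordering issue concrete, here is a minimal, purely illustrative simulation in Python. It is not Ceph or initscript code: the file name /tmp/fake-osd.pid, the delay value, and the function names are made up. It only mimics the pattern described above (the daemon writes its pid file some time after it is launched, while the start routine treats a missing pid file as "not running"), so a quick second start slips past the check.

# Hypothetical stand-in for the suspected race; not Ceph code.
import os
import time
import multiprocessing

PIDFILE = "/tmp/fake-osd.pid"   # stand-in for the pid file under /var/run/ceph

def fake_daemon(delay):
    """Pretend daemon: does some startup work before writing its pid file."""
    time.sleep(delay)           # window in which no pid file exists yet
    with open(PIDFILE, "w") as f:
        f.write(str(os.getpid()))
    time.sleep(5)               # "run"

def start(delay=1.0):
    """Init-script-like start: only refuses to start if a pid file exists."""
    if os.path.exists(PIDFILE):
        print("already running, not starting")
        return None
    p = multiprocessing.Process(target=fake_daemon, args=(delay,))
    p.start()
    print("started daemon pid %d" % p.pid)
    return p

if __name__ == "__main__":
    if os.path.exists(PIDFILE):
        os.unlink(PIDFILE)
    first = start()    # pid file not written yet...
    second = start()   # ...so a second "daemon" slips through the check
    for p in (first, second):
        if p is not None:
            p.join()
    os.unlink(PIDFILE)

Running this prints two "started daemon pid ..." lines, which mirrors the upstream reproduction hint of adding a short sleep before pidfile_write in global_init_postfork_start.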
It appears that this behavior does not exist in Hammer; it was fixed there.
Based on a recent comment in the upstream ticket, this might not be fixed in Hammer after all. (It's not clear to me what a proper fix in /etc/init.d/ceph would look like.)
Duplicate of BZ#1299409.

*** This bug has been marked as a duplicate of bug 1299409 ***