Bug 1394928

Summary:	rolling upgrade on ubuntu can leave osd processes running, but not marked up
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Samuel Just <sjust>
Component:	RADOS	Assignee:	Samuel Just <sjust>
Status:	CLOSED ERRATA	QA Contact:	shylesh <shmohan>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	2.1	CC:	ceph-eng-bugs, dzafman, gmeno, hnallurv, kchai, sjust, tchandra
Target Milestone:	rc
Target Release:	2.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	RHEL: ceph-10.2.5-7.el7cp Ubuntu: ceph_10.2.5-3redhat1xenial	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-03-14 15:46:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Samuel Just 2016-11-14 19:13:22 UTC

Description of problem:

Restarting an OSD twice in quick succession can cause it to start up and send an MOSDBoot message whose epoch is the same as the epoch recorded as its up_from.  The mon then ignores the boot message, and because the OSD won't resend the boot, it is left in limbo: the process is running but is never marked up.
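
The race described above can be illustrated with a toy model. This is only a sketch and not Ceph code: `ToyMonitor`, `handle_boot`, `mark_down`, and `simulate_double_restart` are hypothetical names, and the rule "drop a boot whose epoch is not newer than up_from" is an assumption taken from the description, not the actual monitor logic.

```python
from dataclasses import dataclass


@dataclass
class ToyOsdInfo:
    up: bool = False
    up_from: int = 0  # epoch in which the OSD was last marked up


class ToyMonitor:
    """Hypothetical stand-in for the mon; tracks a single OSD."""

    def __init__(self) -> None:
        self.epoch = 1
        self.info = ToyOsdInfo()

    def handle_boot(self, boot_epoch: int) -> bool:
        # Assumed rule: a boot whose epoch is not newer than up_from
        # looks stale, so it is silently ignored.
        if boot_epoch <= self.info.up_from:
            return False
        self.epoch += 1
        self.info.up = True
        self.info.up_from = self.epoch
        return True

    def mark_down(self) -> None:
        self.epoch += 1
        self.info.up = False


def simulate_double_restart() -> tuple:
    mon = ToyMonitor()
    # First start: boot sent at epoch 1, OSD is marked up in epoch 2.
    first = mon.handle_boot(boot_epoch=1)
    # The OSD is restarted right away; the old process is marked down.
    mon.mark_down()
    # The new process has only seen epoch 2 (its own up_from) and boots with it.
    second = mon.handle_boot(boot_epoch=mon.info.up_from)
    # The process is running, but the monitor never marks it up again.
    return first, second, mon.info.up


if __name__ == "__main__":
    print(simulate_double_restart())
```

In this model the second boot is dropped, and since the OSD does not retry, it stays down until something else restarts it.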

I'm not leaving reproduction steps for this.  There are two other ansible bugs blocked on this one which reliably reproduce it; please verify those and consider this one verified if they are fixed.

Comment 3 Christina Meno 2016-11-15 18:23:17 UTC

Harish and Sam,

We got past this issue by changing ceph-ansible to not double start OSDs. https://bugzilla.redhat.com/show_bug.cgi?id=1394929

This fix is nice to have and as such I'm going to target it to 2.2 and fix the dependencies for the other BZs.

cheers,
G

Comment 5 Christina Meno 2017-01-17 17:04:42 UTC

Sam, is this present in the latest RHCeph 2.2 build? I.e., can we move the state to ON_QA?

Comment 8 Tejas 2017-02-18 17:26:29 UTC

Verified this as part of the rolling_update tests in ceph 2.2 in build 10.2.5-26 on RHEL and 10.2.5-17 on Ubuntu.
This issue is not seen anymore.
Moving to Verified.

Comment 10 errata-xmlrpc 2017-03-14 15:46:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html