Bug 1394928 - rolling upgrade on ubuntu can leave osd processes running, but not marked up
Summary: rolling upgrade on ubuntu can leave osd processes running, but not marked up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 2.2
Assignee: Samuel Just
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-14 19:13 UTC by Samuel Just
Modified: 2022-02-21 18:17 UTC

Fixed In Version: RHEL: ceph-10.2.5-7.el7cp Ubuntu: ceph_10.2.5-3redhat1xenial
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 15:46:57 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 17899 | 0 | None | None | None | 2016-11-14 19:15:02 UTC
Red Hat Product Errata | RHBA-2017:0514 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 2.2 bug fix and enhancement update | 2017-03-21 07:24:26 UTC

Description Samuel Just 2016-11-14 19:13:22 UTC
Description of problem:

Restarting an OSD twice in quick succession can result in it starting up and sending an MOSDBoot message whose epoch is the same as the OSD's up_from epoch (the epoch in which it was last marked up). The mon then ignores the boot message and leaves the OSD in limbo: the process is running but never marked up, since it won't resend the boot message.
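
The sequence described above can be illustrated with a toy model. This is only a sketch and not Ceph source code: the class ToyMonitor, its methods, and the exact comparison used to reject a boot are all hypothetical, chosen only to reproduce the behaviour described (a boot whose epoch equals up_from is ignored).

```python
# Toy model only: NOT Ceph code. Every name here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    epoch: int = 1
    # osd id -> epoch in which the OSD was last marked up
    up_from: dict = field(default_factory=dict)
    up: set = field(default_factory=set)

    def handle_boot(self, osd_id: int, boot_epoch: int) -> bool:
        """Return True if the boot is accepted, False if it is ignored."""
        # Assumed check: a boot that is not newer than the recorded
        # up_from epoch is treated as stale and silently dropped.
        if osd_id in self.up_from and boot_epoch <= self.up_from[osd_id]:
            return False
        self.epoch += 1
        self.up_from[osd_id] = self.epoch
        self.up.add(osd_id)
        return True

    def mark_down(self, osd_id: int) -> None:
        """Mark the OSD down; this also produces a new map epoch."""
        self.up.discard(osd_id)
        self.epoch += 1


mon = ToyMonitor(epoch=10)

# First start: the OSD boots at epoch 10 and is marked up in epoch 11.
first = mon.handle_boot(0, boot_epoch=10)

# Quick second restart: the OSD is marked down, but the new process
# still only knows epoch 11, which equals its own up_from.
mon.mark_down(0)
second = mon.handle_boot(0, boot_epoch=11)

# The second boot was ignored, so the OSD runs without being marked up.
stuck = 0 not in mon.up
```

In this model the OSD stays out of the up set until it sends a boot with a newer epoch, which matches the "running, but not marked up" state in the bug title.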

I'm not leaving reproduction steps for this. There are two other ansible bugs blocked on this one which reliably reproduce it; please verify those and consider this one verified if they are fixed.

Comment 3 Christina Meno 2016-11-15 18:23:17 UTC
Harish and Sam,

We got past this issue by changing ceph-ansible to not double start OSDs. https://bugzilla.redhat.com/show_bug.cgi?id=1394929

This fix is nice to have and as such I'm going to target it to 2.2 and fix the dependencies for the other BZs.

cheers,
G

Comment 5 Christina Meno 2017-01-17 17:04:42 UTC
Sam, is this present in the latest RHCeph 2.2 build? I.e., can we move the state to ON_QA?

Comment 8 Tejas 2017-02-18 17:26:29 UTC
Verified this as part of the rolling_update tests in Ceph 2.2, in build 10.2.5-26 on RHEL and 10.2.5-17 on Ubuntu.
This issue is not seen anymore.
Moving to Verified.

Comment 10 errata-xmlrpc 2017-03-14 15:46:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html

