Bug 1394928

Summary: rolling upgrade on Ubuntu can leave OSD processes running but not marked up
Product: Red Hat Ceph Storage Reporter: Samuel Just <sjust>
Component: RADOS Assignee: Samuel Just <sjust>
Status: CLOSED ERRATA QA Contact: shylesh <shmohan>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.1 CC: ceph-eng-bugs, dzafman, gmeno, hnallurv, kchai, sjust, tchandra
Target Milestone: rc   
Target Release: 2.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.5-7.el7cp Ubuntu: ceph_10.2.5-3redhat1xenial Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-14 15:46:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Samuel Just 2016-11-14 19:13:22 UTC
Description of problem:

Restarting an OSD twice in quick succession can result in it starting up and sending an MOSDBoot message whose epoch is the same as its up_from epoch (the epoch in which it was marked up). The mon then ignores the boot message, and the OSD is left in limbo because it won't resend the boot.
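
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ceph's monitor code: `ToyMonitor`, `handle_boot`, `mark_down`, and `simulate_double_restart` are hypothetical names, the starting epoch of 10 is arbitrary, and the "not newer than up_from" comparison is inferred from the description rather than taken from the source tree.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyMonitor:
    """Toy stand-in for the mon's view of OSD state (hypothetical, not Ceph code)."""
    epoch: int = 10                                         # current map epoch (arbitrary start)
    up: Dict[int, bool] = field(default_factory=dict)       # osd id -> marked up?
    up_from: Dict[int, int] = field(default_factory=dict)   # osd id -> epoch it was marked up

    def mark_down(self, osd: int) -> None:
        # Publishing a new map that marks the OSD down; up_from is left unchanged.
        self.epoch += 1
        self.up[osd] = False

    def handle_boot(self, osd: int, boot_epoch: int) -> bool:
        # ASSUMPTION (from the bug description): a boot message whose epoch is
        # not newer than the recorded up_from is treated as stale and ignored.
        if boot_epoch <= self.up_from.get(osd, -1):
            return False
        self.epoch += 1
        self.up[osd] = True
        self.up_from[osd] = self.epoch
        return True


def simulate_double_restart() -> dict:
    mon = ToyMonitor()
    osd = 0
    # First start: the boot is accepted and the OSD is marked up in a new epoch.
    first = mon.handle_boot(osd, boot_epoch=mon.epoch)
    last_seen = mon.up_from[osd]      # newest map epoch the OSD saw before restarting
    # Quick second restart: the old instance goes down, and the new process
    # boots still holding the epoch in which it was marked up.
    mon.mark_down(osd)
    second = mon.handle_boot(osd, boot_epoch=last_seen)
    # The OSD does not resend the boot, so this state persists.
    return {
        "first_accepted": first,
        "second_accepted": second,
        "marked_up": mon.up[osd],
    }


if __name__ == "__main__":
    print(simulate_double_restart())
```

In this model the second boot is dropped while the OSD process keeps running, which matches the "running, but not marked up" symptom in the summary.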

I'm not leaving reproduction steps for this. There are two other ansible bugs blocked on this one which reliably reproduce it; please verify those and consider this one verified if they are fixed.

Comment 3 Christina Meno 2016-11-15 18:23:17 UTC
Harish and Sam,

We got past this issue by changing ceph-ansible to not double start OSDs. https://bugzilla.redhat.com/show_bug.cgi?id=1394929

This fix is nice to have and as such I'm going to target it to 2.2 and fix the dependencies for the other BZs.

cheers,
G

Comment 5 Christina Meno 2017-01-17 17:04:42 UTC
Sam, is this present in the latest RHCeph 2.2 build? I.e., can we move the state to ON_QA?

Comment 8 Tejas 2017-02-18 17:26:29 UTC
Verified this as part of the rolling_update tests in ceph 2.2 in build 10.2.5-26 on RHEL and 10.2.5-17 on Ubuntu.
This issue is not seen anymore.
Moving to Verified.

Comment 10 errata-xmlrpc 2017-03-14 15:46:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html