Bug 1471939

Summary: pre-jewel "osd rm" incrementals are misinterpreted
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vikhyat Umrao <vumrao>
Component: RADOS Assignee: Josh Durgin <jdurgin>
Status: CLOSED ERRATA QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high Docs Contact: Bara Ancincova <bancinco>
Priority: medium    
Version: 1.3.3 CC: anharris, ceph-eng-bugs, dzafman, icolle, jdurgin, kchai, tserlin, vakulkar
Target Milestone: rc   
Target Release: 2.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-2:10.2.7-39.el7cp Ubuntu: ceph_10.2.7-38redhat1 Doc Type: Bug Fix
Doc Text:
.CRUSH calculations for removed OSDs match on kernel clients and the cluster
When an OSD was removed with the `ceph osd rm` command but was still present in the CRUSH map, the CRUSH calculations for that OSD on kernel clients and on the cluster did not match. Consequently, kernel clients returned I/O errors. The mismatch between client and server behavior has been fixed, and kernel clients no longer return I/O errors in this situation.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-17 18:12:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1473436, 1479701    

Description Vikhyat Umrao 2017-07-17 17:47:46 UTC
Description of problem:
pre-jewel "osd rm" incrementals are misinterpreted
http://tracker.ceph.com/issues/19119

Upstream PR: https://github.com/ceph/ceph/pull/13730
Release notes: https://github.com/ceph/ceph/pull/13731/files


* There was a bug introduced in Jewel (#19119) that broke the mapping behavior
  when an "out" OSD that still existed in the CRUSH map was removed with
  'osd rm'. This could result in 'misdirected op' and other errors. The bug is
  now fixed, but the fix itself introduces the same risk because the behavior
  may vary between clients and OSDs.

To avoid problems, please ensure that all OSDs are removed from the CRUSH map before deleting them.  That is, be sure to do::

   ceph osd crush rm osd.123

before::

   ceph osd rm osd.123
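
The ordering above can be scripted so that the two steps are never run in the wrong order. The following is a minimal sketch, assuming Python 3 and a `ceph` binary on the PATH with admin credentials. The helper names `removal_commands` and `remove_osd` are hypothetical and are not part of Ceph; the sketch only wraps the two commands shown above.

```python
import subprocess


def removal_commands(osd_id):
    """Return the two commands from the release note, in the safe order.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    name = "osd.{}".format(int(osd_id))
    return [
        ["ceph", "osd", "crush", "rm", name],  # 1. drop the OSD from the CRUSH map
        ["ceph", "osd", "rm", name],           # 2. only then delete the OSD itself
    ]


def remove_osd(osd_id, run=subprocess.check_call):
    """Run both commands in order and stop at the first failure.

    `run` is injectable so the ordering can be exercised without a cluster.
    """
    for cmd in removal_commands(osd_id):
        run(cmd)  # check_call raises CalledProcessError on a non-zero exit
```

Calling `remove_osd(123)` would run the same two commands as above. Because `subprocess.check_call` raises on a non-zero exit status, `ceph osd rm` is not attempted if `ceph osd crush rm` fails.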


We have a kernel RBD bug for RHEL 7.4, https://bugzilla.redhat.com/show_bug.cgi?id=1427556, that depends on this bug.


Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.3

Comment 4 Ian Colle 2017-08-16 21:49:36 UTC
Moving to 2.5 until 2.4 proper is released.

Comment 14 Josh Durgin 2017-10-12 17:46:04 UTC
Looks good, thanks Bara!

Comment 16 errata-xmlrpc 2017-10-17 18:12:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2903