Description of problem:
When sending an MOSDMap message to clients with an older OSDMap, we do not reencode the CRUSH map in the incremental map. This is imprecise at best; at worst, it leads to incorrect weights if a weight-set is in use (specifically a compat weight-set), such that client IOs can stall.

How reproducible:
100%

Steps to Reproduce:
1. Mount from a non-mimic client (rbd, cephfs, whatever).
2. ceph osd crush weight-set create-compat
3. ceph osd crush weight-set reweight-compat osd.0 .5
4. Do a bunch of IO; some IOs will stall (from the client's perspective; the OSD will not report them as stalled and ceph -s will not show slow requests).
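For convenience, the two cluster-side commands from the steps above can be wrapped in a small script. This is only a sketch: the DRY_RUN, OSD, and WEIGHT variables are hypothetical names introduced here, not part of ceph. It assumes a non-mimic client already has the cluster mounted (step 1) and leaves IO generation (step 4) to be done by hand on that client. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 2 and 3. Run on a node with admin access to
# the cluster while a non-mimic client (rbd, cephfs, ...) is already mounted.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = execute
OSD="${OSD:-osd.0}"       # OSD to reweight, as in step 3
WEIGHT="${WEIGHT:-.5}"    # compat weight-set weight, as in step 3

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

repro_steps() {
    # Step 2: create the compat weight-set.
    run ceph osd crush weight-set create-compat
    # Step 3: change one OSD's weight in the compat weight-set.
    run ceph osd crush weight-set reweight-compat "$OSD" "$WEIGHT"
}

repro_steps
echo "Now generate IO from the old client and watch for requests that never complete."
```

Run it with DRY_RUN=0 to actually issue the commands. Because the stalls are only visible from the client's side, check the client's IO rather than ceph -s.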
This is a blocker because it could cause "Data service unavailability when following a prescribed methodology in the documentation" (https://mojo.redhat.com/docs/DOC-1146159).
Sage, would you please link the upstream tracker / PR here? Once we have that, we can seek acks and potentially move to MODIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387