Description of problem:
RHCS 1.3.1 (0.94.3-3.el7cp): pgs stuck in "remapped" after recovery on a cluster with many osds down.
http://tracker.ceph.com/issues/18145

During a maintenance event in which many osds were removed and some existing osds were down, the cluster went into recovery. Once recovery had completed, ten pgs were left stuck in the "remapped" state, and only a restart of the primaries involved resolved the issue and allowed these pgs to complete peering successfully.

^^ Instead of a restart, we tried marking the primary as down ($ ceph osd down).

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.1 (0.94.3-3.el7cp)
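For reference, a rough sketch of the workaround, assuming a stuck pg such as 1.3f whose primary is osd.12 (both ids are placeholders, and the exact restart command depends on how the osd daemons are managed on the node):

$ ceph health detail            # list pgs reported as stuck/remapped
$ ceph pg dump_stuck unclean    # show stuck pgs with their up/acting sets
$ ceph pg 1.3f query            # inspect the peering state of one stuck pg
$ ceph osd down 12              # mark the pg's primary down to force it to re-peer
  (or restart the primary's daemon instead, e.g. "service ceph restart osd.12" on RHCS 1.3 / RHEL 7)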
Included in 10.2.3 upstream.
(In reply to Josh Durgin from comment #5)
> Included in 10.2.3 upstream.

What was included, Josh?

Root cause for this was never established, and I doubt it ever will be, so I suspect we can close this. We have done extensive code review to try to work out how this came about, but were not able to come up with a viable theory. I'll tidy up the upstream tracker and probably close it, and this, on Monday.
We have tried extensively to reproduce this issue and have done an exhaustive code review, but have not been able to determine the cause. Closing this for now, but please feel free to reopen if more information comes to light.