Created attachment 1458031 [details]
crush map

Description of problem:
One PG has been stuck in the active+clean+remapped state for over 12 hours after running reweight-by-utilization.

Version-Release number of selected component (if applicable):
ceph version 12.2.5-27.el7cp

Steps to Reproduce:
1. Created a cluster with only BlueStore OSDs. Filled the cluster to 50-60% with data.
2. Ran ceph osd reweight-by-utilization (default threshold).

Actual results:
One of the PGs (13.5), belonging to an EC pool (overwrites enabled), is stuck in the active+clean+remapped state.

$ ceph osd erasure-code-profile get newprofile
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

$ ceph osd tree
ID  CLASS WEIGHT   TYPE NAME         STATUS REWEIGHT PRI-AFF
 -1       10.91510 root default
 -3        2.72878     host magna106
  1   hdd  0.90959         osd.1         up  1.00000 1.00000
  2   hdd  0.90959         osd.2         up  1.00000 1.00000
  3   hdd  0.90959         osd.3         up  1.00000 1.00000
 -9        1.81918     host magna107
  9   hdd  0.90959         osd.9         up  1.00000 1.00000
 10   hdd  0.90959         osd.10        up  0.90002 1.00000
 -7        0.90959     host magna108
  8   hdd  0.90959         osd.8         up  1.00000 1.00000
 -5        2.72878     host magna113
  0   hdd  0.90959         osd.0         up  1.00000 1.00000
  4   hdd  0.90959         osd.4         up  1.00000 1.00000
  5   hdd  0.90959         osd.5         up  1.00000 1.00000
-13        1.81918     host magna114
  7   hdd  0.90959         osd.7         up  1.00000 1.00000
 11   hdd  0.90959         osd.11        up  1.00000 1.00000
-11        0.90959     host magna115
  6   hdd  0.90959         osd.6         up  0.95001 1.00000

$ ceph pg dump | grep active+clean+remapped
dumped all
13.5 2691 0 0 2691 0 3178894720 9560 9560 active+clean+remapped 2018-07-11 07:59:52.847350 377'9560 509:16941 [4,3,10,11,NONE,8] 4 [4,3,10,11,8,8] 4 0'0 2018-07-09 13:11:58.937740 0'0 2018-07-09 13:11:58.937740 0

$ ceph pg 13.5 query
{
    "state": "active+clean+remapped",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 537,
    "up": [
        4,
        3,
        10,
        11,
        2147483647,
        8
    ],
    "acting": [
        4,
        3,
        10,
        11,
        8,
        8
    ],
    "actingbackfill": [
        "3(1)",
        "4(0)",
        "8(4)",
        "8(5)",
        "10(2)",
        "11(3)"
    ],

Attached the complete pg query, ceph osd dump, and osd crush dump.

Additional info:
I did not manually change the crush map.
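
For readers following along, the relationship between the profile, the host layout, and the up set can be checked with a small script. This is only a sketch: the host-to-OSD table, k, m, and the up set are copied by hand from the output in this report, and the helper names (host_of, unused_hosts) are made up for illustration. It does not call Ceph.

```python
# Host -> OSD ids, copied by hand from the `ceph osd tree` output in this report.
HOSTS = {
    "magna106": [1, 2, 3],
    "magna107": [9, 10],
    "magna108": [8],
    "magna113": [0, 4, 5],
    "magna114": [7, 11],
    "magna115": [6],
}

K, M = 4, 2  # k and m from the erasure-code profile "newprofile"

# Value shown as 2147483647 in `ceph pg query` and as NONE in `ceph pg dump`.
CRUSH_ITEM_NONE = 2147483647

# Up set of PG 13.5 as reported by `ceph pg 13.5 query`.
UP = [4, 3, 10, 11, CRUSH_ITEM_NONE, 8]


def host_of(osd_id):
    """Return the host that contains osd_id (hypothetical helper)."""
    for host, osds in HOSTS.items():
        if osd_id in osds:
            return host
    raise KeyError(osd_id)


def unused_hosts(up):
    """Return hosts that hold no shard of the given up set."""
    used = {host_of(osd) for osd in up if osd != CRUSH_ITEM_NONE}
    return sorted(set(HOSTS) - used)


print(f"chunks needed (k+m): {K + M}, hosts available: {len(HOSTS)}")
print("hosts with no shard in the up set:", unused_hosts(UP))
```

With crush-failure-domain=host, each of the k+m = 6 chunks needs its own host, and the tree has exactly 6 hosts, so there is no spare host if CRUSH gives up on one of them. The only host without a shard in the up set is magna115, whose single OSD (osd.6) has a reweight of 0.95001.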
Created attachment 1458032 [details] pg query
Created attachment 1458033 [details] osd dump
Being remapped isn't an error. In this case CRUSH happens not to reach an assignment for shard 4 (the NONE / 2147483647 entry in the up set), so the OSDs override it and add one in the acting set.
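
To see which shard was overridden, the up and acting sets from the pg query can be compared position by position. The following is a minimal sketch, assuming the JSON excerpt below (trimmed by hand from the `ceph pg 13.5 query` output in this report); remapped_shards is a hypothetical helper, not a Ceph API.

```python
import json

# Placeholder CRUSH emits when it finds no OSD for a position;
# `ceph pg dump` prints it as NONE.
CRUSH_ITEM_NONE = 0x7FFFFFFF  # 2147483647

# Hand-trimmed excerpt of the `ceph pg 13.5 query` output from this report.
PG_QUERY_EXCERPT = """
{
  "state": "active+clean+remapped",
  "up": [4, 3, 10, 11, 2147483647, 8],
  "acting": [4, 3, 10, 11, 8, 8]
}
"""


def remapped_shards(pg):
    """Return (shard, up_osd, acting_osd) for each position where up != acting.

    up_osd is None when CRUSH produced no assignment for that shard.
    """
    result = []
    for shard, (up, acting) in enumerate(zip(pg["up"], pg["acting"])):
        if up != acting:
            result.append((shard, None if up == CRUSH_ITEM_NONE else up, acting))
    return result


pg = json.loads(PG_QUERY_EXCERPT)
for shard, up, acting in remapped_shards(pg):
    up_text = "no CRUSH assignment" if up is None else f"osd.{up}"
    print(f"shard {shard}: up={up_text}, acting=osd.{acting}")
```

For this PG the only differing position is shard 4: the up set has no assignment there, and the acting set fills it with osd.8, which also holds shard 5. That matches the "8(4)" and "8(5)" entries in actingbackfill.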