Description of problem:
Along with decreasing the weight of overutilized OSDs, the reweight-by-utilization command also increases the weight of underutilized OSDs. However, there is a small glitch in how the underutilized OSDs are chosen: reweight-by-utilization walks a list sorted from the most utilized to the least utilized OSD.

For example, assume the utilization list is [0.80, 0.75, 0.71, 0.60, 0.55, 0.45, 0.30], avg_utilization=0.60 and max_osd=4. Now assume 2 OSDs satisfy the oload threshold and are picked for a weight decrease. Since max_osd is 4, 2 more OSDs can be chosen for a weight increase. While walking the list, the first OSD with a utilization below 0.60 (the one at 0.55) is considered underutilized, and its weight is increased. Instead, the OSD with 0.30 utilization should be picked, since it is the most underutilized OSD.

Version-Release number of selected component (if applicable):
1.3.2 Async

[ubuntu@magna009 ~]$ rpm -qa| grep ceph
ceph-common-0.94.5-12.el7cp.x86_64
ceph-deploy-1.5.27.4-3.el7cp.noarch
ceph-selinux-0.94.5-12.el7cp.x86_64
ceph-mon-0.94.5-12.el7cp.x86_64
ceph-0.94.5-12.el7cp.x86_64
mod_fastcgi-2.4.7-1.ceph.el7.x86_64
iozone-3.424-2_ceph.el7.x86_64
ceph-debuginfo-0.94.5-12.el7cp.x86_64

Additional info:

ubuntu@magna003:~/ceph-config$ sudo ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
 0 0.89999  0.80005   926G  782G  143G 84.50 1.10
 1 0.89989        0      0     0     0     0    0
 2 0.89999  0.80005   926G  649G  276G 70.11 0.91
 3 0.89999  1.00000   926G  741G  185G 80.02 1.04
 4 0.89999  1.00000   926G  783G  142G 84.61 1.10
 5 0.89999  1.00000   926G  647G  278G 69.94 0.91
 6 0.89999  0.80005   926G  678G  247G 73.24 0.95
 7 0.89999  1.00000   926G  713G  212G 77.09 1.00
 8 0.89999  1.00000   926G  737G  188G 79.65 1.03
 9 0.89999  1.00000   926G  794G  131G 85.76 1.11
10 0.89999  1.00000   926G  670G  255G 72.41 0.94
11 0.89999  1.00000   926G  646G  279G 69.79 0.91
              TOTAL 10186G 7844G 2341G 77.01
MIN/MAX VAR: 0/1.11  STDDEV: 5.95

ubuntu@magna003:~/ceph-config$ sudo ceph osd test-reweight-by-utilization 105 .05 5
no change
moved 59 / 1944 (3.03498%)
avg 176.727
stddev 10.1273 -> 8.28022 (expected baseline 12.6752)
min osd.8 with 190 -> 183 pgs (1.0751 -> 1.03549 * mean)
max osd.6 with 157 -> 168 pgs (0.888374 -> 0.950617 * mean)

oload 105
max_change 0.05
max_change_osds 5
average 0.770104
overload 0.808609
osd.9 weight 1.000000 -> 0.950012
osd.4 weight 1.000000 -> 0.950012
osd.0 weight 0.800049 -> 0.750061
osd.6 weight 0.800049 -> 0.841187
osd.2 weight 0.800049 -> 0.850037

In the above output, 2 OSDs [osd.6, osd.2] are picked as underutilized. osd.11 should actually be picked, because 69.79 is the lowest utilization, but osd.6 is picked instead because it is the first entry below avg_util in the sorted list.
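The selection order requested above can be illustrated with a small Python sketch. This is not the Ceph implementation; the function names, the parameter names, and the overload threshold of 0.72 are assumptions made only so that the example list from the description yields 2 overutilized OSDs. It contrasts walking a single most-to-least sorted list with walking the underutilized candidates from the least utilized upward.

```python
def pick_first_found(utilizations, average, overload, max_osds):
    """Behaviour described in the report: one walk over a list sorted
    from most to least utilized (hypothetical helper, not Ceph code)."""
    descending = sorted(utilizations, reverse=True)
    decrease = [u for u in descending if u > overload][:max_osds]
    remaining = max_osds - len(decrease)
    # The first entries below the average are taken, i.e. the OSDs
    # closest to the average rather than the emptiest ones.
    increase = [u for u in descending if u < average][:remaining]
    return decrease, increase


def pick_least_utilized(utilizations, average, overload, max_osds):
    """Requested behaviour: choose underutilized OSDs starting from the
    least utilized one (hypothetical helper, not Ceph code)."""
    descending = sorted(utilizations, reverse=True)
    decrease = [u for u in descending if u > overload][:max_osds]
    remaining = max_osds - len(decrease)
    ascending = sorted(utilizations)
    increase = [u for u in ascending if u < average][:remaining]
    return decrease, increase


# Example list from the description; 0.72 is an assumed overload threshold.
utils = [0.80, 0.75, 0.71, 0.60, 0.55, 0.45, 0.30]
print(pick_first_found(utils, average=0.60, overload=0.72, max_osds=4))
print(pick_least_utilized(utils, average=0.60, overload=0.72, max_osds=4))
```

With these assumed inputs, both variants pick the same 2 OSDs for a weight decrease and differ only in which 2 underutilized OSDs receive the increase.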
http://tracker.ceph.com/issues/15686
This probably should not hold up 1.3.2 -- an advisory to users would be the right thing.
Sage already has a PR for this, actually.
PR is merged to master, but still needs to be backported to jewel and hammer upstream.
Sage pushed this to the jewel branch, so it will be in the upcoming v10.2.1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html