Description of problem:
Running reweight-by-utilization on a cluster with max_change set to 1 can lead to OSDs getting a very large reweight value (around 65534) due to an integer overflow in the weight calculation.

Version-Release number of selected component (if applicable):
1.3.2 Async release.

[root@magna105 ~]# rpm -qa| grep ceph
ceph-mon-0.94.5-12.el7cp.x86_64
ceph-common-0.94.5-12.el7cp.x86_64
ceph-selinux-0.94.5-12.el7cp.x86_64
mod_fastcgi-2.4.7-1.ceph.el7.x86_64
iozone-3.424-2_ceph.el7.x86_64
ceph-0.94.5-12.el7cp.x86_64

How reproducible:

Steps to Reproduce:
1. Created a cluster with 9 OSDs.
2. Filled it with data and observed some imbalance in the data distribution.
3. Ran "ceph osd reweight-by-utilization 110".

Actual results:
[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE  USE   AVAIL %USE  VAR
 0 0.89999     1.00000  926G  692G  233G 74.79 1.02
 1 0.89999     1.00000  926G  682G  243G 73.72 1.01
 2 0.89999 65535.94922  926G  709G  216G 76.64 1.05
 3 0.75000     1.00000  926G  626G  299G 67.62 0.93
 4 0.89999 65533.94922  926G  711G  215G 76.78 1.05
 5 0.89999     1.00000  926G  572G  353G 61.87 0.85
 6 0.79999     1.00000  926G  685G  240G 74.03 1.01
 7 0.89999     1.00000  926G  624G  301G 67.42 0.92
 8 0.89999     1.00000  926G  787G  138G 85.05 1.16
                 TOTAL 8334G 6092G 2241G 73.10
MIN/MAX VAR: 0.85/1.16  STDDEV: 3.61

Some of the OSDs got a very high reweight value due to an integer overflow in this calculation in src/mon/OSDMonitor.cc:

new_weight = MAX(new_weight, weight - max_change);

Additional info:
[root@magna105 ~]# ceph -s
    cluster 6de276f4-42aa-4de9-85d7-6f879ce1faa3
     health HEALTH_WARN
            clock skew detected on mon.magna107, mon.magna108
            Monitor clock skew detected
     monmap e1: 3 mons at {magna105=10.8.128.105:6789/0,magna107=10.8.128.107:6789/0,magna108=10.8.128.108:6789/0}
            election epoch 24, quorum 0,1,2 magna105,magna107,magna108
     osdmap e430: 9 osds: 9 up, 9 in
      pgmap v34074: 128 pgs, 9 pools, 1978 GB data, 495 kobjects
            5944 GB used, 2390 GB / 8334 GB avail
                 128 active+clean

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
 0 0.89999  1.00000  926G  712G  213G 76.93 1.08
 1 0.89999  1.00000  926G  704G  221G 76.05 1.07
 2 0.89999  0.95001  926G  771G  154G 83.26 1.17
 3 0.75000  1.00000  926G  617G  308G 66.70 0.94
 4 0.89999  0.95001  926G  727G  198G 78.58 1.10
 5 0.89999  1.00000  926G  557G  368G 60.22 0.84
 6 0.79999  1.00000  926G  668G  257G 72.21 1.01
 7 0.89999  1.00000  926G  609G  316G 65.78 0.92
 8 0.89999  0.84999  926G  575G  350G 62.12 0.87
              TOTAL 8334G 5944G 2390G 71.32
MIN/MAX VAR: 0.84/1.17  STDDEV: 7.46

[root@magna105 ~]# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    8334G  2390G  5944G     71.32
POOLS:
    NAME                ID  USED   %USED  MAX AVAIL  OBJECTS
    rbd                 0   1978G  23.74  450G       506530
    .rgw.root           1   848    0      450G       3
    .rgw.control        2   0      0      450G       8
    .rgw                3   704    0      450G       4
    .rgw.gc             4   0      0      450G       32
    .users.uid          5   324    0      450G       2
    .users              6   12     0      450G       1
    .rgw.buckets.index  7   0      0      450G       2
    .rgw.buckets        8   976k   0      450G       1000

[root@magna105 ~]# ceph osd reweight-by-utilization 110
moved 14 / 384 (3.64583%)
avg 42.6667
stddev 4.89898 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.8 with 33 -> 42 pgs (0.773438 -> 0.984375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.713184
overload 0.784502
osd.2 weight 0.950012 -> 65535.949219
osd.4 weight 0.950012 -> 65535.949219
osd.8 weight 0.849991 -> 0.975845

[root@magna105 ~]# ceph osd reweight-by-utilization 110
moved 0 / 384 (0%)
avg 42.6667
stddev 3.82971 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.5 with 36 -> 36 pgs (0.84375 -> 0.84375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.721498
overload 0.793647
osd.4 weight 65535.949219 -> 65534.949219
osd.8 weight 0.975845 -> 1.000000

[root@magna105 ~]# ceph osd reweight-by-utilization 110 --no-increasing
moved 0 / 384 (0%)
avg 42.6667
stddev 3.82971 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.5 with 36 -> 36 pgs (0.84375 -> 0.84375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.721498
overload 0.793648
osd.4 weight 65534.949219 -> 65533.949219

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE  USE   AVAIL %USE  VAR
 0 0.89999     1.00000  926G  712G  213G 76.93 1.07
 1 0.89999     1.00000  926G  704G  221G 76.05 1.05
 2 0.89999 65535.94922  926G  732G  193G 79.12 1.10
 3 0.75000     1.00000  926G  610G  315G 65.93 0.91
 4 0.89999 65533.94922  926G  769G  156G 83.10 1.15
 5 0.89999     1.00000  926G  557G  368G 60.22 0.83
 6 0.79999     1.00000  926G  668G  257G 72.21 1.00
 7 0.89999     1.00000  926G  609G  316G 65.78 0.91
 8 0.89999     1.00000  926G  648G  277G 69.99 0.97
                 TOTAL 8334G 6013G 2321G 72.15
MIN/MAX VAR: 0.83/1.15  STDDEV: 9.18

[root@magna105 ~]# ceph -s
    cluster 6de276f4-42aa-4de9-85d7-6f879ce1faa3
     health HEALTH_WARN
            clock skew detected on mon.magna107, mon.magna108
            5 pgs backfilling
            5 pgs stuck unclean
            recovery 66837/1571772 objects misplaced (4.252%)
            Monitor clock skew detected
     monmap e1: 3 mons at {magna105=10.8.128.105:6789/0,magna107=10.8.128.107:6789/0,magna108=10.8.128.108:6789/0}
            election epoch 24, quorum 0,1,2 magna105,magna107,magna108
     osdmap e443: 9 osds: 9 up, 9 in; 5 remapped pgs
      pgmap v35822: 128 pgs, 9 pools, 1978 GB data, 495 kobjects
            6065 GB used, 2269 GB / 8334 GB avail
            66837/1571772 objects misplaced (4.252%)
                 123 active+clean
                   5 active+remapped+backfilling
recovery io 37717 kB/s, 9 objects/s

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE  USE   AVAIL %USE  VAR
 0 0.89999     1.00000  926G  712G  213G 76.93 1.06
 1 0.89999     1.00000  926G  704G  221G 76.05 1.05
 2 0.89999 65535.94922  926G  732G  193G 79.12 1.09
 3 0.75000     1.00000  926G  610G  315G 65.93 0.91
 4 0.89999 65533.94922  926G  769G  156G 83.10 1.14
 5 0.89999     1.00000  926G  557G  368G 60.22 0.83
 6 0.79999     1.00000  926G  668G  257G 72.21 0.99
 7 0.89999     1.00000  926G  609G  316G 65.78 0.90
 8 0.89999     1.00000  926G  699G  226G 75.58 1.04
                 TOTAL 8334G 6065G 2269G 72.77
MIN/MAX VAR: 0.83/1.14  STDDEV: 8.57

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE  USE   AVAIL %USE  VAR
 0 0.89999     1.00000  926G  692G  233G 74.79 1.02
 1 0.89999     1.00000  926G  682G  243G 73.72 1.01
 2 0.89999 65535.94922  926G  709G  216G 76.64 1.05
 3 0.75000     1.00000  926G  626G  299G 67.62 0.93
 4 0.89999 65533.94922  926G  711G  215G 76.78 1.05
 5 0.89999     1.00000  926G  572G  353G 61.87 0.85
 6 0.79999     1.00000  926G  685G  240G 74.03 1.01
 7 0.89999     1.00000  926G  624G  301G 67.42 0.92
 8 0.89999     1.00000  926G  787G  138G 85.05 1.16
                 TOTAL 8334G 6092G 2241G 73.10
MIN/MAX VAR: 0.85/1.16  STDDEV: 3.61
Actually, it's an unsigned underflow on the previous line. FYI: max_change defaults to 0.05; it's a ratio relative to the current weight. Simple enough fix. Fixing.
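For anyone following along, here is a minimal sketch of how an unsigned subtraction of this shape can wrap. It is not the OSDMonitor code. It assumes reweight values are held as 16.16 fixed-point unsigned integers (0x10000 meaning 1.0) and that max_change has been converted to the same scale; the helper names (to_fixed, to_ratio, lower_bound_wrapping, lower_bound_clamped) are made up for illustration. Under those assumptions the numbers line up with the report: 0.950012 minus 1.0 wraps to roughly 65535.95.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: reweight values are 16.16 fixed point, so 0x10000 == 1.0.
constexpr uint32_t kWeightOne = 0x10000;

// Hypothetical helper: convert a ratio such as 0.95 to fixed point (rounded).
inline uint32_t to_fixed(double w) {
  assert(w >= 0.0);
  return static_cast<uint32_t>(w * kWeightOne + 0.5);
}

// Hypothetical helper: convert a fixed-point value back to a ratio.
inline double to_ratio(uint32_t w) {
  return static_cast<double>(w) / kWeightOne;
}

// Problematic shape: when max_change > weight the unsigned subtraction
// wraps modulo 2^32 instead of going negative.
inline uint32_t lower_bound_wrapping(uint32_t weight, uint32_t max_change) {
  return weight - max_change;
}

// One possible guard (not necessarily the fix that was shipped):
// clamp the lower bound at zero before it is fed into MAX().
inline uint32_t lower_bound_clamped(uint32_t weight, uint32_t max_change) {
  return weight > max_change ? weight - max_change : 0;
}
```

With weight = to_fixed(0.950012) and max_change = kWeightOne, lower_bound_wrapping returns a value whose ratio is just under 65536, and taking MAX(new_weight, that) then selects the wrapped value. With a max_change of 0.05 the subtraction cannot wrap for a weight of 0.95, which is why the default hides the problem.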
Just to confirm: how did you set max_change to 1?
(In reply to Samuel Just from comment #4)
> Just to confirm: how did you set max_change to 1?

Using injectargs.
In wip-sam-testing to run through upstream testing on master tonight. Backported ceph-1.3-rhel-patches-15655 in gerrit for testing in the meantime. Should be able to backport to upstream hammer/jewel on Monday, teuthology permitting.
It'll delay the release to pull this in, so the plan is to release as-is and provide guidance. Specifically,

1- Make sure the customer uses a small max_change. This is what they will want to do anyway, FWIW. I suggest a value of .05 or smaller.

2- Advise the customer to always use test-reweight-by-utilization first to confirm that the reweight plan is sane. For example,

  ceph osd test-reweight-by-utilization 120 .05 10   # max .05 change for 10 osds

then verify the weight changes seem small and reasonable, and a smallish number of PGs will move, and then

  ceph osd reweight-by-utilization 120 .05 10

Sound okay?
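As a rough illustration of what "sane" could mean when reviewing the test-reweight-by-utilization output, here is a small sketch. It is not part of Ceph. The PlannedChange record, the plan_is_sane function, and the max_step and max_osds parameters are hypothetical names, and max_step is an absolute tolerance chosen by the operator rather than Ceph's max_change setting. The check relies only on the fact that a reweight value should lie between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one proposed change, e.g. the plan line
// "osd.2 weight 0.950012 -> 65535.949219".
struct PlannedChange {
  int osd;
  double old_weight;
  double new_weight;
};

// Returns true when the plan touches at most max_osds OSDs, every new
// reweight value stays within [0, 1], and no single step is larger than
// max_step (an operator-chosen absolute tolerance, not a Ceph option).
inline bool plan_is_sane(const std::vector<PlannedChange>& plan,
                         double max_step, std::size_t max_osds) {
  assert(max_step > 0.0);
  if (plan.size() > max_osds) return false;
  const double eps = 1e-6;  // slack for values printed with 6 decimals
  for (const PlannedChange& c : plan) {
    if (c.new_weight < 0.0 || c.new_weight > 1.0) return false;
    if (std::fabs(c.new_weight - c.old_weight) > max_step + eps) return false;
  }
  return true;
}
```

Fed the first plan from the description (osd.2 and osd.4 going from 0.950012 to 65535.949219), this check rejects the plan on the range test alone, which is the kind of result the guidance above is meant to catch before running the real command.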
No underflow observed, hence marking as verified. Verified on 0.94.9-1.el7cp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-1972.html