| Summary: | [RADOS]:- osd gets heavy weight due to reweight-by-utilization with max_change set to 1 | ||
|---|---|---|---|
| Product: | Red Hat Ceph Storage | Reporter: | shylesh <shmohan> |
| Component: | RADOS | Assignee: | Samuel Just <sjust> |
| Status: | CLOSED ERRATA | QA Contact: | shylesh <shmohan> |
| Severity: | medium | Docs Contact: | Bara Ancincova <bancinco> |
| Priority: | unspecified | ||
| Version: | 1.3.2 | CC: | ceph-eng-bugs, dzafman, hnallurv, icolle, kchai, kdreyer, nlevine, sjust, sweil, tserlin, vumrao |
| Target Milestone: | rc | ||
| Target Release: | 1.3.3 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHEL: ceph-0.94.7-5.el7cp Ubuntu: ceph_0.94.7-3redhat1trusty | Doc Type: | Bug Fix |
| Doc Text: | *OSDs no longer receive unreasonably large weight during "reweight-by-utilization":* When the value of the `max_change` parameter was greater than an OSD weight, an underflow occurred. Consequently, the OSD node could receive an unreasonably large weight during the `reweight-by-utilization` process. This bug has been fixed, and OSDs no longer receive large weight in the described situation. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-29 12:57:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1335269 | ||
| Bug Blocks: | 1372735 | ||
Actually an unsigned underflow on the previous line. FYI: max_change defaults to 0.05; it's a ratio compared with the current weight. Simple enough fix. Fixing.

Just to confirm: how did you set max_change to 1?

(In reply to Samuel Just from comment #4)
> Just to confirm: how did you set max_change to 1?

Using injectargs.

In wip-sam-testing to run through upstream testing on master tonight. Backported ceph-1.3-rhel-patches-15655 in gerrit for testing in the meantime. Should be able to backport to upstream hammer/jewel on Monday, teuthology permitting.

It'll delay the release to pull this in, so the plan is to release as-is and provide guidance. Specifically:

1. Make sure the customer uses a small max_change. This is what they will want to do anyway, FWIW. I suggest a value of .05 or smaller.
2. Advise the customer to always use test-reweight-by-utilization first to confirm that the reweight plan is sane. For example:

       ceph osd test-reweight-by-utilization 120 .05 10   # max .05 change for 10 osds

   Then verify that the weight changes look small and reasonable and that only a smallish number of PGs will move, and then run:

       ceph osd reweight-by-utilization 120 .05 10

Sound okay?

No underflow observed, hence marking as verified. Verified on 0.94.9-1.el7cp.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html
Description of problem:
Running reweight-by-utilization on a cluster with max_change set to 1 can lead to an OSD getting an extremely large reweight value (around 65534) due to an integer overflow in the weight calculation.

Version-Release number of selected component (if applicable):
1.3.2 Async release.

[root@magna105 ~]# rpm -qa| grep ceph
ceph-mon-0.94.5-12.el7cp.x86_64
ceph-common-0.94.5-12.el7cp.x86_64
ceph-selinux-0.94.5-12.el7cp.x86_64
mod_fastcgi-2.4.7-1.ceph.el7.x86_64
iozone-3.424-2_ceph.el7.x86_64
ceph-0.94.5-12.el7cp.x86_64

How reproducible:

Steps to Reproduce:
1. Created a cluster with 9 OSDs.
2. Filled up data and observed some imbalance in the data distribution.
3. Ran "ceph osd reweight-by-utilization 110".

Actual results:
[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE USE  AVAIL %USE  VAR
 0 0.89999     1.00000 926G 692G 233G  74.79 1.02
 1 0.89999     1.00000 926G 682G 243G  73.72 1.01
 2 0.89999 65535.94922 926G 709G 216G  76.64 1.05
 3 0.75000     1.00000 926G 626G 299G  67.62 0.93
 4 0.89999 65533.94922 926G 711G 215G  76.78 1.05
 5 0.89999     1.00000 926G 572G 353G  61.87 0.85
 6 0.79999     1.00000 926G 685G 240G  74.03 1.01
 7 0.89999     1.00000 926G 624G 301G  67.42 0.92
 8 0.89999     1.00000 926G 787G 138G  85.05 1.16
   TOTAL           8334G 6092G 2241G  73.10
MIN/MAX VAR: 0.85/1.16  STDDEV: 3.61

Some of the OSDs got a very high reweight value due to an integer overflow in this calculation in src/mon/OSDMonitor.cc:

    new_weight = MAX(new_weight, weight - max_change);

Additional info:
[root@magna105 ~]# ceph -s
    cluster 6de276f4-42aa-4de9-85d7-6f879ce1faa3
     health HEALTH_WARN
            clock skew detected on mon.magna107, mon.magna108
            Monitor clock skew detected
     monmap e1: 3 mons at {magna105=10.8.128.105:6789/0,magna107=10.8.128.107:6789/0,magna108=10.8.128.108:6789/0}
            election epoch 24, quorum 0,1,2 magna105,magna107,magna108
     osdmap e430: 9 osds: 9 up, 9 in
      pgmap v34074: 128 pgs, 9 pools, 1978 GB data, 495 kobjects
            5944 GB used, 2390 GB / 8334 GB avail
                 128 active+clean

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR
 0 0.89999 1.00000  926G 712G 213G  76.93 1.08
 1 0.89999 1.00000  926G 704G 221G  76.05 1.07
 2 0.89999 0.95001  926G 771G 154G  83.26 1.17
 3 0.75000 1.00000  926G 617G 308G  66.70 0.94
 4 0.89999 0.95001  926G 727G 198G  78.58 1.10
 5 0.89999 1.00000  926G 557G 368G  60.22 0.84
 6 0.79999 1.00000  926G 668G 257G  72.21 1.01
 7 0.89999 1.00000  926G 609G 316G  65.78 0.92
 8 0.89999 0.84999  926G 575G 350G  62.12 0.87
   TOTAL        8334G 5944G 2390G  71.32
MIN/MAX VAR: 0.84/1.17  STDDEV: 7.46

[root@magna105 ~]# ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    8334G  2390G  5944G     71.32
POOLS:
    NAME                ID  USED   %USED  MAX AVAIL  OBJECTS
    rbd                 0   1978G  23.74  450G       506530
    .rgw.root           1   848    0      450G       3
    .rgw.control        2   0      0      450G       8
    .rgw                3   704    0      450G       4
    .rgw.gc             4   0      0      450G       32
    .users.uid          5   324    0      450G       2
    .users              6   12     0      450G       1
    .rgw.buckets.index  7   0      0      450G       2
    .rgw.buckets        8   976k   0      450G       1000

[root@magna105 ~]# ceph osd reweight-by-utilization 110
moved 14 / 384 (3.64583%)
avg 42.6667
stddev 4.89898 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.8 with 33 -> 42 pgs (0.773438 -> 0.984375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.713184
overload 0.784502
osd.2 weight 0.950012 -> 65535.949219
osd.4 weight 0.950012 -> 65535.949219
osd.8 weight 0.849991 -> 0.975845

[root@magna105 ~]# ceph osd reweight-by-utilization 110
moved 0 / 384 (0%)
avg 42.6667
stddev 3.82971 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.5 with 36 -> 36 pgs (0.84375 -> 0.84375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.721498
overload 0.793647
osd.4 weight 65535.949219 -> 65534.949219
osd.8 weight 0.975845 -> 1.000000

[root@magna105 ~]# ceph osd reweight-by-utilization 110 --no-increasing
moved 0 / 384 (0%)
avg 42.6667
stddev 3.82971 -> 3.82971 (expected baseline 6.1584)
min osd.4 with 51 -> 51 pgs (1.19531 -> 1.19531 * mean)
max osd.5 with 36 -> 36 pgs (0.84375 -> 0.84375 * mean)

oload 110
max_change 1
max_change_osds 4
average 0.721498
overload 0.793648
osd.4 weight 65534.949219 -> 65533.949219

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE USE  AVAIL %USE  VAR
 0 0.89999     1.00000 926G 712G 213G  76.93 1.07
 1 0.89999     1.00000 926G 704G 221G  76.05 1.05
 2 0.89999 65535.94922 926G 732G 193G  79.12 1.10
 3 0.75000     1.00000 926G 610G 315G  65.93 0.91
 4 0.89999 65533.94922 926G 769G 156G  83.10 1.15
 5 0.89999     1.00000 926G 557G 368G  60.22 0.83
 6 0.79999     1.00000 926G 668G 257G  72.21 1.00
 7 0.89999     1.00000 926G 609G 316G  65.78 0.91
 8 0.89999     1.00000 926G 648G 277G  69.99 0.97
   TOTAL           8334G 6013G 2321G  72.15
MIN/MAX VAR: 0.83/1.15  STDDEV: 9.18

[root@magna105 ~]# ceph -s
    cluster 6de276f4-42aa-4de9-85d7-6f879ce1faa3
     health HEALTH_WARN
            clock skew detected on mon.magna107, mon.magna108
            5 pgs backfilling
            5 pgs stuck unclean
            recovery 66837/1571772 objects misplaced (4.252%)
            Monitor clock skew detected
     monmap e1: 3 mons at {magna105=10.8.128.105:6789/0,magna107=10.8.128.107:6789/0,magna108=10.8.128.108:6789/0}
            election epoch 24, quorum 0,1,2 magna105,magna107,magna108
     osdmap e443: 9 osds: 9 up, 9 in; 5 remapped pgs
      pgmap v35822: 128 pgs, 9 pools, 1978 GB data, 495 kobjects
            6065 GB used, 2269 GB / 8334 GB avail
            66837/1571772 objects misplaced (4.252%)
                 123 active+clean
                   5 active+remapped+backfilling
recovery io 37717 kB/s, 9 objects/s

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE USE  AVAIL %USE  VAR
 0 0.89999     1.00000 926G 712G 213G  76.93 1.06
 1 0.89999     1.00000 926G 704G 221G  76.05 1.05
 2 0.89999 65535.94922 926G 732G 193G  79.12 1.09
 3 0.75000     1.00000 926G 610G 315G  65.93 0.91
 4 0.89999 65533.94922 926G 769G 156G  83.10 1.14
 5 0.89999     1.00000 926G 557G 368G  60.22 0.83
 6 0.79999     1.00000 926G 668G 257G  72.21 0.99
 7 0.89999     1.00000 926G 609G 316G  65.78 0.90
 8 0.89999     1.00000 926G 699G 226G  75.58 1.04
   TOTAL           8334G 6065G 2269G  72.77
MIN/MAX VAR: 0.83/1.14  STDDEV: 8.57

[root@magna105 ~]# ceph osd df
ID WEIGHT  REWEIGHT    SIZE USE  AVAIL %USE  VAR
 0 0.89999     1.00000 926G 692G 233G  74.79 1.02
 1 0.89999     1.00000 926G 682G 243G  73.72 1.01
 2 0.89999 65535.94922 926G 709G 216G  76.64 1.05
 3 0.75000     1.00000 926G 626G 299G  67.62 0.93
 4 0.89999 65533.94922 926G 711G 215G  76.78 1.05
 5 0.89999     1.00000 926G 572G 353G  61.87 0.85
 6 0.79999     1.00000 926G 685G 240G  74.03 1.01
 7 0.89999     1.00000 926G 624G 301G  67.42 0.92
 8 0.89999     1.00000 926G 787G 138G  85.05 1.16
   TOTAL           8334G 6092G 2241G  73.10
MIN/MAX VAR: 0.85/1.16  STDDEV: 3.61
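For context on those 65535.x reweights, below is a minimal, self-contained C++ sketch (not the actual OSDMonitor.cc code) of how the unsigned subtraction in the quoted line can wrap around. It assumes Ceph-style 16.16 fixed-point OSD weights, where 0x10000 represents 1.0; the input values loosely mirror osd.2 above, and the "guarded clamp" is only an illustration of the kind of fix, not the actual patch.

```cpp
// Hypothetical reproduction of the wrap-around; not the real OSDMonitor.cc code.
// Assumes OSD weights are stored as 16.16 fixed point (0x10000 == 1.0).
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    const double one = 0x10000;                      // 1.0 in fixed point
    uint32_t weight     = uint32_t(0.950012 * one);  // current reweight (~0.95)
    uint32_t max_change = uint32_t(1.0 * one);       // max_change injected as 1
    uint32_t new_weight = uint32_t(0.975845 * one);  // candidate weight

    // Buggy clamp: max_change > weight, so the unsigned subtraction wraps
    // around to ~4.29e9 and MAX() picks the wrapped value.
    uint32_t buggy = std::max(new_weight, weight - max_change);
    std::printf("buggy clamp:   %.6f\n", buggy / one);    // ~65535.95

    // Guarded clamp (illustration only): subtract only when it cannot underflow.
    uint32_t floor_weight = (weight > max_change) ? weight - max_change : 0;
    uint32_t guarded = std::max(new_weight, floor_weight);
    std::printf("guarded clamp: %.6f\n", guarded / one);  // ~0.9758

    return 0;
}
```

With a small max_change (for example the default 0.05), the subtraction stays positive and the clamp behaves as intended, which is also why the workaround guidance in the comments above recommends a value of .05 or smaller.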