Bug 1331764 - OSDs are not selected properly while reweight-by-utilization
Summary: OSDs are not selected properly while reweight-by-utilization
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.3
Assignee: Sage Weil
QA Contact: shylesh
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1372735
 
Reported: 2016-04-29 13:09 UTC by Tanay Ganguly
Modified: 2017-07-30 15:16 UTC (History)

Fixed In Version: RHEL: ceph-0.94.7-5.el7cp Ubuntu: ceph_0.94.7-3redhat1trusty
Doc Type: Bug Fix
Doc Text:
.OSDs are now selected properly during "reweight-by-utilization"
During the `reweight-by-utilization` process, some of the OSD nodes that met the criteria for reweighting were not selected. The underlying algorithm has been modified, and OSDs are now selected properly during `reweight-by-utilization`.
Clone Of:
Environment:
Last Closed: 2016-09-29 12:58:06 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1972 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage 1.3.3 security, bug fix, and enhancement update 2016-09-29 16:51:21 UTC

Description Tanay Ganguly 2016-04-29 13:09:51 UTC
Description of problem:
Sometimes not all OSDs are picked properly: some OSDs that meet the criteria for reweighting are not reweighted.

Version-Release number of selected component (if applicable):
ceph version 0.94.5-6redhat1trusty

How reproducible:
Always

Steps to Reproduce:


$ sudo ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  
 0 0.89999  0.80005   926G  782G  143G 84.50 1.10 
 1 0.89989        0      0     0     0     0    0 
 2 0.89999  0.80005   926G  649G  276G 70.11 0.91 
 3 0.89999  1.00000   926G  741G  185G 80.02 1.04 
 4 0.89999  1.00000   926G  783G  142G 84.61 1.10 
 5 0.89999  1.00000   926G  647G  278G 69.94 0.91 
 6 0.89999  0.80005   926G  678G  247G 73.24 0.95 
 7 0.89999  1.00000   926G  713G  212G 77.09 1.00 
 8 0.89999  1.00000   926G  737G  188G 79.65 1.03 
 9 0.89999  1.00000   926G  794G  131G 85.76 1.11 
10 0.89999  1.00000   926G  670G  255G 72.41 0.94 
11 0.89999  1.00000   926G  646G  279G 69.79 0.91 
              TOTAL 10186G 7844G 2341G 77.01      
MIN/MAX VAR: 0/1.11  STDDEV: 5.95



$ sudo ceph osd test-reweight-by-utilization 105 .05 5
no change
moved 59 / 1944 (3.03498%)
avg 176.727
stddev 10.1273 -> 8.28022 (expected baseline 12.6752)
min osd.8 with 190 -> 183 pgs (1.0751 -> 1.03549 * mean)
max osd.6 with 157 -> 168 pgs (0.888374 -> 0.950617 * mean)

oload 105
max_change 0.05
max_change_osds 5
average 0.770104
overload 0.808609
osd.9 weight 1.000000 -> 0.950012
osd.4 weight 1.000000 -> 0.950012
osd.0 weight 0.800049 -> 0.750061
osd.6 weight 0.800049 -> 0.841187
osd.2 weight 0.800049 -> 0.850037



Actual results:
osd.10 has 72.41 %USE, compared with 70.11 %USE for osd.2.
Still, osd.2 is selected ahead of osd.10.

Expected results:
osd.10 should be considered ahead of osd.2
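
For reference, the `average` and `overload` lines in the test output above are consistent with overload = average * oload / 100. The following is a minimal Python sketch of that arithmetic, using the %USE column from the `ceph osd df` output. It is an illustration only, not the Ceph implementation; the variable names are made up for this example.

```python
# %USE per OSD, copied from the `ceph osd df` output in this report.
# osd.1 is left out because its reweight is 0 and it holds no data.
pct_use = {0: 84.50, 2: 70.11, 3: 80.02, 4: 84.61, 5: 69.94, 6: 73.24,
           7: 77.09, 8: 79.65, 9: 85.76, 10: 72.41, 11: 69.79}

average = 0.770104  # "average" line of the test output
oload = 105         # first argument of test-reweight-by-utilization

# Threshold above which an OSD counts as overloaded (assumed relation,
# chosen because it reproduces the "overload 0.808609" line).
overload = average * oload / 100.0

overloaded = sorted(osd for osd, use in pct_use.items()
                    if use / 100.0 > overload)

print(f"overload threshold: {overload:.6f}")
print("OSDs above threshold:", overloaded)
```

The three OSDs it reports (osd.0, osd.4, osd.9) are the same three whose weights the test run proposes to lower.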

Additional info:

Comment 3 Samuel Just 2016-05-02 20:59:36 UTC
This probably should not hold up 1.3.2 -- advisory to user would be the right thing.

Comment 4 Harish NV Rao 2016-05-04 21:54:06 UTC
Sam, what advisory will be given to the user in this case? Please share the details.

Comment 5 Samuel Just 2016-05-04 21:56:59 UTC
I think Sage would be the right person to ask.  Maybe that the user should make sure to use the test- variant first and verify that the behavior is OK?

Comment 6 Harish NV Rao 2016-05-04 22:15:20 UTC
Hi Sage,

Please let us know what advisory will be given to user in this case.

Regards,
Harish

Comment 7 Sage Weil 2016-05-05 00:06:35 UTC
Right. The user should

 ceph osd test-reweight-by-utilization ...
or
 ceph osd test-reweight-by-pg ...

prior to doing the non-test- variant to confirm that nothing drastic will happen.

They should also use small max_change values.  E.g., 

 ceph osd test-reweight-by-utilization 120 .05 10

to update at most 10 osds with at most a change of .05.

Later, when we have backported this fix, the low-weight osds can be weighted up.  If they can wait for that, they should, but if not, it's no big deal--just a bit more data movement.
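
The dry-run-first workflow above could be scripted along these lines. This is only a sketch: `reweight_cmd` is a hypothetical helper invented for this example, and it only builds the command line (oload, max_change, max_osds, in the order used in this report) rather than running anything.

```python
def reweight_cmd(oload, max_change, max_osds, dry_run=True):
    """Build the argv for ceph osd (test-)reweight-by-utilization.

    Hypothetical helper: with dry_run=True it returns the test- variant,
    which only reports what would change.
    """
    sub = ("test-reweight-by-utilization" if dry_run
           else "reweight-by-utilization")
    return ["ceph", "osd", sub, str(oload), str(max_change), str(max_osds)]


# Run the dry run first and review its output ...
print(" ".join(reweight_cmd(120, 0.05, 10)))
# ... and only then the real command with the same arguments.
print(" ".join(reweight_cmd(120, 0.05, 10, dry_run=False)))
```

Keeping the arguments identical between the two calls ensures that the reviewed dry run matches what is then applied.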

Comment 8 Harish NV Rao 2016-05-05 10:47:45 UTC
(In reply to Sage Weil from comment #7)
> Right. The user should
> 
>  ceph osd test-reweight-by-utilization ...
> or
>  ceph osd test-reweight-by-pg ...
> 
> prior to doing the non-test- variant to confirm that nothing drastic will
> happen.
> 
> They should also use small max_change values.  E.g., 
> 
>  ceph osd test-reweight-by-utilization 120 .05 10
> 
> to update at most 10 osds with at most a change of .05.
> 
> Later, when we have backported this fix,

"this fix" here refers to the fix for BZ 1331764 or BZ 1331784 or both? Can you please confirm?

> the low-weight osds can be weighted
> up.  If they can wait for that, they should, but if not, it's no big
> deal--just a bit more data movement.

Comment 9 Sage Weil 2016-05-05 12:21:50 UTC
The fix is the same for both BZs.

Thanks!

Comment 10 Ken Dreyer (Red Hat) 2016-08-03 20:27:40 UTC
https://github.com/ceph/ceph/pull/9416 was merged to hammer after v0.94.7 was tagged, so this bug is fixed in v0.94.8 upstream.

Comment 15 shylesh 2016-09-14 08:41:04 UTC
With the introduction of the new algorithm, OSDs are chosen based on their distance from the average utilization, i.e., the greater the distance from the average, the greater the chance of being selected.

Hence marking this as verified.

Verified on 0.94.9-1.el7cp.x86_64
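
As an illustration of "distance from the average", the Python sketch below orders the OSDs from the `ceph osd df` output in the description by how far their %USE is from the cluster-wide figure. It is not the Ceph code: it ignores the overload threshold, the current reweight values, and the max_change and max_osds limits. The names are made up for this example.

```python
# %USE per OSD from the `ceph osd df` output in the description.
# osd.1 is left out because its reweight is 0 and it holds no data.
pct_use = {0: 84.50, 2: 70.11, 3: 80.02, 4: 84.61, 5: 69.94, 6: 73.24,
           7: 77.09, 8: 79.65, 9: 85.76, 10: 72.41, 11: 69.79}

average = 77.01  # TOTAL %USE from the same output

# Candidates ordered by absolute distance from the average, farthest first.
by_distance = sorted(pct_use,
                     key=lambda osd: abs(pct_use[osd] - average),
                     reverse=True)

for osd in by_distance[:5]:
    delta = pct_use[osd] - average
    print(f"osd.{osd}: {pct_use[osd]:.2f}% ({delta:+.2f} from average)")
```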

Comment 18 Sage Weil 2016-09-22 13:29:04 UTC
Looks good to me.  Thanks, Bara!

Comment 19 errata-xmlrpc 2016-09-29 12:58:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html

