Bug 1331764

Summary: OSDs are not selected properly while reweight-by-utilization
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tanay Ganguly <tganguly>
Component: RADOS Assignee: Sage Weil <sweil>
Status: CLOSED ERRATA QA Contact: shylesh <shmohan>
Severity: high Docs Contact: Bara Ancincova <bancinco>
Priority: unspecified    
Version: 1.3.2 CC: ceph-eng-bugs, dzafman, flucifre, hnallurv, kchai, kdreyer, kurs, nlevine, sjust, sweil
Target Milestone: rc   
Target Release: 1.3.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-0.94.7-5.el7cp Ubuntu: ceph_0.94.7-3redhat1trusty Doc Type: Bug Fix
Doc Text:
.OSDs are now selected properly during "reweight-by-utilization"
During the `reweight-by-utilization` process, some of the OSD nodes that met the criteria for reweighting were not selected. The underlying algorithm has been modified, and OSDs are now selected properly during `reweight-by-utilization`.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-29 12:58:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1372735    

Description Tanay Ganguly 2016-04-29 13:09:51 UTC
Description of problem:
Sometimes not all OSDs are selected properly: some OSDs that meet the criteria for reweighting are not reweighted.

Version-Release number of selected component (if applicable):
ceph version 0.94.5-6redhat1trusty

How reproducible:
Always

Steps to Reproduce:


$ sudo ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  
 0 0.89999  0.80005   926G  782G  143G 84.50 1.10 
 1 0.89989        0      0     0     0     0    0 
 2 0.89999  0.80005   926G  649G  276G 70.11 0.91 
 3 0.89999  1.00000   926G  741G  185G 80.02 1.04 
 4 0.89999  1.00000   926G  783G  142G 84.61 1.10 
 5 0.89999  1.00000   926G  647G  278G 69.94 0.91 
 6 0.89999  0.80005   926G  678G  247G 73.24 0.95 
 7 0.89999  1.00000   926G  713G  212G 77.09 1.00 
 8 0.89999  1.00000   926G  737G  188G 79.65 1.03 
 9 0.89999  1.00000   926G  794G  131G 85.76 1.11 
10 0.89999  1.00000   926G  670G  255G 72.41 0.94 
11 0.89999  1.00000   926G  646G  279G 69.79 0.91 
              TOTAL 10186G 7844G 2341G 77.01      
MIN/MAX VAR: 0/1.11  STDDEV: 5.95



$ sudo ceph osd test-reweight-by-utilization 105 .05 5
no change
moved 59 / 1944 (3.03498%)
avg 176.727
stddev 10.1273 -> 8.28022 (expected baseline 12.6752)
min osd.8 with 190 -> 183 pgs (1.0751 -> 1.03549 * mean)
max osd.6 with 157 -> 168 pgs (0.888374 -> 0.950617 * mean)

oload 105
max_change 0.05
max_change_osds 5
average 0.770104
overload 0.808609
osd.9 weight 1.000000 -> 0.950012
osd.4 weight 1.000000 -> 0.950012
osd.0 weight 0.800049 -> 0.750061
osd.6 weight 0.800049 -> 0.841187
osd.2 weight 0.800049 -> 0.850037



Actual results:
osd.10 is at 72.41 %USE, compared with 70.11 %USE for osd.2.
Still, osd.2 is selected ahead of osd.10.

Expected results:
osd.10 should be considered ahead of osd.2

Additional info:
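The threshold in the test output above can be checked by hand. The following is a minimal sketch, not the Ceph implementation: it assumes the `overload` value is simply `average * oload / 100`, which is what the printed numbers suggest, and it uses the %USE column from `ceph osd df` as the per-OSD utilization. The variable names are illustrative only.

```python
# Values copied from the `ceph osd df` and test-reweight output in this report.
# osd.1 is left out because it reports no size or usage.
pct_use = {0: 84.50, 2: 70.11, 3: 80.02, 4: 84.61, 5: 69.94, 6: 73.24,
           7: 77.09, 8: 79.65, 9: 85.76, 10: 72.41, 11: 69.79}
average = 0.770104   # "average" line of the test output
oload = 105          # first argument to test-reweight-by-utilization

# Assumption: the overload threshold is the average scaled by oload percent.
overload = average * oload / 100.0

# OSDs whose utilization exceeds the threshold, most utilized first.
overloaded = sorted(
    (osd for osd, pct in pct_use.items() if pct / 100.0 > overload),
    key=lambda osd: pct_use[osd],
    reverse=True,
)

print(f"overload {overload:.6f}")
print("overloaded:", [f"osd.{osd}" for osd in overloaded])
```

With these inputs the computed threshold matches the `overload 0.808609` line, and the OSDs above it are osd.9, osd.4 and osd.0, the same three whose weights the test run lowers.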

Comment 3 Samuel Just 2016-05-02 20:59:36 UTC
This probably should not hold up 1.3.2 -- advisory to user would be the right thing.

Comment 4 Harish NV Rao 2016-05-04 21:54:06 UTC
Sam, what advisory will be given to the user in this case? Please share the details.

Comment 5 Samuel Just 2016-05-04 21:56:59 UTC
I think Sage would be the right person to ask.  Maybe that the user should make sure to use the test- option first and verify that the behavior is OK?

Comment 6 Harish NV Rao 2016-05-04 22:15:20 UTC
Hi Sage,

Please let us know what advisory will be given to the user in this case.

Regards,
Harish

Comment 7 Sage Weil 2016-05-05 00:06:35 UTC
Right. The user should

 ceph osd test-reweight-by-utilization ...
or
 ceph osd test-reweight-by-pg ...

prior to doing the non-test- variant to confirm that nothing drastic will happen.

They should also use small max_change values.  E.g., 

 ceph osd test-reweight-by-utilization 120 .05 10

to update at most 10 osds with at most a change of .05.

Later, when we have backported this fix, the low-weight osds can be weighted up.  If they can wait for that, they should, but if not, it's no big deal--just a bit more data movement.
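The "test first, then apply" advice above could be scripted along these lines. This is only a sketch: `reweight_commands` is a hypothetical helper, not part of Ceph, and it only builds the two argument lists without running anything. It assumes the non-test command takes the same three positional arguments (oload, maximum change, maximum number of OSDs) as the test- command shown in this report.

```python
def reweight_commands(oload, max_change, max_osds):
    """Return (dry_run_argv, real_argv) for a cautious reweight.

    Hypothetical helper: it only builds argument lists, it runs nothing.
    """
    if not 0 < max_change <= 1:
        raise ValueError("max_change should be a small fraction, e.g. 0.05")
    args = [str(oload), str(max_change), str(max_osds)]
    # The test- variant reports what would change without applying it.
    dry_run = ["ceph", "osd", "test-reweight-by-utilization"] + args
    # Assumption: the real command accepts the same arguments.
    real = ["ceph", "osd", "reweight-by-utilization"] + args
    return dry_run, real


dry_run, real = reweight_commands(120, 0.05, 10)
print(" ".join(dry_run))
print(" ".join(real))
```

An operator would run the first command, review the proposed weight changes, and only then run the second.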

Comment 8 Harish NV Rao 2016-05-05 10:47:45 UTC
(In reply to Sage Weil from comment #7)
> Right. The user should
> 
>  ceph osd test-reweight-by-utilization ...
> or
>  ceph osd test-reweight-by-pg ...
> 
> prior to doing the non-test- variant to confirm that nothing drastic will
> happen.
> 
> They should also use small max_change values.  E.g., 
> 
>  ceph osd test-reweight-by-utilization 120 .05 10
> 
> to update at most 10 osds with at most a change of .05.
> 
> Later, when we have backported this fix,

Does "this fix" here refer to the fix for BZ 1331764, BZ 1331784, or both? Can you please confirm?

> the low-weight osds can be weighted
> up.  If they can wait for that, they should, but if not, it's no big
> deal--just a bit more data movement.

Comment 9 Sage Weil 2016-05-05 12:21:50 UTC
The fix is the same for both BZs.

Thanks!

Comment 10 Ken Dreyer (Red Hat) 2016-08-03 20:27:40 UTC
https://github.com/ceph/ceph/pull/9416 was merged to hammer after v0.94.7 was tagged, so this bug is fixed in v0.94.8 upstream.

Comment 15 shylesh 2016-09-14 08:41:04 UTC
With the introduction of the new algorithm, OSDs are chosen based on their distance from the average utilization; i.e., the greater the distance from the average, the greater the chance of being selected.

Hence marking this as verified.

Verified on 0.94.9-1.el7cp.x86_64
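The selection rule described in this comment can be illustrated with the utilization figures from the original report. This is a simplified sketch of the idea only, not the code from the fix: `rank_by_distance` is a hypothetical name, and the sketch ignores details such as the overload threshold and whether an OSD's weight can still be raised or lowered.

```python
def rank_by_distance(pct_use, average, max_osds):
    """Return up to max_osds OSD ids, farthest from the average first."""
    ranked = sorted(pct_use, key=lambda osd: abs(pct_use[osd] - average),
                    reverse=True)
    return ranked[:max_osds]


# %USE values and TOTAL %USE copied from the `ceph osd df` output in the
# description; osd.1 is left out because it reports no usage.
pct_use = {0: 84.50, 2: 70.11, 3: 80.02, 4: 84.61, 5: 69.94, 6: 73.24,
           7: 77.09, 8: 79.65, 9: 85.76, 10: 72.41, 11: 69.79}

candidates = rank_by_distance(pct_use, average=77.01, max_osds=5)
print([f"osd.{osd}" for osd in candidates])
```

Ranking purely by distance, the five candidates for this data set would be osd.9, osd.4, osd.0, osd.11 and osd.5.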

Comment 18 Sage Weil 2016-09-22 13:29:04 UTC
Looks good to me.  Thanks, Bara!

Comment 19 errata-xmlrpc 2016-09-29 12:58:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html