Bug 1747315

Summary:          ceph mgr balancer ignoring host crush weight when using upmap
Product:          [Red Hat Storage] Red Hat Ceph Storage
Component:        Ceph-Mgr Plugins
Version:          3.3
Hardware:         x86_64
OS:               Linux
Severity:         high
Priority:         unspecified
Status:           CLOSED DUPLICATE
Reporter:         Steve Baldwin <sbaldwin>
Assignee:         Josh Durgin <jdurgin>
QA Contact:       Madhavi Kasturi <mkasturi>
CC:               aavraham, ceph-eng-bugs, ceph-qe-bugs, gmeno, jdurgin, mmanjuna, mmurthy, tserlin
Target Milestone: z3
Target Release:   3.3
Doc Type:         If docs needed, set a value
Type:             Bug
Last Closed:      2019-12-19 18:54:11 UTC
Bug Depends On:   1741677, 1751131

Comment 3 Avi Avraham 2019-09-10 12:08:06 UTC
The customer is currently migrating their 1.5 PB Ceph cluster from the existing 18 nodes to a brand-new 28-node cluster.
In its current state the cluster is not balanced (e.g. one OSD at 37% usage and another at 55%), which impacts cluster performance because I/O is not spread evenly across all OSDs.
Upmap should enable automatic balancing of the OSDs (see the command sketch after the references below).
The current migration approach (crush swap-bucket) is very slow, taking 4 weeks to migrate a single node to the new hardware.
Based on prior experience (e.g. CERN [1][2][3]) we expect upmap to speed up the migration.
[1] https://edenmal.moe/post/2018/Ceph-Day-Berlin-2018/#mastering-ceph-operations-upmap-and-the-mgr-balancer-dan-van-der-ster-cern
[2] https://www.youtube.com/watch?v=niFNZN5EKvE
[3] https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
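
For reference, a minimal sketch of how the upmap balancer is normally enabled (standard Ceph CLI commands, not taken from the customer environment, and assuming all clients can already be required to be luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status      # confirm the active mode and whether a plan is executing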

After testing by the customer team with assistance from the Red Hat consulting team, the procedure does not work: the upmap validity checks on the OSDs cancel the PG pinning to the source OSDs.
We suspect that the OSDMap mechanism prevents using those OSDs in the upmap list.
A bash script that simulates the scenario the customer is trying to use for migration is attached.

How reproducible:
Reproduced in the customer's RHCS 3.3 staging environment.

Steps to Reproduce:
1. Set the CRUSH weight of the OSD host to 0 in the crushmap.
2. Wait until all PGs have migrated off that host's OSDs.
3. Turn on automatic balancing: ceph balancer on
(a command sketch of these steps is included below)
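
As a sketch only, assuming the host bucket is named hostX and that reweight-subtree is used to zero the weight (the exact way the customer changed the crushmap is not recorded in this BZ):

ceph osd crush reweight-subtree hostX 0    # zero the CRUSH weight of every OSD under hostX
ceph pg stat                               # repeat until backfill/recovery finishes and all PGs are active+clean
ceph balancer mode upmap
ceph balancer on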

Actual results:
Running the ceph osd dump command shows no pg_upmap_items entries.

Expected results:
Running the ceph osd dump command shows pg_upmap_items entries.
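
For illustration only (the PG ID and OSD numbers are made up, and the exact formatting of the dump output may differ slightly), a successful pg-upmap-items pin shows up in the dump roughly like this:

ceph osd dump | grep pg_upmap
pg_upmap_items 1.2 [0,4]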

###############################################
Command list for running the scenario
###############################################

# Spin up a vstart.sh development cluster with 5 OSDs
MON=1 MGR=1 MDS=0 RGW=1 OSD=5 ../src/vstart.sh -d -n

# Place each OSD in its own host bucket
./ceph osd  -c ../ceph.conf crush move osd.0 host=osd0
./ceph osd  -c ../ceph.conf crush move osd.1 host=osd1
./ceph osd  -c ../ceph.conf crush move osd.2 host=osd2
./ceph osd  -c ../ceph.conf crush move osd.3 host=osd3
./ceph osd  -c ../ceph.conf crush move osd.4 host=osd4

# Link every host bucket under the default root
./ceph osd  -c ../ceph.conf crush link osd0 root=default
./ceph osd  -c ../ceph.conf crush link osd1 root=default
./ceph osd  -c ../ceph.conf crush link osd2 root=default
./ceph osd  -c ../ceph.conf crush link osd3 root=default
./ceph osd  -c ../ceph.conf crush link osd4 root=default

# Create an erasure-coded pool, disable recovery, and write some test data
./ceph osd pool create rgw 8 8 erasure  -c ../ceph.conf
./ceph osd set norecover  -c ../ceph.conf
./rados bench 10 write --no-cleanup  -p rgw -c ../ceph.conf

# Unlink host osd4 from the default root (the host being drained)
./ceph osd  -c ../ceph.conf crush unlink osd4

# upmap requires luminous or newer clients
./ceph osd set-require-min-compat-client luminous  -c ../ceph.conf

# Pin PGs to osd.4 with pg-upmap-items; arguments after the PG ID are
# <from-osd> <to-osd> pairs, e.g. "1.2 1 3 0 4" remaps PG 1.2 from
# osd.1 to osd.3 and from osd.0 to osd.4
./ceph osd pg-upmap-items 1.5 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.4 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.2 1 3 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.1 2 4  -c ../ceph.conf
./ceph osd pg-upmap-items  1.3 2 4 -c ../ceph.conf

# Dump the OSDMap to check whether the pg_upmap_items entries were kept
./ceph osd dump -c ../ceph.conf
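
The dump output can be filtered to check the result; in the failing case described above the filter returns no pg_upmap_items lines (the grep is added here for convenience and is not part of the original script):

./ceph osd dump -c ../ceph.conf | grep pg_upmap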

Comment 7 Josh Durgin 2019-12-19 18:54:11 UTC

*** This bug has been marked as a duplicate of bug 1751131 ***