Bug 1747315

Summary:          ceph mgr balancer ignoring host crush weight when using upmap
Product:          [Red Hat Storage] Red Hat Ceph Storage
Component:        Ceph-Mgr Plugins
Version:          3.3
Hardware:         x86_64
OS:               Linux
Severity:         high
Priority:         unspecified
Status:           CLOSED DUPLICATE
Reporter:         Steve Baldwin <sbaldwin>
Assignee:         Josh Durgin <jdurgin>
QA Contact:       Madhavi Kasturi <mkasturi>
CC:               aavraham, ceph-eng-bugs, ceph-qe-bugs, gmeno, jdurgin, mmanjuna, mmurthy, tserlin
Target Milestone: z3
Target Release:   3.3
Doc Type:         If docs needed, set a value
Type:             Bug
Last Closed:      2019-12-19 18:54:11 UTC
Bug Depends On:   1741677, 1751131

Comment 3 Avi Avraham 2019-09-10 12:08:06 UTC
The customer is currently migrating their 1.5 PB Ceph cluster from the existing 18 nodes to a brand-new 28-node cluster.
In its current state the cluster is not balanced (e.g. one OSD at 37% usage and another at 55%), which impacts cluster performance because I/O is not spread evenly across all OSDs.
Upmap should enable automatic balancing of the OSDs (see the command sketch after the references below).
The current migration approach (crush swap-bucket) is very slow, taking 4 weeks to migrate a single node to the new hardware.
Based on prior experience (e.g. CERN [1][2][3]) we expect upmap to speed up the migration.
[1] https://edenmal.moe/post/2018/Ceph-Day-Berlin-2018/#mastering-ceph-operations-upmap-and-the-mgr-balancer-dan-van-der-ster-cern
[2] https://www.youtube.com/watch?v=niFNZN5EKvE
[3] https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
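
For reference, a minimal sketch of how the upmap balancer is normally enabled (standard Ceph CLI commands, not taken from the customer environment, and assuming all clients can already be required to be luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status      # confirm the active mode and whether a plan is executing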

After testing by the customer team with assistance from the Red Hat consulting team, the procedure does not work: the upmap validity checks on the OSDs cancel the PG pinning to the source OSDs.
We suspect that the OSDMap mechanism prevents using those OSDs in the upmap list.
A bash script that simulates the scenario the customer is trying to use for migration is attached.

How reproducible:
Reproduced in the customer's RHCS 3.3 staging environment.

Steps to Reproduce:
1. Set the CRUSH weight of the OSD host to 0 in the crushmap.
2. Wait until all PGs have migrated off that host's OSDs.
3. Turn on automatic balancing: ceph balancer on
(a command sketch of these steps is included below)
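
As a sketch only, assuming the host bucket is named hostX and that reweight-subtree is used to zero the weight (the exact way the customer changed the crushmap is not recorded in this BZ):

ceph osd crush reweight-subtree hostX 0    # zero the CRUSH weight of every OSD under hostX
ceph pg stat                               # repeat until backfill/recovery finishes and all PGs are active+clean
ceph balancer mode upmap
ceph balancer on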

Actual results:
Running the ceph osd dump command shows no pg_upmap_items entries.

Expected results:
Running the ceph osd dump command shows pg_upmap_items entries.
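
For illustration only (the PG ID and OSD numbers are made up, and the exact formatting of the dump output may differ slightly), a successful pg-upmap-items pin shows up in the dump roughly like this:

ceph osd dump | grep pg_upmap
pg_upmap_items 1.2 [0,4]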

###############################################
Command list for running the scenario
###############################################

# Spin up a vstart.sh development cluster with 5 OSDs
MON=1 MGR=1 MDS=0 RGW=1 OSD=5 ../src/vstart.sh -d -n

# Place each OSD in its own host bucket
./ceph osd  -c ../ceph.conf crush move osd.0 host=osd0
./ceph osd  -c ../ceph.conf crush move osd.1 host=osd1
./ceph osd  -c ../ceph.conf crush move osd.2 host=osd2
./ceph osd  -c ../ceph.conf crush move osd.3 host=osd3
./ceph osd  -c ../ceph.conf crush move osd.4 host=osd4

# Link every host bucket under the default root
./ceph osd  -c ../ceph.conf crush link osd0 root=default
./ceph osd  -c ../ceph.conf crush link osd1 root=default
./ceph osd  -c ../ceph.conf crush link osd2 root=default
./ceph osd  -c ../ceph.conf crush link osd3 root=default
./ceph osd  -c ../ceph.conf crush link osd4 root=default

# Create an erasure-coded pool, disable recovery, and write some test data
./ceph osd pool create rgw 8 8 erasure  -c ../ceph.conf
./ceph osd set norecover  -c ../ceph.conf
./rados bench 10 write --no-cleanup  -p rgw -c ../ceph.conf

# Unlink host osd4 from the default root (the host being drained)
./ceph osd  -c ../ceph.conf crush unlink osd4

# upmap requires luminous or newer clients
./ceph osd set-require-min-compat-client luminous  -c ../ceph.conf

# Pin PGs to osd.4 with pg-upmap-items; arguments after the PG ID are
# <from-osd> <to-osd> pairs, e.g. "1.2 1 3 0 4" remaps PG 1.2 from
# osd.1 to osd.3 and from osd.0 to osd.4
./ceph osd pg-upmap-items 1.5 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.4 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.2 1 3 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.1 2 4  -c ../ceph.conf
./ceph osd pg-upmap-items  1.3 2 4 -c ../ceph.conf

# Dump the OSDMap to check whether the pg_upmap_items entries were kept
./ceph osd dump -c ../ceph.conf
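
The dump output can be filtered to check the result; in the failing case described above the filter returns no pg_upmap_items lines (the grep is added here for convenience and is not part of the original script):

./ceph osd dump -c ../ceph.conf | grep pg_upmap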

Comment 7 Josh Durgin 2019-12-19 18:54:11 UTC

*** This bug has been marked as a duplicate of bug 1751131 ***