Bug 1747315 - ceph mgr balancer ignoring host crush weight when using upmap
Summary: ceph mgr balancer ignoring host crush weight when using upmap
Keywords:
Status: CLOSED DUPLICATE of bug 1751131
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Mgr Plugins
Version: 3.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: z3
Target Release: 3.3
Assignee: Josh Durgin
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On: 1741677 1751131
Blocks:
 
Reported: 2019-08-30 06:11 UTC by Steve Baldwin
Modified: 2020-05-13 21:45 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-19 18:54:11 UTC
Embargoed:



Comment 3 Avi Avraham 2019-09-10 12:08:06 UTC
The customer is currently migrating their 1.5 PB Ceph cluster from the existing 18 nodes to a brand new 28-node cluster.
In its current state the cluster is not balanced (e.g. one OSD at 37% usage and another at 55%), which impacts cluster performance because I/O is not spread evenly across all OSDs.
Upmap should enable automatic balancing of the OSDs (a minimal enablement sketch follows the references below).
The current migration approach (crush swap-bucket) is very slow, taking 4 weeks to migrate a single node to the new hardware.
Based on experience elsewhere (e.g. CERN [1][2]) we expect upmap to speed up the migration.
[1] https://edenmal.moe/post/2018/Ceph-Day-Berlin-2018/#mastering-ceph-operations-upmap-and-the-mgr-balancer-dan-van-der-ster-cern
[2] https://www.youtube.com/watch?v=niFNZN5EKvE
[3] https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
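For context, enabling the upmap balancer on the cluster would typically look like the following (a minimal sketch; it assumes all clients already support luminous features, which is a prerequisite for upmap):

# allow upmap by requiring luminous-capable clients
ceph osd set-require-min-compat-client luminous
# switch the mgr balancer to upmap mode and enable it
ceph balancer mode upmap
ceph balancer on
# confirm the mode and whether a plan is being executed
ceph balancer status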

After testing by the customer team with assistance from the Red Hat consulting team, the procedure does not work: the upmap validity checks on the OSDs cancel the PG pinning to the source OSD.
We suspect that the OSDMap mechanism prevents using those OSDs in the upmap list.
A bash script that simulates the scenario the customer is trying to use for the migration is attached.
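The cancellation can be observed directly: set an upmap item by hand and then check whether it survives in the OSDMap (the PG and OSD IDs below are illustrative only):

ceph osd pg-upmap-items 1.5 0 4
ceph osd dump | grep pg_upmap_items    # the entry is missing if the check rejected it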

How reproducible:
Reproduced in the customer's RHCS 3.3 staging environment.

Steps to Reproduce:
1. In the crushmap, change the weight of the OSD host to 0 (see the sketch after this list).
2. Wait until all PGs have migrated off this OSD host.
3. Turn on automatic balancing: ceph balancer on
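For reference, step 1 can be done by exporting, editing, and re-importing the crushmap; the host bucket name is whatever the host is called in the customer's map:

# export and decompile the current crushmap
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt and set the weight of the host bucket to 0, then recompile and inject it
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin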

Actual results:
Running the ceph osd dump command shows no pg upmap items.

Expected results:
Running the ceph osd dump command, we expect to see pg upmap items.
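For illustration, pg upmap entries appear in the ceph osd dump output as lines of the following form (the PG and OSD IDs are made up):

pg_upmap_items 1.5 [0,4]
pg_upmap_items 1.2 [1,3,0,4]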

###############################################
Command list for running the scenario
###############################################

MON=1 MGR=1 MDS=0 RGW=1 OSD=5 ../src/vstart.sh -d -n
./ceph osd  -c ../ceph.conf crush move osd.0 host=osd0
./ceph osd  -c ../ceph.conf crush move osd.1 host=osd1
./ceph osd  -c ../ceph.conf crush move osd.2 host=osd2
./ceph osd  -c ../ceph.conf crush move osd.3 host=osd3
./ceph osd  -c ../ceph.conf crush move osd.4 host=osd4


./ceph osd  -c ../ceph.conf crush link osd0 root=default
./ceph osd  -c ../ceph.conf crush link osd1 root=default
./ceph osd  -c ../ceph.conf crush link osd2 root=default
./ceph osd  -c ../ceph.conf crush link osd3 root=default
./ceph osd  -c ../ceph.conf crush link osd4 root=default

./ceph osd pool create rgw 8 8 erasure  -c ../ceph.conf
./ceph osd set norecover  -c ../ceph.conf
./rados bench 10 write --no-cleanup  -p rgw -c ../ceph.conf
./ceph osd  -c ../ceph.conf crush unlink osd4

./ceph osd set-require-min-compat-client luminous  -c ../ceph.conf


./ceph osd pg-upmap-items 1.5 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.4 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.2 1 3 0 4  -c ../ceph.conf
./ceph osd pg-upmap-items 1.1 2 4  -c ../ceph.conf
./ceph osd pg-upmap-items  1.3 2 4 -c ../ceph.conf

./ceph osd dump -c ../ceph.conf
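
To check only the upmap entries, the dump output can be filtered, for example:

./ceph osd dump -c ../ceph.conf | grep pg_upmap_items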

Comment 7 Josh Durgin 2019-12-19 18:54:11 UTC

*** This bug has been marked as a duplicate of bug 1751131 ***

