Description of problem:
After upgrading from 3.3z5 to 4.2, the balancer fails with the following errors:
2021-01-22 00:11:23.362 7f89aad3f700 0 mgr[py] `config set mgrmgr/rbd_support/cephmon1/mirror_snapshot_schedule --` failed: (22) Invalid argument <-- config set - missing space (mgrmgr)
2021-01-22 00:11:23.362 7f89aad3f700 0 mgr[py] mon returned -22: unrecognized config target ''
2021-01-22 00:11:23.412 7f89b6655700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'balancer' while running on mgr.cephmon1: (44,)
2021-01-22 00:11:23.412 7f89b6655700 -1 balancer.serve:
2021-01-22 00:11:23.412 7f89b6655700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/balancer/module.py", line 657, in serve
    r, detail = self.optimize(plan)
  File "/usr/share/ceph/mgr/balancer/module.py", line 928, in optimize
    return self.do_crush_compat(plan)
  File "/usr/share/ceph/mgr/balancer/module.py", line 1089, in do_crush_compat
    weight = best_ws[osd]
KeyError: 44
This results in ceph status reporting a cluster health error:
health: HEALTH_ERR
Module 'balancer' has failed: (44,)
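For readers unfamiliar with the failure mode, the traceback above can be reproduced in miniature. The sketch below is only an illustration under assumed data: the dictionary contents, the OSD list, and the helper collect_weights are hypothetical and are not taken from the balancer module or from the upstream fix. It shows that indexing a dict with a missing OSD id raises KeyError, whose args tuple is (44,), which matches the text in the health message.

```python
# Hypothetical weight-set data keyed by OSD id; OSD 44 has no entry.
best_ws = {42: 1.0, 43: 0.95}
osds = [42, 43, 44]

# Direct indexing, as in the traceback line `weight = best_ws[osd]`,
# raises KeyError when the OSD id is absent from the dict.
try:
    weight = best_ws[44]
    failure = None
except KeyError as exc:
    failure = exc.args  # the args tuple of KeyError(44)


def collect_weights(best_ws, osds):
    """Hypothetical defensive variant: report missing OSDs instead of raising."""
    weights, missing = {}, []
    for osd in osds:
        if osd in best_ws:
            weights[osd] = best_ws[osd]
        else:
            missing.append(osd)
    return weights, missing


weights, missing = collect_weights(best_ws, osds)
print("failure args:", failure)
print("OSDs without a weight-set entry:", missing)
```

Whether the real fix skips such OSDs or handles them differently is not stated in this report; see the linked tracker issue for the actual change.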
Version-Release number of selected component (if applicable):
14.2.11-95 (4.2)
How reproducible:
Unknown; encountered once after the upgrade.
Steps to Reproduce:
1. Upgrade from 3.3z5 to 4.2
Actual results:
Balancer fails post-upgrade with the errors shown in the problem description.
Expected results:
Balancer should not fail after upgrade to 4.2
Additional info:
This was a manual upgrade from 3.3z5 -> 4.2
This is fixed in 4.2: https://tracker.ceph.com/issues/42721
Going forward, the upmap mode of the balancer is the preferred method. Switching to it requires removing the weight sets and will cause some data movement.
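A rough sketch of what the switch to upmap mode might look like is given below. It assumes all clients are Luminous or newer (upmap requires this) and that the weight set in use is the compat weight set created by crush-compat mode. The function names, the ordering of steps, and the dry-run wrapper are assumptions made for illustration, not a procedure taken from this report. By default the script only prints the commands; review them against the product documentation before running anything on a live cluster.

```python
import subprocess


def upmap_switch_commands(min_client="luminous"):
    """Return the ceph CLI invocations for moving the balancer to upmap mode.

    The ordering here is an assumption for illustration, not a vetted procedure.
    """
    return [
        ["ceph", "balancer", "off"],
        # Remove the compat weight set left behind by crush-compat mode.
        ["ceph", "osd", "crush", "weight-set", "rm-compat"],
        # upmap needs every client to be Luminous or newer.
        ["ceph", "osd", "set-require-min-compat-client", min_client],
        ["ceph", "balancer", "mode", "upmap"],
        ["ceph", "balancer", "on"],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run only: nothing is executed against the cluster.
run(upmap_switch_commands())
```

Removing the weight set changes effective CRUSH weights, which is the source of the data movement mentioned above, so this is best done during a quiet period.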
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2445