.The Progress module is no longer stuck for an indefinite time
Previously, the progress events in the Ceph status output were stuck for an indefinite time. This was caused by the Progress module checking the PG state too early, before it was in sync with the epoch of the OSDMap. With this release, progress events are displayed as expected.
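The cause described above can be illustrated with a minimal Python sketch. It is not the actual patch: the `PGInfo` class, its field names, and both helper functions are hypothetical, and the sketch assumes that each PG state carries the OSDMap epoch at which it was reported. The idea shown is that a PG state reported before the event started is ignored, so an early "active+clean" reading is not mistaken for finished rebalancing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PGInfo:
    """Hypothetical, simplified view of one placement group."""
    state: str           # for example "active+clean"
    reported_epoch: int  # assumed: OSDMap epoch at which the state was reported


def pg_is_settled(pg: PGInfo, event_start_epoch: int) -> bool:
    """Count a PG as done only if its state is at least as new as the event."""
    if pg.reported_epoch < event_start_epoch:
        # The state predates the OSD being marked out: too early to judge.
        return False
    return pg.state == "active+clean"


def progress_fraction(pgs: List[PGInfo], event_start_epoch: int) -> float:
    """Fraction of affected PGs that have settled since the event started."""
    if not pgs:
        return 1.0
    done = sum(pg_is_settled(pg, event_start_epoch) for pg in pgs)
    return done / len(pgs)
```

With this kind of gating, a PG that still shows its pre-event state does not advance the progress bar; it only counts once a state from the current epoch or later has been reported.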
Description of problem:
The progress section in ceph status is stuck for an indefinite time.
Version-Release number of selected component (if applicable):
14.2.8-35.el8cp
How reproducible:
Also faced it in an earlier version of Nautilus (built for downstream 4.0).
Steps followed:
1. Upgraded from Luminous to Nautilus (registry.access.redhat.com/rhceph/rhceph-3-rhel7 to ceph-4.1-rhel-8-containers-candidate-37018-20200413024316) using ceph-ansible.
2. Migrated FileStore (FS) OSDs to BlueStore (BS).
Actual results:
  data:
    pools:   8 pools, 560 pgs
    objects: 50.24k objects, 117 GiB
    usage:   2.4 TiB used, 18 TiB / 20 TiB avail
    pgs:     560 active+clean

  progress:
    Rebalancing after osd.16 marked out
      [========================......]
    Rebalancing after osd.20 marked out
      [==================............]
    Rebalancing after osd.27 marked out
      [=======================.......]
    Rebalancing after osd.24 marked out
      [============================..]
    Rebalancing after osd.26 marked out
      [========================......]
    Rebalancing after osd.18 marked out
      [===================...........]
    Rebalancing after osd.14 marked out
      [==========================....]
    Rebalancing after osd.22 marked out
      [=================.............]
    Rebalancing after osd.12 marked out
      [======================........]
    Rebalancing after osd.28 marked out
      [=====================.........]
    Rebalancing after osd.10 marked out
      [===========================...]
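The symptom in the output above (every PG reported as active+clean while rebalancing events remain unfinished) can be checked mechanically. The following is a rough Python sketch that works on the plain-text status output only; the function name `parse_progress` and the regular expressions are assumptions made for illustration, based solely on the lines shown in this report.

```python
import re
from typing import List, Tuple


def parse_progress(status_text: str) -> Tuple[bool, List[Tuple[str, float]]]:
    """Return (all_pgs_clean, [(event, fraction_done), ...]) from status text."""
    total = clean = None
    events = []
    pending = None  # event title waiting for its progress bar line
    for raw in status_text.splitlines():
        line = raw.strip()
        m = re.match(r"pools:\s+\d+ pools, (\d+) pgs$", line)
        if m:
            total = int(m.group(1))
            continue
        m = re.match(r"pgs:\s+(\d+) active\+clean$", line)
        if m:
            clean = int(m.group(1))
            continue
        if line.startswith("Rebalancing after"):
            pending = line
            continue
        m = re.match(r"\[(=*)(\.*)\]$", line)
        if m and pending:
            done, left = len(m.group(1)), len(m.group(2))
            if done + left > 0:
                events.append((pending, done / (done + left)))
            pending = None
    all_clean = total is not None and total == clean
    return all_clean, events


def looks_stuck(status_text: str) -> bool:
    """True if all PGs are clean but some progress event is still incomplete."""
    all_clean, events = parse_progress(status_text)
    return all_clean and any(frac < 1.0 for _, frac in events)
```

Passing the status text from this report to `looks_stuck` would flag the cluster, since all 560 PGs are active+clean while none of the listed bars is full.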
Expected results:
The progress section must be updated and accurate.
Additional info:
Please let us know if any log is needed in particular.
Comment 15 Kamoltat (Junior) Sirivadhna
2020-10-07 05:21:04 UTC
A solution that seems to fix the problem:
- Reproduced the problem with https://github.com/ceph/ceph/tree/v14.2.8
- Tested by marking 2/3 OSDs out. The results suggest that the progress bar got stuck forever (waited 5 minutes).
- Applied my patch, which is everything up to https://github.com/ceph/ceph/commit/93d4d9d7044e991a7bbdb70b0aef02284e6eda22#diff-e6c8e5b8f137e32891a6ad184d076415
- The problem seems to be fixed.
This is the list of commits I am trying to backport to Nautilus before patching downstream, to prevent rebasing issues:
93d4d9d7044e991a7bbdb70b0aef02284e6eda22
901a37f436143a2525d6063f64942019cc888229
2046c25362a69b1ed2c0009e9ef6a944f0d9e621
dd2c3f66a1dbd9582b7cd695efff66317b730c8a
d37e8a4d84d873b7df264c63077805be8618ad7a
f618e56c93ad82a20ab844fddc3d2ded42f2a48e
21e1caba6df9d591ebff54939d020ce0a3e57efe
Comment 16 Kamoltat (Junior) Sirivadhna
2020-10-08 12:23:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2445