Bug 1826224 - progress section in ceph status stuck for indefinite time
Summary: progress section in ceph status stuck for indefinite time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: low
Target Milestone: ---
Target Release: 4.2z2
Assignee: Kamoltat (Junior) Sirivadhna
QA Contact: Pawan
Docs Contact: Amrita
URL:
Whiteboard:
Depends On:
Blocks: 1890121
Reported: 2020-04-21 09:21 UTC by Vasishta
Modified: 2024-06-13 22:35 UTC
CC: 13 users

Fixed In Version: ceph-14.2.11-157.el8cp, ceph-14.2.11-157.el7cp
Doc Type: Bug Fix
Doc Text:
.The Progress module is no longer stuck for an indefinite time
Previously, progress events in Ceph status were stuck for an indefinite time. This was caused by the Progress module checking the PG state too early and not syncing with the epoch of the OSDMap. With this release, progress events complete as expected.
Clone Of:
Environment:
Last Closed: 2021-06-15 17:13:06 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 37589 0 None closed nautilus: mgr/progress: make it so progress bar does not get stuck forever 2021-02-17 23:49:20 UTC
Red Hat Product Errata RHSA-2021:2445 0 None None None 2021-06-15 17:13:24 UTC

Description Vasishta 2020-04-21 09:21:15 UTC
Description of problem:
The progress section in ceph status is stuck for an indefinite time.

Version-Release number of selected component (if applicable):
14.2.8-35.el8cp

How reproducible:
Also hit in an earlier version of Nautilus (built for downstream 4.0).

Steps followed:
1. Upgraded from luminous to nautilus (registry.access.redhat.com/rhceph/rhceph-3-rhel7 to ceph-4.1-rhel-8-containers-candidate-37018-20200413024316) using ceph-ansible.
2. Migrated FileStore OSDs to BlueStore.


Actual results:
 data:
    pools:   8 pools, 560 pgs
    objects: 50.24k objects, 117 GiB
    usage:   2.4 TiB used, 18 TiB / 20 TiB avail
    pgs:     560 active+clean
 
  progress:
    Rebalancing after osd.16 marked out
      [========================......]
    Rebalancing after osd.20 marked out
      [==================............]
    Rebalancing after osd.27 marked out
      [=======================.......]
    Rebalancing after osd.24 marked out
      [============================..]
    Rebalancing after osd.26 marked out
      [========================......]
    Rebalancing after osd.18 marked out
      [===================...........]
    Rebalancing after osd.14 marked out
      [==========================....]
    Rebalancing after osd.22 marked out
      [=================.............]
    Rebalancing after osd.12 marked out
      [======================........]
    Rebalancing after osd.28 marked out
      [=====================.........]
    Rebalancing after osd.10 marked out
      [===========================...]


Expected results:
The progress section must be updated and accurate, and completed events should be cleared.

Additional info:
Please let us know if any log is needed in particular.

Comment 10 Yaniv Kaul 2020-05-07 13:21:45 UTC
Adding NEEDINFO for QE to reproduce.

Comment 15 Kamoltat (Junior) Sirivadhna 2020-10-07 05:21:04 UTC

Solution that seems to fix the problem:

- Recreated the problem with https://github.com/ceph/ceph/tree/v14.2.8
- Tested by marking 2 of 3 OSDs out. The results suggest that the progress bar got stuck forever (waited 5 minutes).
- Applied my patch, which is everything up to https://github.com/ceph/ceph/commit/93d4d9d7044e991a7bbdb70b0aef02284e6eda22#diff-e6c8e5b8f137e32891a6ad184d076415
- The problem appears to be fixed.



This is the list of commits I am trying to backport to Nautilus before patching downstream, to prevent rebasing issues:


93d4d9d7044e991a7bbdb70b0aef02284e6eda22
901a37f436143a2525d6063f64942019cc888229
2046c25362a69b1ed2c0009e9ef6a944f0d9e621
dd2c3f66a1dbd9582b7cd695efff66317b730c8a
d37e8a4d84d873b7df264c63077805be8618ad7a
f618e56c93ad82a20ab844fddc3d2ded42f2a48e
21e1caba6df9d591ebff54939d020ce0a3e57efe
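For context, the root cause described above is that the progress module evaluated PG state against an OSDMap epoch older than the one in which the OSD was marked out, so the event's progress never advanced. A minimal sketch of the epoch-guard idea (hypothetical names, not the actual mgr/progress code):

```python
# Hypothetical sketch of the epoch-guard idea behind the fix: do not
# evaluate PG recovery progress until the module's OSDMap epoch has
# caught up with the epoch at which the event was created. Class and
# field names are illustrative, not the real mgr/progress API.

class RecoveryEvent:
    def __init__(self, message, start_epoch):
        self.message = message
        self.start_epoch = start_epoch  # OSDMap epoch when the OSD was marked out
        self.progress = 0.0

    def update(self, current_epoch, pg_ready_fraction):
        # Guard: if our view of the cluster predates the event, the PG
        # stats are stale and acting on them would freeze the bar (the bug).
        if current_epoch < self.start_epoch:
            return self.progress
        self.progress = pg_ready_fraction
        return self.progress


ev = RecoveryEvent("Rebalancing after osd.16 marked out", start_epoch=100)
ev.update(current_epoch=99, pg_ready_fraction=0.8)   # stale map: stays at 0.0
ev.update(current_epoch=101, pg_ready_fraction=0.8)  # synced map: advances to 0.8
```

The point of the guard is that a stale update is deferred rather than applied against the wrong map, so the event keeps advancing once the module's OSDMap catches up.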

Comment 16 Kamoltat (Junior) Sirivadhna 2020-10-08 12:23:09 UTC
https://github.com/ceph/ceph/pull/37589
This is the pull request for the Nautilus backport.

Comment 17 Yaniv Kaul 2020-11-25 08:17:46 UTC
(In reply to ksirivad from comment #16)
> https://github.com/ceph/ceph/pull/37589
> This is the pull request for back porting Nautilus

Excellent, how about a devel-ack then?

Comment 30 errata-xmlrpc 2021-06-15 17:13:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2445

