Bug 1956601
| Summary: | [RADOS]: Global Recovery Event is running continuously with 0 objects in the cluster | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | skanta |
| Component: | RADOS | Assignee: | Kamoltat (Junior) Sirivadhna <ksirivad> |
| Status: | CLOSED ERRATA | QA Contact: | Pawan <pdhiran> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.0 | CC: | akupczyk, bhubbard, ceph-eng-bugs, jdurgin, nojha, pdhiran, rzarzyns, sseshasa, tserlin, vashastr, vereddy, vumrao |
| Target Milestone: | --- | ||
| Target Release: | 5.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ceph-16.2.6-11.el8cp | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-04-04 10:20:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
After four hours -
[ceph: root@magna048 ceph]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 4h)
mgr: magna048.rffxzv(active, since 4h), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 4h), 23 in (since 4h)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 424 MiB used, 66 TiB / 66 TiB avail
pgs: 33 active+clean
progress:
Global Recovery Event (4h)
[===========================.] (remaining: 8m)
[ceph: root@magna048 ceph]#
This looks like the expected behavior of the pg_autoscaler: it is reducing the number of PGs from the initial 100 down to the minimum for the pool. The progress event not going away is a presentation bug; since there are 0 objects and no PGs in recovery, it is purely a display issue. Thus lowering the severity and moving to 5.1.

Because this is 5.0, which is equivalent to Pacific upstream, I don't think it is related to the new pg_autoscaler behavior that scales down the PGs, since that feature was reverted before Pacific was branched off. I have an idea of what the problem is: it has to do with how the progress module ignores PGs that are no longer being reported by the OSDs. I'm working on the fix and will patch this downstream once it is backported upstream.
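One way to double-check that the lingering event is purely cosmetic (a sketch, assuming the mgr progress module's CLI commands are available on this build) is to compare the module's view of the event with the actual PG states:

# Dump the progress module's in-flight events; the stale Global Recovery
# Event is expected to still be listed here.
ceph progress json
# Cross-check the real PG states; with 0 objects and all PGs active+clean,
# nothing is actually recovering, so the event is display-only.
ceph pg stat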
Hi Team,
Triggering another event caused the progress section to refresh, and the stale recovery event info disappeared.
Example -
I initiated
>> ceph orch upgrade start
The progress section got updated:
progress:
Upgrade to 16.2.0-31.el8cp (0s)
[=...........................]
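A less disruptive option (an assumption on my side, relying on the progress module's clear command being present in this build rather than on starting an upgrade) would be to drop the stale entry directly:

# Clears every event tracked by the mgr progress module, including the
# stuck Global Recovery Event; note that legitimate in-flight events are
# cleared as well.
ceph progress clear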
*** Bug 1958037 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174
Description of problem: Global Recovery Event is running continuously with 0 objects in the cluster.

Version-Release number of selected component (if applicable):
[ceph: root@magna048 ceph]# ceph -v
ceph version 16.2.0-26.el8cp (26d0f1958ee507a7c6dd31af72106c7006a4d0b7) pacific (stable)
[ceph: root@magna048 ceph]#

How reproducible:

Steps to Reproduce:
1. Configure a cluster.
2. Create a pool.

Actual results:

1. Cluster configuration output
[ceph: root@magna048 /]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 11m)
mgr: magna048.rffxzv(active, since 25m), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 6m), 23 in (since 6m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 127 MiB used, 66 TiB / 66 TiB avail
pgs: 1 active+clean

2. Created pool
[ceph: root@magna048 /]# ceph osd pool create testbench 100 100
pool 'testbench' created
[ceph: root@magna048 /]#

3. Output after pool creation -
[ceph: root@magna048 /]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 12m)
mgr: magna048.rffxzv(active, since 27m), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 7m), 23 in (since 7m)
data:
pools: 2 pools, 101 pgs
objects: 0 objects, 0 B
usage: 133 MiB used, 66 TiB / 66 TiB avail
pgs: 101 active+clean
progress:
Global Recovery Event (1s)
[............................] (remaining: 64s)

After a few hours -
[ceph: root@magna048 ceph]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 2h)
mgr: magna048.rffxzv(active, since 2h), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 2h), 23 in (since 2h)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 424 MiB used, 66 TiB / 66 TiB avail
pgs: 33 active+clean
progress:
Global Recovery Event (2h)
[===========================.] (remaining: 4m)
[ceph: root@magna048 ceph]#

Expected results:

Additional info:
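One way to confirm the autoscaler activity behind the drop from 101 PGs to 33 PGs between the two ceph -s outputs (a sketch, assuming the default pg_autoscaler commands are available in this release):

# Show the autoscaler's current and target pg_num per pool; the 'testbench'
# pool created with 100 PGs is expected to be scaled down to its minimum.
ceph osd pool autoscale-status
# Confirm the pool's current pg_num directly.
ceph osd pool get testbench pg_num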