Bug 1956601
| Summary: | [RADOS]: Global Recovery Event is running continuously with 0 objects in the cluster | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | skanta |
| Component: | RADOS | Assignee: | Kamoltat (Junior) Sirivadhna <ksirivad> |
| Status: | CLOSED ERRATA | QA Contact: | Pawan <pdhiran> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.0 | CC: | akupczyk, bhubbard, ceph-eng-bugs, jdurgin, nojha, pdhiran, rzarzyns, sseshasa, tserlin, vashastr, vereddy, vumrao |
| Target Milestone: | --- | ||
| Target Release: | 5.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ceph-16.2.6-11.el8cp | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-04-04 10:20:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
After four hours -
[ceph: root@magna048 ceph]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 4h)
mgr: magna048.rffxzv(active, since 4h), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 4h), 23 in (since 4h)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 424 MiB used, 66 TiB / 66 TiB avail
pgs: 33 active+clean
progress:
Global Recovery Event (4h)
[===========================.] (remaining: 8m)
[ceph: root@magna048 ceph]#
This looks like the expected behavior of the pg_autoscaler: it is reducing the number of PGs from the initial 100 down to the minimum for the pool. The progress event not going away is a presentation bug; since there are 0 objects and no PGs in recovery, it is purely a display issue. Thus lowering the severity and moving to 5.1.

Because this is 5.0, which is equivalent to Pacific upstream, I don't think it is related to the new pg_autoscaler behavior that scales down the PGs, since that feature was reverted before Pacific was branched off. I have an idea of what the problem is: it has to do with how the progress module ignores PGs that are no longer being reported by the OSDs. I'm working on the fix and will patch this downstream once it is backported upstream.
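One way to double-check that the lingering event is purely cosmetic (a sketch, assuming the mgr progress module's CLI commands are available on this build) is to compare the module's view of the event with the actual PG states:

# Dump the progress module's in-flight events; the stale Global Recovery
# Event is expected to still be listed here.
ceph progress json
# Cross-check the real PG states; with 0 objects and all PGs active+clean,
# nothing is actually recovering, so the event is display-only.
ceph pg stat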
Hi Team,
Triggering another event caused the progress section to refresh, and the stale recovery event info disappeared.
Example -
I initiated
>> ceph orch upgrade start
The progress section got updated:
progress:
Upgrade to 16.2.0-31.el8cp (0s)
[=...........................]
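A less disruptive option (an assumption on my side, relying on the progress module's clear command being present in this build rather than on starting an upgrade) would be to drop the stale entry directly:

# Clears every event tracked by the mgr progress module, including the
# stuck Global Recovery Event; note that legitimate in-flight events are
# cleared as well.
ceph progress clear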
*** Bug 1958037 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174
Description of problem: Global Recovery Event is running continuously with 0 objects in the cluster.

Version-Release number of selected component (if applicable):
[ceph: root@magna048 ceph]# ceph -v
ceph version 16.2.0-26.el8cp (26d0f1958ee507a7c6dd31af72106c7006a4d0b7) pacific (stable)
[ceph: root@magna048 ceph]#

How reproducible:

Steps to Reproduce:
1. Configure a cluster.
2. Create a pool.

Actual results:

1. Cluster configuration output
[ceph: root@magna048 /]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 11m)
mgr: magna048.rffxzv(active, since 25m), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 6m), 23 in (since 6m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 127 MiB used, 66 TiB / 66 TiB avail
pgs: 1 active+clean

2. Created pool
[ceph: root@magna048 /]# ceph osd pool create testbench 100 100
pool 'testbench' created
[ceph: root@magna048 /]#

3. Output after pool creation -
[ceph: root@magna048 /]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 12m)
mgr: magna048.rffxzv(active, since 27m), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 7m), 23 in (since 7m)
data:
pools: 2 pools, 101 pgs
objects: 0 objects, 0 B
usage: 133 MiB used, 66 TiB / 66 TiB avail
pgs: 101 active+clean
progress:
Global Recovery Event (1s)
[............................] (remaining: 64s)

After a few hours -
[ceph: root@magna048 ceph]# ceph -s
cluster:
id: ee3257e8-ac73-11eb-b907-002590fbc71c
health: HEALTH_OK
services:
mon: 3 daemons, quorum magna048,magna049,magna050 (age 2h)
mgr: magna048.rffxzv(active, since 2h), standbys: magna049.htxdoz
osd: 23 osds: 23 up (since 2h), 23 in (since 2h)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 424 MiB used, 66 TiB / 66 TiB avail
pgs: 33 active+clean
progress:
Global Recovery Event (2h)
[===========================.] (remaining: 4m)
[ceph: root@magna048 ceph]#

Expected results:

Additional info:
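One way to confirm the autoscaler activity behind the drop from 101 PGs to 33 PGs between the two ceph -s outputs (a sketch, assuming the default pg_autoscaler commands are available in this release):

# Show the autoscaler's current and target pg_num per pool; the 'testbench'
# pool created with 100 PGs is expected to be scaled down to its minimum.
ceph osd pool autoscale-status
# Confirm the pool's current pg_num directly.
ceph osd pool get testbench pg_num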