Bug 2303490 - Ceph reports PG_DEGRADED after recovering the active MDS node from a node drain; Ceph health never returns to OK [NEEDINFO]
Summary: Ceph reports PG_DEGRADED after recovering the active MDS node from a node drain; Ceph health never returns to OK
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Venky Shankar
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-08-07 16:20 UTC by Nagendra Reddy
Modified: 2024-09-17 10:38 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
vshankar: needinfo? (nagreddy)
muagarwa: needinfo? (vshankar)
vshankar: needinfo? (nagreddy)


Attachments: None


Links:
Red Hat Issue Tracker OCSBZM-8831 (last updated 2024-08-22 07:34:04 UTC)

Description Nagendra Reddy 2024-08-07 16:20:44 UTC
Description of problem (please be as detailed as possible and provide log snippets):

I observed the warning below in Ceph health immediately after draining the node on which the active MDS was running.

Degraded data redundancy: 1263285/8784987 objects degraded (14.380%), 1 pg degraded, 1 pg undersized
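
For context, a minimal sketch of how this health state can be checked from the Rook toolbox, assuming the rook-ceph-tools deployment is enabled in the openshift-storage namespace (names are the ODF defaults and may differ):

# Open a shell in the Rook toolbox pod
oc rsh -n openshift-storage deployment/rook-ceph-tools

# Inside the toolbox, show overall health and the per-check detail
sh-5.1$ ceph health detail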

ceph status:
sh-5.1$ ceph status
  cluster:
    id:     994259aa-5177-4411-bb6d-5f41e6d2bde0
    health: HEALTH_WARN
            Degraded data redundancy: 1263285/8784987 objects degraded (14.380%), 1 pg degraded, 1 pg undersized

  services:
    mon: 3 daemons, quorum a,b,c (age 36m)
    mgr: a(active, since 37m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 36m), 3 in (since 5h); 1 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 4 pgs
    objects: 2.93M objects, 4.2 GiB
    usage:   50 GiB used, 250 GiB / 300 GiB avail
    pgs:     1263285/8784987 objects degraded (14.380%)
             3 active+clean
             1 active+undersized+degraded+remapped+backfilling

  io:
    client:   1.8 KiB/s rd, 107 KiB/s wr, 2 op/s rd, 109 op/s wr
    recovery: 2.7 KiB/s, 147 objects/s

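To pinpoint which PG stays undersized and which OSDs it maps to, the following could be run from the same shell (a sketch; pg 1.0 is one of the PG ids reported later in the ceph -w log):

sh-5.1$ ceph health detail                       # names the degraded/undersized PGs
sh-5.1$ ceph pg dump_stuck undersized degraded   # stuck PGs with their up/acting OSD sets
sh-5.1$ ceph osd tree                            # confirm all three OSDs are up and in
sh-5.1$ ceph pg 1.0 query                        # detailed peering/recovery state for one PG
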
-------------------------------------------------------------------------------
ceph status -w :
---------------

2024-08-07T15:31:58.474864+0000 mon.a [INF] osd.0 marked itself down and dead
2024-08-07T15:31:59.446878+0000 mon.a [WRN] Health check failed: 1 osds down (OSD_DOWN)
2024-08-07T15:31:59.446909+0000 mon.a [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2024-08-07T15:31:59.446917+0000 mon.a [WRN] Health check failed: 1 zone (1 osds) down (OSD_ZONE_DOWN)
2024-08-07T15:32:08.455715+0000 mon.c [INF] mon.c calling monitor election
2024-08-07T15:32:08.463852+0000 mon.a [INF] mon.a calling monitor election
2024-08-07T15:32:13.472954+0000 mon.a [INF] mon.a is new leader, mons a,c in quorum (ranks 0,2)
2024-08-07T15:32:13.504953+0000 mon.a [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-08-07T15:32:13.507412+0000 mon.a [INF] osd.0 failed (root=default,region=us-south,zone=us-south-2,host=ocs-deviceset-1-data-0gbnf7) (connection refused reported by osd.2)
2024-08-07T15:32:13.507697+0000 mon.a [INF] Active manager daemon a restarted
2024-08-07T15:32:13.508133+0000 mon.a [WRN] Health check failed: 1 osds down (OSD_DOWN)
2024-08-07T15:32:13.508154+0000 mon.a [WRN] Health check failed: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set (OSD_FLAGS)
2024-08-07T15:32:13.508161+0000 mon.a [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2024-08-07T15:32:13.508172+0000 mon.a [WRN] Health check failed: 1 zone (1 osds) down (OSD_ZONE_DOWN)
2024-08-07T15:32:13.508438+0000 mon.a [INF] Activating manager daemon a
2024-08-07T15:32:13.524015+0000 mon.a [WRN] Health detail: HEALTH_WARN 1 filesystem is degraded; insufficient standby MDS daemons available; 1/3 mons down, quorum a,c
2024-08-07T15:32:13.524031+0000 mon.a [WRN] [WRN] FS_DEGRADED: 1 filesystem is degraded
2024-08-07T15:32:13.524037+0000 mon.a [WRN]     fs ocs-storagecluster-cephfilesystem is degraded
2024-08-07T15:32:13.524041+0000 mon.a [WRN] [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
2024-08-07T15:32:13.524046+0000 mon.a [WRN]     have 0; want 1 more
2024-08-07T15:32:13.524050+0000 mon.a [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c
2024-08-07T15:32:13.524056+0000 mon.a [WRN]     mon.b (rank 1) addr v2:172.30.111.99:3300/0 is down (out of quorum)
2024-08-07T15:32:13.546974+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:13.571798+0000 mon.a [INF] Manager daemon a is now available
2024-08-07T15:32:14.500214+0000 mon.a [INF] Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available)
2024-08-07T15:32:15.139718+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:15.602506+0000 mon.a [WRN] Health check failed: Degraded data redundancy: 2122080/6366240 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:15.725995+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:16.713176+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:17.720229+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:18.754681+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:19.759228+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:20.821000+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:21.829013+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:21.846112+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2129714/6389142 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:22.854516+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:23.858571+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:24.913407+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:25.919793+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:26.969597+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:27.970493+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-b restarted
2024-08-07T15:32:30.343628+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2129618/6388854 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:34.242144+0000 mon.a [INF] daemon mds.ocs-storagecluster-cephfilesystem-a is now active in filesystem ocs-storagecluster-cephfilesystem as rank 0
2024-08-07T15:32:34.622939+0000 mon.a [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2024-08-07T15:32:35.347915+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2129053/6387159 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:40.350852+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2128471/6385413 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:42.287909+0000 mon.b [INF] mon.b calling monitor election
2024-08-07T15:32:42.293112+0000 mon.a [INF] mon.a calling monitor election
2024-08-07T15:32:42.301939+0000 mon.a [INF] mon.a is new leader, mons a,b,c in quorum (ranks 0,1,2)
2024-08-07T15:32:42.317047+0000 mon.a [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum a,c)
2024-08-07T15:32:42.328443+0000 mon.a [WRN] Health detail: HEALTH_WARN 1 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set; 1 host (1 osds) down; 1 zone (1 osds) down; Degraded data redundancy: 2128399/6385197 objects degraded (33.333%), 4 pgs degraded
2024-08-07T15:32:42.328475+0000 mon.a [WRN] [WRN] OSD_DOWN: 1 osds down
2024-08-07T15:32:42.328483+0000 mon.a [WRN]     osd.0 (root=default,region=us-south,zone=us-south-2,host=ocs-deviceset-1-data-0gbnf7) is down
2024-08-07T15:32:42.328488+0000 mon.a [WRN] [WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
2024-08-07T15:32:42.328495+0000 mon.a [WRN]     zone us-south-2 has flags noout
2024-08-07T15:32:42.328505+0000 mon.a [WRN] [WRN] OSD_HOST_DOWN: 1 host (1 osds) down
2024-08-07T15:32:42.328511+0000 mon.a [WRN]     host ocs-deviceset-1-data-0gbnf7 (root=default,region=us-south,zone=us-south-2) (1 osds) is down
2024-08-07T15:32:42.328525+0000 mon.a [WRN] [WRN] OSD_ZONE_DOWN: 1 zone (1 osds) down
2024-08-07T15:32:42.328539+0000 mon.a [WRN]     zone us-south-2 (root=default,region=us-south) (1 osds) is down
2024-08-07T15:32:42.328554+0000 mon.a [WRN] [WRN] PG_DEGRADED: Degraded data redundancy: 2128399/6385197 objects degraded (33.333%), 4 pgs degraded
2024-08-07T15:32:42.328575+0000 mon.a [WRN]     pg 1.0 is active+undersized+degraded, acting [2,1]
2024-08-07T15:32:42.328584+0000 mon.a [WRN]     pg 2.0 is active+undersized+degraded, acting [1,2]
2024-08-07T15:32:42.328590+0000 mon.a [WRN]     pg 3.0 is active+undersized+degraded, acting [2,1]
2024-08-07T15:32:42.328615+0000 mon.a [WRN]     pg 4.0 is active+undersized+degraded, acting [1,2]
2024-08-07T15:32:46.365040+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2128694/6386082 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:32:51.582806+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2133105/6399315 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:00.365860+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2132468/6397404 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:05.369169+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2131774/6395322 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:07.745128+0000 mon.a [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2024-08-07T15:33:07.745153+0000 mon.a [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2024-08-07T15:33:07.745179+0000 mon.a [INF] Health check cleared: OSD_ZONE_DOWN (was: 1 zone (1 osds) down)
2024-08-07T15:33:07.788895+0000 mon.a [INF] osd.0 [v2:10.131.0.39:6800/2123740462,v1:10.131.0.39:6801/2123740462] boot
2024-08-07T15:33:07.002817+0000 osd.0 [WRN] OSD bench result of 29015.942307 IOPS exceeded the threshold limit of 500.000000 IOPS for osd.0. IOPS capacity is unchanged at 315.000000 IOPS. The recommendation is to establish the osd's IOPS capacity using other benchmark tools (e.g. Fio) and then override osd_mclock_max_capacity_iops_[hdd|ssd].
2024-08-07T15:33:10.375207+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2144800/6434400 objects degraded (33.333%), 4 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:11.878638+0000 mon.a [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2144800/6434400 objects degraded (33.333%), 4 pgs degraded)
2024-08-07T15:33:16.388746+0000 mon.a [WRN] Health check failed: Degraded data redundancy: 2147287/6445536 objects degraded (33.314%), 2 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:22.559904+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2146906/6444624 objects degraded (33.313%), 2 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:30.388359+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2155367/6470181 objects degraded (33.312%), 2 pgs degraded (PG_DEGRADED)
2024-08-07T15:33:35.391672+0000 mon.a [WRN] Health check update: Degraded data redundancy: 2168686/6510342 objects degraded (33.311%), 1 pg degraded (PG_DEGRADED)
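
Note the OSD_FLAGS entry in the log above: zone us-south-2 carries a noout flag, which is typically set around a drain so the down OSD is not marked out. If such a group flag were left behind after the drain, it could be inspected and, if appropriate, cleared as sketched below (normally the Rook operator manages this and no manual action is needed):

sh-5.1$ ceph health detail | grep -A2 OSD_FLAGS   # shows e.g. "zone us-south-2 has flags noout"
sh-5.1$ ceph osd unset-group noout us-south-2     # clears a zone-level noout set via set-group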


Version of all relevant components (if applicable):

ceph version 18.2.1-229.el9cp (ef652b206f2487adfc86613646a4cac946f6b4e0) reef (stable)
ocp: 4.17.0-0.nightly-2024-08-06-235322
odf: 4.17.0-65.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. It is impacting our automation.

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run I/O on PVCs backed by the Ceph filesystem (CephFS).
2. Once the I/O has driven up memory usage on the active MDS, drain the node on which the active MDS is running (see the command sketch below the steps).
3. Ceph health reports a PG_DEGRADED warning and stays in that state indefinitely, even though the MDS comes back up and all pods are running fine.
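
For reference, a rough sketch of how these steps could be driven from the CLI; the node-name placeholder, the openshift-storage namespace, the rook-ceph-mds label, and the rook-ceph-tools toolbox are the ODF defaults and may differ in a given cluster:

# Find the node hosting the active MDS
oc get pods -n openshift-storage -l app=rook-ceph-mds -o wide

# Drain that node while I/O is running, then bring it back
oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
oc adm uncordon <node-name>

# Watch Ceph health from the toolbox; PG_DEGRADED is expected to clear once backfill finishes
oc rsh -n openshift-storage deployment/rook-ceph-tools ceph -w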


Actual results:
A PG_DEGRADED warning is observed in Ceph health after the node drain and does not clear.

Expected results:
Ceph health should return to HEALTH_OK once all pods are back up and running after the node drain.

Additional info:

Comment 7 Sunil Kumar Acharya 2024-09-17 10:05:02 UTC
Moving the non-blocker BZs out of ODF-4.17.0 as part of Development Freeze.

