Bug 2071085 - RHCS5 - MDS_CLIENT_RECALL: clients failing to respond to cache pressure
Summary: RHCS5 - MDS_CLIENT_RECALL: clients failing to respond to cache pressure
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1
Assignee: Kotresh HR
QA Contact: Yogesh Mane
URL:
Whiteboard:
Depends On: 2108656
Blocks:
 
Reported: 2022-04-01 18:51 UTC by George Law
Modified: 2023-01-04 06:59 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-03 15:49:51 UTC
Embargoed:
khiremat: needinfo-

Links:
  Ceph Project Bug Tracker 54701 (last updated 2022-04-07 09:38:46 UTC)
  Red Hat Issue Tracker RHCEPH-3903 (last updated 2022-04-01 18:59:12 UTC)

Comment 2 Venky Shankar 2022-04-04 14:37:17 UTC
Ramana - please take a look.

Comment 56 George Law 2022-05-17 13:36:00 UTC
Kotresh,

I'll ask for the full sosreports from both of these nodes.

Note: the event the CU mentioned was at 23:31, not 3:50 AM.

May 15 23:31:18 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04kpapd1i-dqrnxn[3712153]: debug 2022-05-16T04:31:18.286+0000 7f84fb568700  1 mds.root.lwtxe04kpapd1i.dqrnxn Map removed me [mds.root.lwtxe04kpapd1i.dqrnxn{0:1133202} state up:standby-replay seq 1 join_fscid=1 addr [v2:171.176.38.198:6800/3401903807,v1:171.176.38.198:6801/3401903807] compat {c=[1],r=[1],i=[7ff]}] from cluster; respawning! See cluster/monitor logs for details.
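
For context, a quick way to cross-check the FSMap state around a respawn like this (generic Ceph CLI on any node with an admin keyring; not commands taken from this case):

# Ranks, standby and standby-replay daemons for the filesystem
ceph fs status

# Full FSMap, including the epoch in which the daemon was dropped
ceph fs dump | head -n 40

The respawn itself is normal once the monitors remove a daemon from the map; the open question is why the standby-replay daemon was removed, which is what the mon logs below are meant to show.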


May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: cluster 2022-05-16T04:31:16.215532+0000 mgr.lwtxe04hpapd1i.qifige (mgr.1044249) 79907 : cluster [DBG] pgmap v80007: 4161 pgs: 23 active+clean+scrubbing+deep, 4138 active+clean; 22 TiB data, 67 TiB used, 117 TiB / 183 TiB avail; 8.6 MiB/s rd, 982 KiB/s wr, 298 op/s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.616+0000 7f35ac1ff700  1 mon.lwtxe04kpapd1i@3(peon).osd e22029 e22029: 168 total, 168 up, 168 in
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.617+0000 7f35ada02700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ada02700' had timed out after 0.000000000s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.617+0000 7f35ad201700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ad201700' had timed out after 0.000000000s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.647+0000 7f35ae203700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ae203700' had timed out after 0.000000000s

Note the timeout after 0.000000000s; also notice there were 23 PGs deep scrubbing at that time.
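
If the deep scrubs and the MDS_CLIENT_RECALL warning need narrowing down, a rough sketch of the checks (illustrative commands only, not a tuning recommendation for this cluster):

# Which PGs were deep scrubbing and in what state
ceph pg dump pgs | grep scrubbing+deep

# Per-OSD scrub concurrency currently in effect
ceph config get osd osd_max_scrubs

# Clients flagged for failing to release caps, with session details
ceph health detail
ceph tell mds.0 session ls    # rank 0; adjust the target if there are multiple ranks or filesystems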

