Bug 2071085
| Summary: | RHCS5 - MDS_CLIENT_RECALL: clients failing to respond to cache pressure | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | George Law <glaw> |
| Component: | CephFS | Assignee: | Kotresh HR <khiremat> |
| Status: | CLOSED NOTABUG | QA Contact: | Yogesh Mane <ymane> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.0 | CC: | assingh, ceph-eng-bugs, gfarnum, gjose, khiremat, nojha, pdonnell, rfriedma, sbaldwin, vereddy, vshankar, xiubli |
| Target Milestone: | --- | Flags: | khiremat: needinfo-, khiremat: needinfo- |
| Target Release: | 6.1 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-03 15:49:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2108656 | | |
| Bug Blocks: | | | |
Comment 2
Venky Shankar
2022-04-04 14:37:17 UTC
Kotresh,
I'll ask for the full sosreports from both of these nodes.
Note: the event the customer mentioned was at 23:31, not 3:50 AM.
May 15 23:31:18 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mds-root-lwtxe04kpapd1i-dqrnxn[3712153]: debug 2022-05-16T04:31:18.286+0000 7f84fb568700 1 mds.root.lwtxe04kpapd1i.dqrnxn Map removed me [mds.root.lwtxe04kpapd1i.dqrnxn{0:1133202} state up:standby-replay seq 1 join_fscid=1 addr [v2:171.176.38.198:6800/3401903807,v1:171.176.38.198:6801/3401903807] compat {c=[1],r=[1],i=[7ff]}] from cluster; respawning! See cluster/monitor logs for details.
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: cluster 2022-05-16T04:31:16.215532+0000 mgr.lwtxe04hpapd1i.qifige (mgr.1044249) 79907 : cluster [DBG] pgmap v80007: 4161 pgs: 23 active+clean+scrubbing+deep, 4138 active+clean; 22 TiB data, 67 TiB used, 117 TiB / 183 TiB avail; 8.6 MiB/s rd, 982 KiB/s wr, 298 op/s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.616+0000 7f35ac1ff700 1 mon.lwtxe04kpapd1i@3(peon).osd e22029 e22029: 168 total, 168 up, 168 in
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.617+0000 7f35ada02700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ada02700' had timed out after 0.000000000s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.617+0000 7f35ad201700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ad201700' had timed out after 0.000000000s
May 15 23:31:17 lwtxe04kpapd1i ceph-1f483d8e-8469-11ec-8e59-d0bf9cf275c8-mon-lwtxe04kpapd1i[124408]: debug 2022-05-16T04:31:17.647+0000 7f35ae203700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f35ae203700' had timed out after 0.000000000s
Note the timeout after 0.000000000s; also note that 23 PGs were deep-scrubbing at that time.
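As a side note (not part of the original report), the anomaly called out above can be filtered out of the mon journal mechanically; a minimal sketch that scans the three `reset_timeout` lines quoted in this comment for a zero-second grace period, with the thread IDs taken verbatim from the log excerpt:

```python
import re

# The three mon heartbeat lines from this report, rejoined from the
# journald wrapping; the daemon prefix is trimmed for brevity.
log_lines = [
    "debug 2022-05-16T04:31:17.617+0000 7f35ada02700 1 heartbeat_map reset_timeout "
    "'Monitor::cpu_tp thread 0x7f35ada02700' had timed out after 0.000000000s",
    "debug 2022-05-16T04:31:17.617+0000 7f35ad201700 1 heartbeat_map reset_timeout "
    "'Monitor::cpu_tp thread 0x7f35ad201700' had timed out after 0.000000000s",
    "debug 2022-05-16T04:31:17.647+0000 7f35ae203700 1 heartbeat_map reset_timeout "
    "'Monitor::cpu_tp thread 0x7f35ae203700' had timed out after 0.000000000s",
]

# Capture the thread id and the reported grace period in seconds.
pattern = re.compile(
    r"heartbeat_map reset_timeout 'Monitor::cpu_tp thread (0x[0-9a-f]+)' "
    r"had timed out after ([\d.]+)s"
)

def zero_grace_timeouts(lines):
    """Return (thread_id, grace) pairs where the grace period is 0,
    i.e. the suspicious 0.000000000s timeouts noted above."""
    hits = []
    for line in lines:
        m = pattern.search(line)
        if m and float(m.group(2)) == 0.0:
            hits.append((m.group(1), float(m.group(2))))
    return hits

print(zero_grace_timeouts(log_lines))
```

All three lines match with a grace of 0.0, which is what makes the timeouts look like a misconfigured or uninitialized grace value rather than a genuinely slow thread pool.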