Bug 2142068
| Summary: | [cee][cephfs] Snapshot mirror sync stops moving data | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Steve Baldwin <sbaldwin> |
| Component: | CephFS | Assignee: | Rishabh Dave <ridave> |
| Status: | CLOSED DEFERRED | QA Contact: | Hemanth Kumar <hyelloji> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 5.2 | CC: | akraj, bhull, ceph-eng-bugs, cephqe-warriors, gfarnum, klazarsk, rdave, ridave, snipp, vshankar |
| Target Milestone: | --- | Flags: | snipp: needinfo? gfarnum: needinfo- |
| Target Release: | 6.1 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-03-23 02:11:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 9
Venky Shankar
2022-11-22 06:26:39 UTC
From conversation with Steven, adding our observations from this morning's sync with the client.

Based on the symptoms (hanging on an inode, not finding the file by inode number, debug logging indicating it's unlinked), it appears that an index is built in memory at the beginning of mirroring. Then, when mirroring reaches that file in the index, it doesn't check whether the inode is still linked, and it hangs trying to delete that unlinked-but-possibly-held-open file?

The log entries complain about an unlinked file at $HexInode. When we convert that hex value to decimal and run find $/volume -inum $inode, no file is returned (find hangs, never finding the file). To confirm that the MDS is functioning correctly and serving requests, we ran find on other files by inode (determined with ls -li), and it returned the result promptly.

With rsync as a workaround, whether called from the script or run normally from the shell prompt, it stalls. David decided to strace it to find where it was stalling... and lo and behold, it is now working. Strace slows the process down, so we suspect it is not hammering the MDS as hard, which allows it to proceed.
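The inode check described above (convert the hex inode from the log to decimal, then search the volume by inode number) can be sketched in Python. This is only an illustrative stand-in for the `find ... -inum` step, not what was actually run on the client system. The function names `hex_inode_to_decimal` and `find_by_inode` are hypothetical, and it assumes the log prints the inode as plain hex, with or without a `0x` prefix.

```python
import os


def hex_inode_to_decimal(hex_inode: str) -> int:
    """Convert a hex inode string (with or without a '0x' prefix) to an int."""
    return int(hex_inode, 16)


def find_by_inode(root: str, inode: int) -> list[str]:
    """Walk `root` and return every path whose inode number equals `inode`.

    Rough equivalent of `find <root> -inum <inode>`, except that the root
    directory itself is not checked. An unlinked file has no directory
    entry, so it yields an empty list here.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are reported by their own inode.
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
    return matches
```

An empty result for the inode named in the log, alongside prompt results for inodes taken from `ls -li`, would match the observation that the file is unlinked while the MDS is otherwise serving requests. Note that this sketch cannot reproduce the hang itself; it only mirrors the lookup.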