Bug 2142068

Summary: [cee][cephfs] Snapshot mirror sync stops moving data
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Steve Baldwin <sbaldwin>
Component: CephFS
Assignee: Rishabh Dave <ridave>
Status: CLOSED DEFERRED
QA Contact: Hemanth Kumar <hyelloji>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.2
CC: akraj, bhull, ceph-eng-bugs, cephqe-warriors, gfarnum, klazarsk, rdave, ridave, snipp, vshankar
Target Milestone: ---
Flags: snipp: needinfo?, gfarnum: needinfo-
Target Release: 6.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-23 02:11:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 9 Venky Shankar 2022-11-22 06:26:39 UTC
*** Bug 2143997 has been marked as a duplicate of this bug. ***

Comment 12 Kimberly Lazarski 2022-11-28 17:49:47 UTC
From conversation with Steven: adding our observations from this morning's sync with the client.


Based on the symptoms (hanging on an inode, not finding the file by inode number, debug logging indicating it's unlinked), is it possible that an index is built at the beginning of mirroring and that, when the mirror reaches that file in the in-memory index, it doesn't check whether the inode is still linked and hangs trying to delete that unlinked-but-possibly-held-open file?

The log entries complain about an unlinked file at $HexInode. When we convert that hex value to decimal and run find $/volume -inum $inode, no file is returned (find hangs, never finding the file). To confirm that the MDS is functioning correctly and serving requests, we ran find on other files by inode (determined with ls -li), and it returned the result promptly.
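
For anyone repeating the hex-to-decimal lookup described above, here is a minimal Python sketch of the same check. It is only an illustration: the function names are made up for this comment, the sample hex value in the usage note is hypothetical, and it assumes the log prints the inode as a plain hex number (with or without a 0x prefix). The walk is a rough stand-in for find -inum, not what the mirror daemon does.

```python
import os


def hex_inode_to_dec(hex_inode):
    """Convert an inode number printed in hex (with or without '0x') to an int."""
    return int(hex_inode, 16)


def find_by_inode(root, inode):
    """Return every path at or under root whose inode number equals inode.

    Rough equivalent of `find <root> -inum <inode>`; symlinks are not followed.
    """
    matches = []
    if os.lstat(root).st_ino == inode:
        matches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                # Entry vanished or is unreadable; skip it and keep walking.
                continue
    return matches
```

Usage would look like find_by_inode("/path/to/volume", hex_inode_to_dec("0x10000000abc")), where both the path and the hex value are placeholders. A file that has been unlinked but is still held open has no directory entry, so a walk like this would simply return an empty list for it; that matches "no file is returned" but does not by itself explain why find hangs.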

Comment 19 Kimberly Lazarski 2022-12-01 17:20:46 UTC
On rsync as a workaround: whether it is called from the script or run directly from the shell prompt, it stalls.

David decided to strace it to find where it was stalling... and lo and behold, it is now working. strace slows the process down, and because of that we suspect it's not hammering the MDS as hard, which allows it to proceed.
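
If the slowdown really is what lets the copy get through, one way to test that suspicion without strace would be to pace the copy deliberately. The sketch below is only that, a test of the hypothesis: paced_copy and its delay_s parameter are hypothetical names, the default delay is an arbitrary guess, and nothing here has been run against the affected cluster.

```python
import os
import shutil
import time


def paced_copy(src, dst, delay_s=0.05):
    """Copy the tree at src into dst, sleeping delay_s seconds after each file.

    Returns the number of files copied. The pause is a crude stand-in for the
    slowdown strace introduced; it is not a tuned or recommended value.
    """
    copied = 0
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # copy2 preserves timestamps and permission bits where it can.
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(target_dir, name))
            copied += 1
            time.sleep(delay_s)
    return copied
```

If a paced copy completes where the unpaced rsync stalls, that would support the idea that request rate against the MDS is the trigger; if it still stalls on the same inode, the slowdown under strace was probably a coincidence.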