Bug 2142068 - [cee][cephfs] Snapshot mirror sync stops moving data
Summary: [cee][cephfs] Snapshot mirror sync stops moving data
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1
Assignee: Rishabh Dave
QA Contact: Hemanth Kumar
URL:
Whiteboard:
Duplicates: 2143997
Depends On:
Blocks:
Reported: 2022-11-11 13:39 UTC by Steve Baldwin
Modified: 2023-07-19 04:25 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-23 02:11:46 UTC
Embargoed:
snipp: needinfo?
gfarnum: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-5596 0 None None None 2022-11-11 13:52:01 UTC

Comment 9 Venky Shankar 2022-11-22 06:26:39 UTC
*** Bug 2143997 has been marked as a duplicate of this bug. ***

Comment 12 Kimberly Lazarski 2022-11-28 17:49:47 UTC
From a conversation with Steven: adding our observations from this morning's sync with the client.


The symptoms: the mirror hangs on an inode, the file cannot be found by inode number, and debug logging indicates the file is unlinked. Based on these, it appears that an index is built in memory at the beginning of mirroring. When the mirror reaches that file in the index, it doesn't check whether the inode is still linked, and it hangs trying to delete that unlinked-but-possibly-held-open file.

The log entries complain about an unlinked file at $HexInode. When we convert that hex value to decimal and run find $/volume -inum $inode, no file is returned (find hangs and never finds the file). To confirm that the MDS is functioning correctly and serving requests, we ran find on other files by inode (determined with ls -li), and it returned the result promptly.
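The hex-to-decimal conversion and inode lookup described above can be sketched in Python. This is only an illustration of the check, not a tool used in this case: the function names `hex_inode_to_dec` and `find_by_inode` are hypothetical, and the walk is a rough stand-in for `find -inum`. On an affected mount it would presumably stall the same way find did.

```python
import os


def hex_inode_to_dec(hex_inode):
    """Convert a hex inode string from the logs (with or without '0x') to an int."""
    return int(hex_inode, 16)


def find_by_inode(root, inode):
    """Return every path under root whose inode number equals inode.

    Rough equivalent of `find <root> -inum <inode>`; hypothetical helper.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are reported by their own inode
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                # entry vanished or is unreadable; skip it
                continue
    return matches
```

For example, `find_by_inode("/mnt/cephfs", hex_inode_to_dec(hex_value))` would list the matching paths, where `/mnt/cephfs` is an assumed mount point and `hex_value` is the inode string taken from the log entry. An empty list means no linked file with that inode was found under the root.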

Comment 19 Kimberly Lazarski 2022-12-01 17:20:46 UTC
On rsync as a workaround: whether called from the script or run directly from the shell prompt, it stalls.

David decided to run it under strace to find where it was stalling... and lo and behold, it is now working. Strace slows down the process, so we suspect rsync is no longer hammering the MDS as hard, which allows it to proceed.

