Bug 2222231

Summary: mds: allow entries to be removed from lost+found directory
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Venky Shankar <vshankar>
Component: CephFS    Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA QA Contact: Amarnath <amk>
Severity: high Docs Contact: Akash Raj <akraj>
Priority: unspecified    
Version: 5.2    CC: akraj, ceph-eng-bugs, cephqe-warriors, hyelloji, tserlin, vereddy
Target Milestone: ---   
Target Release: 6.1z1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-17.2.6-93.el9cp Doc Type: Bug Fix
Doc Text:
.The recovered files under the `lost+found` directory can now be deleted in Ceph File System
With this fix, after recovering a Ceph File System following disaster recovery, the recovered files under the `lost+found` directory can be deleted.
Story Points: ---
Clone Of:
: 2222232 (view as bug list) Environment:
Last Closed: 2023-08-03 16:45:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2221020, 2222232    

Description Venky Shankar 2023-07-12 10:53:14 UTC
Post file system recovery, files which have missing backtraces are recovered into the lost+found directory, with file names being their inode numbers. Users could choose to copy or back up entries from lost+found. Right now, the MDS gates unlinking from the lost+found directory with -EROFS, which prevents users from cleaning up the lost+found directory.
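
For illustration only (hypothetical client host and inode-number file name), the pre-fix behaviour as seen from a client mount looks roughly like this, with -EROFS surfacing as "Read-only file system":

[root@client lost+found]# ls
100000060dd
[root@client lost+found]# rm -f 100000060dd
rm: cannot remove '100000060dd': Read-only file system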

Comment 1 RHEL Program Management 2023-07-12 10:53:23 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 8 Amarnath 2023-07-17 12:31:21 UTC
Hi Venky,

As part of creating the lost+found directory, we have tried the steps below:

[root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados ls -p cephfs.cephfs.data
10000000201.00000000
100000001fe.00000000
[root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados rmxattr 10000000201.00000000 backtrace -p cephfs.cephfs.data

[root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados rmxattr 100000001fe.00000000 backtrace -p cephfs.cephfs.data

[root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.


[root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# ceph fs reset cephfs --yes-i-really-mean-it

Even after the reset, we are not able to create the lost+found directory.

Could you help me with the steps for the same?

Regards,
Amarnath

Comment 9 Venky Shankar 2023-07-18 04:16:41 UTC
(In reply to Amarnath from comment #8)
> Hi Venky,
> 
> As part of creating the lost+found directory, we have tried the steps below:
> 
> [root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados ls -p
> cephfs.cephfs.data
> 10000000201.00000000
> 100000001fe.00000000
> [root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados rmxattr
> 10000000201.00000000 backtrace -p cephfs.cephfs.data
> 
> [root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# rados rmxattr
> 100000001fe.00000000 backtrace -p cephfs.cephfs.data
> 
> [root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# ceph fs fail cephfs
> cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks
> marked failed.
> 
> 
> [root@ceph-amk-61-test-xtknkr-node7 ceph-fuse]# ceph fs reset cephfs
> --yes-i-really-mean-it

Just resetting the file system will not automagically create the lost+found directory. You need to run through the metadata recovery steps as detailed here:

https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects

Since the backtrace xattr is missing, the data scan tool would dump the file into the lost+found directory (under /) with the file name set to the inode number.
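
For reference, a condensed sketch of that recovery flow, assuming the default file system cephfs with data pool cephfs.cephfs.data; the exact steps, ordering, and flags are in the linked documentation and may vary by release:

# condensed sketch only -- see the linked documentation for the authoritative procedure
ceph fs fail cephfs
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-table-tool cephfs:0 reset session
ceph fs reset cephfs --yes-i-really-mean-it
cephfs-data-scan init
cephfs-data-scan scan_extents cephfs.cephfs.data
cephfs-data-scan scan_inodes cephfs.cephfs.data
cephfs-data-scan scan_links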

Comment 10 Amarnath 2023-07-18 08:00:28 UTC
Hi All,

Thanks, Venky.

We are able to create the lost+found folder and delete its contents:

[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ls -lrt
total 1
drwxr-xr-x. 2 root root  0 Jul 18 02:31 test
-rw-r--r--. 1 root root 25 Jul 18 02:31 test_lost.txt
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# rados ls -p cephfs.cephfs.data
100000060dd.00000000
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# rados rmxattr 100000060dd.00000000 backtrace -p cephfs.cephfs.data
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# 
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# 
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# cephfs-table-tool all reset session
Error ((22) Invalid argument)
2023-07-18T02:40:07.954-0400 7f9899e0cfc0 -1 main: Bad rank selection: all'

[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# cephfs-table-tool cephfs:0 reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}

[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph fs reset cephfs --yes-i-really-mean-it
Error EINVAL: all MDS daemons must be inactive before resetting filesystem: set the cluster_down flag and use `ceph mds fail` to make this so
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph mds fail
Invalid command: missing required parameter role_or_gid(<string>)
mds fail <role_or_gid> :  Mark MDS failed: trigger a failover if a standby is available
Error EINVAL: invalid command
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph fs fail
Invalid command: missing required parameter fs_name(<string>)
fs fail <fs_name> :  bring the file system down and all of its ranks
Error EINVAL: invalid command
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph fs fail caphfs
Error ENOENT: Filesystem not found: 'caphfs'
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph fs fail cephfs
cephfs marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ceph fs reset cephfs --yes-i-really-mean-it
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# ls -lrt
total 1
drwxr-xr-x. 2 root root 0 Dec 31  1969 lost+found
[root@ceph-amk-fs-tier2-js5wzs-node8 ceph-fuse]# cd lost+found/
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# ls -lrt
total 1
-r-x------. 1 root root 25 Jul 18 02:32 100000060dd
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# 
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# 
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# rm -rf 100000060dd 
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# ls -lrt
total 0
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# 

Verified on:
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# ceph versions
{
    "mon": {
        "ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)": 12
    },
    "mds": {
        "ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)": 5
    },
    "overall": {
        "ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)": 22
    }
}
[root@ceph-amk-fs-tier2-js5wzs-node8 lost+found]# 

Regards,
Amarnath

Comment 12 errata-xmlrpc 2023-08-03 16:45:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473