Bug 2111352 - cephfs: tooling to identify inode (metadata) corruption
Summary: cephfs: tooling to identify inode (metadata) corruption
Keywords:
Status: POST
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 4.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: Backlog
Assignee: Patrick Donnelly
QA Contact: Hemanth Kumar
URL:
Whiteboard:
Duplicates: 2071592
Depends On:
Blocks: 2162312
 
Reported: 2022-07-27 06:59 UTC by Bipin Kunal
Modified: 2023-07-12 06:31 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
Flags: gfarnum: needinfo-


Attachments


Links
System                             ID           Status  Summary                                                                  Last Updated
Ceph Project Bug Tracker           56140        None    None                                                                     2022-07-27 07:00:48 UTC
Github ceph ceph pull              47542        Merged  tools/cephfs: add basic detection/cleanup tool for dentry first damage  2022-08-24 05:57:11 UTC
Red Hat Issue Tracker              RHCEPH-4939  None    None                                                                     2022-07-27 07:14:03 UTC
Red Hat Knowledge Base (Solution)  6975619      None    None                                                                     2022-11-11 15:31:33 UTC

Internal Links: 2071592

Comment 11 Venky Shankar 2022-09-05 12:15:06 UTC
(copying from https://bugzilla.redhat.com/show_bug.cgi?id=2071592#c137)


(In reply to Miguel Duaso from comment #137)
> Hello Venky,
> 
> 
> I have tried to follow your instructions on a healthy system, from the ceph
> toolbox:
> 
> 
> sh-4.4$ ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal
> 2022-08-31T07:49:27.363+0000 7fbfd57fa700  0 client.6545924 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> 2022-08-31T07:49:27.715+0000 7fbfd57fa700  0 client.6545930 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> {
>     "message": "",
>     "return_code": 0
> }
> sh-4.4$ 
> 
> 
> sh-4.4$ ceph fs ls
> name: ocs-storagecluster-cephfilesystem, metadata pool:
> ocs-storagecluster-cephfilesystem-metadata, data pools:
> [ocs-storagecluster-cephfilesystem-data0 ]
> sh-4.4$ 
> 
> The customer issue was related to this fs:
> 
> "fs ocs-storagecluster-cephfilesystem mds.0 is damaged"
> 
> So I followed your instructions with this command:
> 
> sh-4.4$ ceph fs fail ocs-storagecluster-cephfilesystem
> ocs-storagecluster-cephfilesystem marked not joinable; MDS cannot join the
> cluster. All MDS ranks marked failed.
> sh-4.4$ 
> 
> Q. How can I unfail this fs (to be done at the end of the test)?

ceph fs set <fs_name> joinable true
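
For example, for the filesystem in this case:

ceph fs set ocs-storagecluster-cephfilesystem joinable true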

> 
> 
> sh-4.4$ ceph -s
>   cluster:
>     id:     081a2386-f070-4d4a-95f8-03df4d949bf4
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             clock skew detected on mon.b, mon.e
>  
>   services:
>     mon: 3 daemons, quorum a,b,e (age 10h)
>     mgr: a(active, since 8d)
>     mds: 0/1 daemons up (1 failed), 2 standby
>     osd: 3 osds: 3 up (since 5d), 3 in (since 5d)
>     rgw: 1 daemon active (1 hosts, 1 zones)
>  
>   data:
>     volumes: 0/1 healthy, 1 failed
>     pools:   11 pools, 177 pgs
>     objects: 4.15k objects, 10 GiB
>     usage:   31 GiB used, 1.5 TiB / 1.5 TiB avail
>     pgs:     177 active+clean
>  
>   io:
>     client:   170 B/s rd, 1023 B/s wr, 0 op/s rd, 0 op/s wr
>  
> sh-4.4$ 
> 
> sh-4.4$ ceph fs status
> ocs-storagecluster-cephfilesystem - 0 clients
> =================================
> RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS  
>  0    failed                                        
>                    POOL                       TYPE     USED  AVAIL  
> ocs-storagecluster-cephfilesystem-metadata  metadata  2634k   440G  
>  ocs-storagecluster-cephfilesystem-data0      data    13.7G   440G  
>             STANDBY MDS              
> ocs-storagecluster-cephfilesystem-a  
> ocs-storagecluster-cephfilesystem-b  
> MDS version: ceph version 16.2.0-152.el8cp
> (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
> sh-4.4$ 
> 
> 
> 
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0 event
> recover_dentries summary
> Events by type:
>   SUBTREEMAP: 1
>   UPDATE: 6
> Errors: 0
> sh-4.4$ 
> 
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0
> journal reset                 
> old journal was 9218989~18695
> new journal start will be 12582912 (3345228 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> sh-4.4$ 
> 
> I created the script first-damage.sh:
> 
> sh-4.4$ vi first-damage.sh
> sh-4.4$ pwd
> /
> sh-4.4$ chmod 777 first-damage.sh 
> sh-4.4$ 
> 
> 
> When I run this I get millions of lines like this:
> 
> sh-4.4$ ./first-damage.sh ocs-storagecluster-cephfilesystem-metadata      
> + main ocs-storagecluster-cephfilesystem-metadata
> ++ getopt --name ./first-damage.sh --options r --longoptions help,remove --
> ocs-storagecluster-cephfilesystem-metadata
> + eval set -- -- ''\''ocs-storagecluster-cephfilesystem-metadata'\'''
> ++ set -- -- ocs-storagecluster-cephfilesystem-metadata
> + '[' 2 -gt 0 ']'
> + case "$1" in
> + shift
> + break
> + '[' -z ocs-storagecluster-cephfilesystem-metadata ']'
> + METADATA_POOL=ocs-storagecluster-cephfilesystem-metadata
> + NEXT_SNAP=
> + '[' -z '' ']'
> ++ mktemp -p /tmp MDS_SNAPTABLE.XXXXXX
> + SNAPTABLE=/tmp/MDS_SNAPTABLE.lTAWCQ
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata get mds_snaptable
> /tmp/MDS_SNAPTABLE.lTAWCQ
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=1 skip=8
> ++ od --endian=little -An -t u1
> 1+0 records in
> 1+0 records out
> 1 byte copied, 3.3284e-05 s, 30.0 kB/s
> + V='   5'
> + '[' '   5' -ne 5 ']'
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=8 skip=14
> ++ od --endian=little -An -t u8
> 8+0 records in
> 8+0 records out
> 8 bytes copied, 3.2151e-05 s, 249 kB/s
> + NEXT_SNAP=2
> + printf 'found latest snap: %d\n' 2
> found latest snap: 2
> + traverse
> ++ mktemp -p /tmp MDS_TRAVERSAL.XXXXXX
> + local T=/tmp/MDS_TRAVERSAL.nFfvML
> + mrados ls
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata ls
> + grep -E '[[:xdigit:]]{8,}\.[[:xdigit:]]+'
> + read obj
> ++ mktemp -p /tmp 10000000434.00000000.XXXXXX
> + local O=/tmp/10000000434.00000000.lZLNh1
> ++ mrados listomapkeys 10000000434.00000000
> ++ rados --pool=ocs-storagecluster-cephfilesystem-metadata listomapkeys
> 10000000434.00000000
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 0.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 0.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 5.2011e-05 s, 76.9 kB/s
> + local 'first=                    2'
> + '[' '                    2' -gt 2 ']'
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 1.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 1.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 4.2211e-05 s, 94.8 kB/s
> + local 'first=                    2'
> + '[' '                    2' -gt 2 ']'
> ....
> ....
> 
> 
> I understand we need to capture this whole output to a text file and send it
> to you for analysis, right?

Correct!
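
For anyone following the trace above, what the script is doing boils down to the check sketched below. This is a condensed, read-only illustration, not the tool itself (use first-damage.sh for real runs): the pool name is the one from this cluster, the byte offsets are the ones visible in the trace, and the real script additionally validates the snaptable format version and offers a --remove mode. A dentry whose "first" snapshot (the first 4 bytes of its omap value, little-endian) is greater than the filesystem's next snapshot id from mds_snaptable is the corruption being hunted:

# Condensed sketch of the read-only check first-damage.sh performs.
POOL=ocs-storagecluster-cephfilesystem-metadata

# Next snapshot id: 8 bytes, little-endian, at offset 14 of the mds_snaptable object.
rados --pool="$POOL" get mds_snaptable /tmp/snaptable
NEXT_SNAP=$(dd if=/tmp/snaptable bs=1 count=8 skip=14 2>/dev/null | od --endian=little -An -t u8)

# Walk every directory object and every dentry key; a dentry's "first" snapshot
# (first 4 bytes of its omap value, little-endian) must not exceed NEXT_SNAP.
rados --pool="$POOL" ls | grep -E '[[:xdigit:]]{8,}\.[[:xdigit:]]+' | while read -r obj; do
    for dnk in $(rados --pool="$POOL" listomapkeys "$obj"); do
        rados --pool="$POOL" getomapval "$obj" "$dnk" /tmp/dentry > /dev/null
        first=$(dd if=/tmp/dentry bs=1 count=4 2>/dev/null | od --endian=little -An -t u8)
        if [ "$first" -gt "$NEXT_SNAP" ]; then
            echo "possible damage: $obj $dnk (first=$first > next_snap=$NEXT_SNAP)"
        fi
    done
done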

Comment 14 Venky Shankar 2022-09-06 12:29:20 UTC
Hi Yogesh,

Check after flushing the MDS journal.
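
For reference, the flush used earlier in this bug (comment 11) takes the form:

ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal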

Comment 33 Venky Shankar 2022-09-20 04:09:02 UTC
Hi Miguel,

The tool has been patched in the ceph repo with the fix. Use the latest bits from -- https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
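
For example, one way to fetch the updated tool and capture its output for analysis (pool name as used earlier in this bug; the output redirect is just a convenience for attaching the results):

curl -O https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
chmod +x first-damage.sh
./first-damage.sh ocs-storagecluster-cephfilesystem-metadata > first-damage-output.txt 2>&1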

Comment 127 Greg Farnum 2023-03-23 13:47:05 UTC
*** Bug 2071592 has been marked as a duplicate of this bug. ***

