Bug 2111352

Summary: cephfs: tooling to identify inode (metadata) corruption
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Bipin Kunal <bkunal>
Component: CephFS
Assignee: Patrick Donnelly <pdonnell>
Status: POST
QA Contact: Hemanth Kumar <hyelloji>
Severity: medium
Priority: unspecified
Version: 4.2
CC: amanzane, ceph-eng-bugs, cephqe-warriors, gfarnum, hyelloji, kdreyer, mcaldeir, mduasope, mmanjuna, owasserm, pdonnell, vereddy, vshankar, vumrao
Target Milestone: ---
Keywords: FutureFeature
Target Release: Backlog
Flags: gfarnum: needinfo-
Hardware: All
OS: Linux
Doc Type: Enhancement
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Bug Blocks: 2162312

Comment 11 Venky Shankar 2022-09-05 12:15:06 UTC
(copying from https://bugzilla.redhat.com/show_bug.cgi?id=2071592#c137)


(In reply to Miguel Duaso from comment #137)
> Hello Venky,
> 
> 
> I have tried to follow your instructions on a healthy system, from the ceph
> toolbox:
> 
> 
> sh-4.4$ ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal
> 2022-08-31T07:49:27.363+0000 7fbfd57fa700  0 client.6545924 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> 2022-08-31T07:49:27.715+0000 7fbfd57fa700  0 client.6545930 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> {
>     "message": "",
>     "return_code": 0
> }
> sh-4.4$ 
> 
> 
> sh-4.4$ ceph fs ls
> name: ocs-storagecluster-cephfilesystem, metadata pool:
> ocs-storagecluster-cephfilesystem-metadata, data pools:
> [ocs-storagecluster-cephfilesystem-data0 ]
> sh-4.4$ 
> 
> Customer issue was related to this fs:
> 
> "fs ocs-storagecluster-cephfilesystem mds.0 is damaged"
> 
> So I follow your instructions with this command:
> 
> sh-4.4$ ceph fs fail ocs-storagecluster-cephfilesystem
> ocs-storagecluster-cephfilesystem marked not joinable; MDS cannot join the
> cluster. All MDS ranks marked failed.
> sh-4.4$ 
> 
> Q. How can I unfail this fs (to be done at the end of the test)?

ceph fs set <fs_name> joinable true
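
For the filesystem in this case that would be:

ceph fs set ocs-storagecluster-cephfilesystem joinable true

Run it only after the inspection is done, since it allows the standby MDS daemons to rejoin and the failed rank to become active again.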

> 
> 
> sh-4.4$ ceph -s
>   cluster:
>     id:     081a2386-f070-4d4a-95f8-03df4d949bf4
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             clock skew detected on mon.b, mon.e
>  
>   services:
>     mon: 3 daemons, quorum a,b,e (age 10h)
>     mgr: a(active, since 8d)
>     mds: 0/1 daemons up (1 failed), 2 standby
>     osd: 3 osds: 3 up (since 5d), 3 in (since 5d)
>     rgw: 1 daemon active (1 hosts, 1 zones)
>  
>   data:
>     volumes: 0/1 healthy, 1 failed
>     pools:   11 pools, 177 pgs
>     objects: 4.15k objects, 10 GiB
>     usage:   31 GiB used, 1.5 TiB / 1.5 TiB avail
>     pgs:     177 active+clean
>  
>   io:
>     client:   170 B/s rd, 1023 B/s wr, 0 op/s rd, 0 op/s wr
>  
> sh-4.4$ 
> 
> sh-4.4$ ceph fs status
> ocs-storagecluster-cephfilesystem - 0 clients
> =================================
> RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS  
>  0    failed                                        
>                    POOL                       TYPE     USED  AVAIL  
> ocs-storagecluster-cephfilesystem-metadata  metadata  2634k   440G  
>  ocs-storagecluster-cephfilesystem-data0      data    13.7G   440G  
>             STANDBY MDS              
> ocs-storagecluster-cephfilesystem-a  
> ocs-storagecluster-cephfilesystem-b  
> MDS version: ceph version 16.2.0-152.el8cp
> (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
> sh-4.4$ 
> 
> 
> 
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0 event
> recover_dentries summary
> Events by type:
>   SUBTREEMAP: 1
>   UPDATE: 6
> Errors: 0
> sh-4.4$ 
> 
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0
> journal reset                 
> old journal was 9218989~18695
> new journal start will be 12582912 (3345228 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> sh-4.4$ 
> 
> I create the script first-damage.sh:
> 
> sh-4.4$ vi first-damage.sh
> sh-4.4$ pwd
> /
> sh-4.4$ chmod 777 first-damage.sh 
> sh-4.4$ 
> 
> 
> When I run this I get millions of lines like this:
> 
> sh-4.4$ ./first-damage.sh ocs-storagecluster-cephfilesystem-metadata      
> + main ocs-storagecluster-cephfilesystem-metadata
> ++ getopt --name ./first-damage.sh --options r --longoptions help,remove --
> ocs-storagecluster-cephfilesystem-metadata
> + eval set -- -- ''\''ocs-storagecluster-cephfilesystem-metadata'\'''
> ++ set -- -- ocs-storagecluster-cephfilesystem-metadata
> + '[' 2 -gt 0 ']'
> + case "$1" in
> + shift
> + break
> + '[' -z ocs-storagecluster-cephfilesystem-metadata ']'
> + METADATA_POOL=ocs-storagecluster-cephfilesystem-metadata
> + NEXT_SNAP=
> + '[' -z '' ']'
> ++ mktemp -p /tmp MDS_SNAPTABLE.XXXXXX
> + SNAPTABLE=/tmp/MDS_SNAPTABLE.lTAWCQ
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata get mds_snaptable
> /tmp/MDS_SNAPTABLE.lTAWCQ
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=1 skip=8
> ++ od --endian=little -An -t u1
> 1+0 records in
> 1+0 records out
> 1 byte copied, 3.3284e-05 s, 30.0 kB/s
> + V='   5'
> + '[' '   5' -ne 5 ']'
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=8 skip=14
> ++ od --endian=little -An -t u8
> 8+0 records in
> 8+0 records out
> 8 bytes copied, 3.2151e-05 s, 249 kB/s
> + NEXT_SNAP=2
> + printf 'found latest snap: %d\n' 2
> found latest snap: 2
> + traverse
> ++ mktemp -p /tmp MDS_TRAVERSAL.XXXXXX
> + local T=/tmp/MDS_TRAVERSAL.nFfvML
> + mrados ls
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata ls
> + grep -E '[[:xdigit:]]{8,}\.[[:xdigit:]]+'
> + read obj
> ++ mktemp -p /tmp 10000000434.00000000.XXXXXX
> + local O=/tmp/10000000434.00000000.lZLNh1
> ++ mrados listomapkeys 10000000434.00000000
> ++ rados --pool=ocs-storagecluster-cephfilesystem-metadata listomapkeys
> 10000000434.00000000
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 0.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 0.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 5.2011e-05 s, 76.9 kB/s
> + local 'first=                    2'
> + '[' '                    2' -gt 2 ']'
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 1.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 1.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 4.2211e-05 s, 94.8 kB/s
> + local 'first=                    2'
> + '[' '                    2' -gt 2 ']'
> ....
> ....
> 
> 
> I understand we need to capture this whole output to a text file and send it
> to you for analysis, right?

Correct!
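
A simple way to capture it (a sketch; the output path is just an example, and 2>&1 is there because the script's trace lines and the dd/rados diagnostics go to stderr):

./first-damage.sh ocs-storagecluster-cephfilesystem-metadata 2>&1 | tee /tmp/first-damage.out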

Comment 14 Venky Shankar 2022-09-06 12:29:20 UTC
Hi Yogesh,

Check after flushing the MDS journal.
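
For reference, the flush command used earlier in this bug (rank 0 of the affected filesystem):

ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal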

Comment 33 Venky Shankar 2022-09-20 04:09:02 UTC
Hi Miguel,

The tool has been patched in the ceph repo with the fix. Use the latest bits from -- https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
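
For example, assuming curl is available in the toolbox pod, the patched script can be fetched and run against the metadata pool from this case:

curl -O https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
chmod +x first-damage.sh
./first-damage.sh ocs-storagecluster-cephfilesystem-metadata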

Comment 127 Greg Farnum 2023-03-23 13:47:05 UTC
*** Bug 2071592 has been marked as a duplicate of this bug. ***