(copying from https://bugzilla.redhat.com/show_bug.cgi?id=2071592#c137)

(In reply to Miguel Duaso from comment #137)
> Hello Venky,
>
> I have tried to follow your instructions on a healthy system, from the ceph
> toolbox:
>
> sh-4.4$ ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal
> 2022-08-31T07:49:27.363+0000 7fbfd57fa700  0 client.6545924 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> 2022-08-31T07:49:27.715+0000 7fbfd57fa700  0 client.6545930 ms_handle_reset
> on v2:10.129.2.3:6800/819136591
> {
>     "message": "",
>     "return_code": 0
> }
> sh-4.4$
>
> sh-4.4$ ceph fs ls
> name: ocs-storagecluster-cephfilesystem, metadata pool:
> ocs-storagecluster-cephfilesystem-metadata, data pools:
> [ocs-storagecluster-cephfilesystem-data0 ]
> sh-4.4$
>
> The customer issue was related to this fs:
>
> "fs ocs-storagecluster-cephfilesystem mds.0 is damaged"
>
> So I followed your instructions with this command:
>
> sh-4.4$ ceph fs fail ocs-storagecluster-cephfilesystem
> ocs-storagecluster-cephfilesystem marked not joinable; MDS cannot join the
> cluster. All MDS ranks marked failed.
> sh-4.4$
>
> Q. How can I unfail this fs (to be done at the end of the test)?

ceph fs set <fs_name> joinable true

> sh-4.4$ ceph -s
>   cluster:
>     id:     081a2386-f070-4d4a-95f8-03df4d949bf4
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             clock skew detected on mon.b, mon.e
>
>   services:
>     mon: 3 daemons, quorum a,b,e (age 10h)
>     mgr: a(active, since 8d)
>     mds: 0/1 daemons up (1 failed), 2 standby
>     osd: 3 osds: 3 up (since 5d), 3 in (since 5d)
>     rgw: 1 daemon active (1 hosts, 1 zones)
>
>   data:
>     volumes: 0/1 healthy, 1 failed
>     pools:   11 pools, 177 pgs
>     objects: 4.15k objects, 10 GiB
>     usage:   31 GiB used, 1.5 TiB / 1.5 TiB avail
>     pgs:     177 active+clean
>
>   io:
>     client:   170 B/s rd, 1023 B/s wr, 0 op/s rd, 0 op/s wr
>
> sh-4.4$
>
> sh-4.4$ ceph fs status
> ocs-storagecluster-cephfilesystem - 0 clients
> =================================
> RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
>  0    failed
>                   POOL                       TYPE      USED   AVAIL
> ocs-storagecluster-cephfilesystem-metadata  metadata   2634k   440G
>  ocs-storagecluster-cephfilesystem-data0      data     13.7G   440G
>             STANDBY MDS
> ocs-storagecluster-cephfilesystem-a
> ocs-storagecluster-cephfilesystem-b
> MDS version: ceph version 16.2.0-152.el8cp
> (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
> sh-4.4$
>
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0 event
> recover_dentries summary
> Events by type:
>   SUBTREEMAP: 1
>   UPDATE: 6
> Errors: 0
> sh-4.4$
>
> sh-4.4$ cephfs-journal-tool --rank=ocs-storagecluster-cephfilesystem:0
> journal reset
> old journal was 9218989~18695
> new journal start will be 12582912 (3345228 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
> sh-4.4$
>
> I created the script first-damage.sh:
>
> sh-4.4$ vi first-damage.sh
> sh-4.4$ pwd
> /
> sh-4.4$ chmod 777 first-damage.sh
> sh-4.4$
>
> When I run this I get millions of lines like this:
>
> sh-4.4$ ./first-damage.sh ocs-storagecluster-cephfilesystem-metadata
> + main ocs-storagecluster-cephfilesystem-metadata
> ++ getopt --name ./first-damage.sh --options r --longoptions help,remove --
> ocs-storagecluster-cephfilesystem-metadata
> + eval set -- -- ''\''ocs-storagecluster-cephfilesystem-metadata'\'''
> ++ set -- -- ocs-storagecluster-cephfilesystem-metadata
> + '[' 2 -gt 0 ']'
> + case "$1" in
> + shift
> + break
> + '[' -z ocs-storagecluster-cephfilesystem-metadata ']'
> + METADATA_POOL=ocs-storagecluster-cephfilesystem-metadata
> + NEXT_SNAP=
> + '[' -z '' ']'
> ++ mktemp -p /tmp MDS_SNAPTABLE.XXXXXX
> + SNAPTABLE=/tmp/MDS_SNAPTABLE.lTAWCQ
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata get mds_snaptable
> /tmp/MDS_SNAPTABLE.lTAWCQ
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=1 skip=8
> ++ od --endian=little -An -t u1
> 1+0 records in
> 1+0 records out
> 1 byte copied, 3.3284e-05 s, 30.0 kB/s
> + V=' 5'
> + '[' ' 5' -ne 5 ']'
> ++ dd if=/tmp/MDS_SNAPTABLE.lTAWCQ bs=1 count=8 skip=14
> ++ od --endian=little -An -t u8
> 8+0 records in
> 8+0 records out
> 8 bytes copied, 3.2151e-05 s, 249 kB/s
> + NEXT_SNAP=2
> + printf 'found latest snap: %d\n' 2
> found latest snap: 2
> + traverse
> ++ mktemp -p /tmp MDS_TRAVERSAL.XXXXXX
> + local T=/tmp/MDS_TRAVERSAL.nFfvML
> + mrados ls
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata ls
> + grep -E '[[:xdigit:]]{8,}\.[[:xdigit:]]+'
> + read obj
> ++ mktemp -p /tmp 10000000434.00000000.XXXXXX
> + local O=/tmp/10000000434.00000000.lZLNh1
> ++ mrados listomapkeys 10000000434.00000000
> ++ rados --pool=ocs-storagecluster-cephfilesystem-metadata listomapkeys
> 10000000434.00000000
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 0.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 0.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 5.2011e-05 s, 76.9 kB/s
> + local 'first= 2'
> + '[' ' 2' -gt 2 ']'
> + for dnk in $(mrados listomapkeys "$obj")
> + mrados getomapval 10000000434.00000000 1.log_head
> /tmp/10000000434.00000000.lZLNh1
> + rados --pool=ocs-storagecluster-cephfilesystem-metadata getomapval
> 10000000434.00000000 1.log_head /tmp/10000000434.00000000.lZLNh1
> Writing to /tmp/10000000434.00000000.lZLNh1
> ++ dd if=/tmp/10000000434.00000000.lZLNh1 bs=1 count=4
> ++ od --endian=little -An -t u8
> 4+0 records in
> 4+0 records out
> 4 bytes copied, 4.2211e-05 s, 94.8 kB/s
> + local 'first= 2'
> + '[' ' 2' -gt 2 ']'
> ....
> ....
>
> I understand we need to capture this whole output to a text file and send it
> to you for analysis, right?

Correct!
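For reference, one way to capture the whole trace into a file and then bring the filesystem back once the test is finished could look like the sketch below. The pool and filesystem names are the ones from the session above; the /tmp output path is only an example, and tee simply duplicates everything the script prints (stderr is folded in via 2>&1) into a file that can be attached here.

sh-4.4$ ./first-damage.sh ocs-storagecluster-cephfilesystem-metadata 2>&1 | tee /tmp/first-damage-output.txt
sh-4.4$ ceph fs set ocs-storagecluster-cephfilesystem joinable true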
Hi Yogesh,

Please check after flushing the MDS journal.
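If it helps, a minimal sketch of that sequence on this cluster could be the following (rank 0 of the filesystem named above; listing the damage table afterwards is just one way to verify, so adjust the check to whatever you are tracking):

sh-4.4$ ceph tell mds.ocs-storagecluster-cephfilesystem:0 flush journal
sh-4.4$ ceph tell mds.ocs-storagecluster-cephfilesystem:0 damage ls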
Hi Miguel,

The tool has been patched in the ceph repo with the fix. Please use the latest bits from https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
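For example, the updated script could be fetched and run along these lines. This is only a sketch: the plain invocation is the read-only scan shown in the session above, and the --remove option (visible in the script's getopt call in that trace) is presumably the destructive mode, so it should only be passed after the scan output has been reviewed.

sh-4.4$ curl -LO https://raw.githubusercontent.com/ceph/ceph/main/src/tools/cephfs/first-damage.sh
sh-4.4$ chmod +x first-damage.sh
sh-4.4$ ./first-damage.sh ocs-storagecluster-cephfilesystem-metadata 2>&1 | tee /tmp/first-damage-output.txt
sh-4.4$ ./first-damage.sh --remove ocs-storagecluster-cephfilesystem-metadata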
*** Bug 2071592 has been marked as a duplicate of this bug. ***