Bug 1717513

Summary: fsck.gfs2 does not fix corruption that causes an error when moving or removing a file on a gfs2 filesystem
Product: Red Hat Enterprise Linux 8 Reporter: Shane Bradley <sbradley>
Component: gfs2-utils Assignee: Andrew Price <anprice>
Status: ASSIGNED --- QA Contact: cluster-qe <cluster-qe>
Severity: high Docs Contact:
Priority: medium    
Version: 8.0 CC: anprice, cluster-maint, dwysocha, gfs2-maint, rhandlin, rpeterso, sbradley, swhiteho
Target Milestone: rc   
Target Release: 8.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-26 20:29:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1721973    
Bug Blocks:    
Attachments:
Description Flags
Checker script none

Comment 1 Robert Peterson 2019-06-05 17:13:58 UTC
Created attachment 1577642 [details]
Checker script

I wrote this tiny script to compare the values of the formal inode
number in the directory entry versus the dinode. Example usage:

Parameters are: <device> <path from that device>

Run it on a known good file:

# /home/bob/tools/check_formal_ino.sh /dev/fsck/case02236799files /fecho/ioc_mestre/gis/eb_docs_02_gfg.sas7bdat
fecho  -> 		 0x3b3ed9d
ioc_mestre  -> 		 0x3b3eda2
gis  -> 		 0x3b40fe4
eb_docs_02_gfg.sas7bdat  -> 		 0x3b50e7c

Dinode: 0x344
Dirent: 0x344

Notice that the dinode agrees with the dirent that the formal inode
number is 0x344. Now run it on a known bad file:

# /home/bob/tools/check_formal_ino.sh /dev/fsck/case02236799files /fecho/ioc_mestre/gis_erro_sistema/eb_docs_02_gfg.sas7bdat
fecho  -> 		 0x3b3ed9d
ioc_mestre  -> 		 0x3b3eda2
gis_erro_sistema  -> 		 0x3b589b7
eb_docs_02_gfg.sas7bdat  -> 		 0x464011c

Dinode: 0x1f4d1
Dirent: 0x1f54b
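
In the bad case above, the dinode reports 0x1f4d1 while the dirent reports 0x1f54b. As a rough sketch of how that comparison could be automated, the function below reads output in the format shown above on stdin and reports whether the two values agree. The function name compare_formal_ino and its messages are hypothetical and not part of the attached checker script; it only assumes the "Dinode:" and "Dirent:" line format seen in the examples.

```shell
#!/bin/bash
# Hypothetical helper (not part of the attached script): reads checker
# output on stdin and compares the "Dinode:" and "Dirent:" values.
compare_formal_ino() {
    local dinode="" dirent="" key value
    # The "|| [ -n "$key" ]" clause handles a final line with no newline.
    while read -r key value _ || [ -n "$key" ]; do
        case "$key" in
            Dinode:) dinode=$value ;;
            Dirent:) dirent=$value ;;
        esac
    done
    if [ -z "$dinode" ] || [ -z "$dirent" ]; then
        echo "ERROR: missing Dinode or Dirent line" >&2
        return 2
    fi
    # Bash arithmetic understands the 0x prefix, so compare numerically.
    if (( dinode == dirent )); then
        echo "OK: formal inode number $dinode matches"
    else
        echo "MISMATCH: dinode=$dinode dirent=$dirent"
        return 1
    fi
}
```

It could be used by piping the checker's output into it, for example: /home/bob/tools/check_formal_ino.sh <device> <path from that device> | compare_formal_ino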

Comment 4 Andrew Price 2019-06-13 14:11:07 UTC
Test case:

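#!/bin/bash
# Reproducer: make a dinode's formal inode number disagree with its
# directory entry, then check that fsck.gfs2 finds and repairs it.

DEV=/dev/foo
MNT=/mnt/test

# Create a fresh single-node filesystem (-O skips the confirmation prompt).
mkfs.gfs2 -O -p lock_nolock "$DEV"
mount "$DEV" "$MNT"
touch "$MNT/foo"
# On gfs2 the inode number reported by ls -i is the dinode's block address.
INODE=$(ls -i "$MNT/foo" | cut -f1 -d' ')
umount "$MNT"
# Overwrite the formal inode number stored in the dinode.
gfs2_edit -p "$INODE" field di_num.no_formal_ino 42 "$DEV"
# First pass: expect exit status 1 (errors found and corrected).
fsck.gfs2 -y "$DEV"
if [ $? -ne 1 ]; then
	echo "FAIL: fsck.gfs2 did not find and fix the bad dentry" >&2
	exit 1
fi
# Second pass: read-only check, expect exit status 0 (clean).
fsck.gfs2 -n "$DEV"
if [ $? -ne 0 ]; then
	echo "FAIL: fsck.gfs2 found problems, fs should be clean" >&2
	exit 1
fi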
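
The test case above relies on the fsck exit status: 1 after the repair pass and 0 after the read-only pass. As a small sketch, the function below decodes such a status into readable flags. It assumes fsck.gfs2 follows the bitmask convention documented in fsck(8); the function name describe_fsck_status is hypothetical.

```shell
#!/bin/bash
# Decode an fsck-style exit status into its component flags.
# Assumption: fsck.gfs2 uses the bit values documented in fsck(8).
describe_fsck_status() {
    local status=$1 out=""
    if (( status == 0 )); then
        echo "no errors"
        return 0
    fi
    (( status & 1 ))   && out+="errors corrected; "
    (( status & 2 ))   && out+="reboot required; "
    (( status & 4 ))   && out+="errors left uncorrected; "
    (( status & 8 ))   && out+="operational error; "
    (( status & 16 ))  && out+="usage error; "
    (( status & 32 ))  && out+="cancelled by user; "
    (( status & 128 )) && out+="shared library error; "
    # Strip the trailing separator; fall back if no known bit was set.
    out=${out%; }
    echo "${out:-unknown status $status}"
}
```

For example, running fsck.gfs2 -y "$DEV"; describe_fsck_status $? after the corruption step would be expected to print "errors corrected" once the fix is in place.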

Comment 14 Dave Wysochanski 2020-09-10 20:38:25 UTC
Should this bug be closed or moved to RHEL8?

Comment 18 Dave Wysochanski 2022-09-26 20:29:00 UTC
It doesn't look like anyone has been working on this for 2 years, and the single case is closed.

Should this be closed, or what is expected?

Comment 20 Dave Wysochanski 2022-09-26 20:31:33 UTC
Sorry, didn't mean to close until owner and reporter agree on next steps.

Comment 21 Andrew Price 2022-09-27 11:20:12 UTC
I would like to keep this one open to track the remaining work. It's blocked behind the general fsck.gfs2 performance work (which itself is blocked at the moment) so it will likely take some time.

Comment 23 Dave Wysochanski 2023-03-20 20:33:23 UTC
(In reply to Andrew Price from comment #21)
> I would like to keep this one open to track the remaining work. It's blocked
> behind the general fsck.gfs2 performance work (which itself is blocked at
> the moment) so it will likely take some time.

Is this still in the works?  Can you give an update on it, and maybe add in depends on bugs as needed?

Comment 24 Andrew Price 2023-03-21 10:27:24 UTC
(In reply to Dave Wysochanski from comment #23)
> Is this still in the works?  Can you give an update on it, and maybe add in
> depends on bugs as needed?

Yes, this still needs to be addressed, and it's currently sitting on the backlog. Some groundwork for the general performance issues went into gfs2-utils 3.5.0, but we still need to find a way to implement the formal_ino check without the considerable performance hit, so the "Cond NAK: Design" is still valid. A "depends on" for 1721973 is already set, so I think the bz state is up to date.