Description of problem:
GFS2 filesystem created and populated with directories/files. Unmounted and then ran gfs2_fsck on the filesystem, which reported that dirents were of an 'unknown' type. Upon inspection, it appears to be an endian problem (dirents were type 0x004 instead of 0x400).

How reproducible:
Always.

Steps to Reproduce:
1. Create GFS2 filesystem, create files and directories, unmount.
2. Run gfs2_fsck.

Additional info:
This is fixed in Steve's git tree.
Ryan, can you post the patch here? It also needs to be posted on rhkernel-list to be included in RHEL5.
Created attachment 138406 [details] Patch to fix endian bug in GFS2. This is already in the upstream kernel and has been tested.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in release.
QE ack for RHEL5B2 for the reasons outlined in comment 4.
*** Bug 211044 has been marked as a duplicate of this bug. ***
Fixed in kernel-2.6.18-1.2728.el5.
Ummm, I assume kernel-2.6.18-1.2728.el5 is for the upcoming EL5 (based on FC6, I hear). I currently have FC6 test3 with updates installed. Is there an FC6 kernel with the fix included, or where can I get the EL5 kernel? Or, if most of the GFS2 testing is to be done on EL5 now, can I download the beta somewhere? Either is fine with me; all my GFS2 servers are in a test environment. Thanks...
Yes, you assume correctly. The same fix is in FC-6 as well, along with some other changes. So the upstream kernel (in my gfs2-2.6-fixes.git tree) is the most up-to-date source of GFS2, followed by (in descending order) Linus' kernel tree, FC-6, and RHEL5. All of them have a fix for this bug, but you'll need to fsck or remake the filesystem to eliminate existing directory entries with an unknown type, I'm afraid. The error only affected the . and .. entries created in new directories; all other directory entries were unaffected by it.
OK... Here is what I have done:

Installed kernel-2.6.18-1.2798.fc6. Mounted a clean (newly formatted) volume on a cluster of three machines. fsck said it was OK. Started a copy of 40GB of data to the new volume. Two times the copy process stopped (the 1st and 4th runs, presumably due to some sort of lock). I was unable to terminate the copy process, and trying to unmount the volume on another machine would hang until the computer doing the copy was rebooted. The other two times the copy completed, but an fsck would generate errors. Some of the errors were:

Starting pass2
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry . is out of range
Clearing .
Block # referenced by directory entry . is out of range
Clearing .

and

Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck found (1034)
Inode 4427608 (0x438f58): Ondisk block count (525258) does not match what fsck found (1034)
<--more deleted-->

and lots of messages similar to:

Ondisk and fsck bitmaps differ at block 10415231 (0x9eec7f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
<--Lots more deleted-->

and also:

RG #10362148 (0x9e1d24) free count inconsistent: is 18 should be 52991

Unless this kernel does not have the fix in it, there may still be a problem... Does this need to be opened under a new bz number?
Yes, please. This doesn't look like the same thing at all. It also looks rather worrying to me. Can you reproduce this on a single node, or does it only happen when you are using multiple nodes at once?