Bug 210493 - GFS2 dirents are 'unknown' type
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All  OS: Linux
Priority: medium  Severity: medium
Assigned To: Steve Whitehouse
QA Contact: GFS Bugs
Duplicates: 211044
Reported: 2006-10-12 11:26 EDT by Ryan O'Hara
Modified: 2009-05-27 23:33 EDT
CC List: 5 users

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2006-11-24 06:51:26 EST


Attachments
Patch to fix endian bug in gfs2 (1.13 KB, patch)
2006-10-13 04:43 EDT, Steve Whitehouse

Description Ryan O'Hara 2006-10-12 11:26:51 EDT
Description of problem:

GFS2 filesystem created and populated with directories/files. Unmounted and then
ran gfs2_fsck on the filesystem, which reported that dirents were of an
'unknown' type. Upon inspection, it appears to be an endian problem (dirents
were type 0x004 instead of 0x400).

How reproducible:

Always.

Steps to Reproduce:
1. Create GFS2 filesystem, create files and directories, unmount.
2. Run gfs2_fsck.

Additional info:

This is fixed in Steve's git tree.
Comment 1 Kiersten (Kerri) Anderson 2006-10-12 13:29:43 EDT
Ryan, can you post the patch here? It also needs to be posted on rhkernel-list
to be included in RHEL5.
Comment 3 Steve Whitehouse 2006-10-13 04:43:28 EDT
Created attachment 138406
Patch to fix endian bug in gfs2

This is already in the upstream kernel and has been tested.
Comment 5 RHEL Product and Program Management 2006-10-13 10:04:31 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.
Comment 6 Jay Turner 2006-10-13 11:47:39 EDT
QE ack for RHEL5B2 for the reasons outlined in comment 4.
Comment 7 Ryan O'Hara 2006-10-16 18:43:37 EDT
*** Bug 211044 has been marked as a duplicate of this bug. ***
Comment 8 Don Zickus 2006-10-16 22:01:25 EDT
in kernel-2.6.18-1.2728.el5
Comment 9 Gary Lindstrom 2006-10-17 21:51:31 EDT
Ummm, I assume kernel-2.6.18-1.2728.el5 is for the upcoming EL5 (based on FC6, I
hear). I currently have FC6 test3 with updates installed. Is there an FC6
version with the fix included, or where can I get the EL5 kernel? Or, if most of
the GFS2 testing is to be done on EL5 now, can I download the beta somewhere?
Either is fine with me. All my GFS2 servers are in a test environment. Thanks...
Comment 10 Steve Whitehouse 2006-10-18 03:56:21 EDT
Yes, you assume correctly. The same fix is in FC-6 as well, along with some other
changes. The upstream kernel (in my gfs2-2.6-fixes.git tree) is the most
up-to-date source of GFS2, followed (in descending order) by Linus' kernel tree,
FC-6 and RHEL5.

All of them have a fix for this bug, but I'm afraid you'll need to fsck or remake
the filesystem to eliminate any existing directory entries with the unknown type.
The error only affected the . and .. entries created in new directories; all
other directory entries were unaffected.
Comment 11 Gary Lindstrom 2006-10-18 16:34:22 EDT
OK... Here is what I have done: installed kernel-2.6.18-1.2798.fc6 and mounted a
clean (newly formatted) volume on a cluster of three machines. Fsck says it is
OK. Started a copy of 40GB of data to the new volume. Twice (the 1st and 4th
attempts) the copy process stopped, presumably due to some sort of lock, and
could not be terminated. Tried unmounting the volume on another machine, and the
unmount would hang until the computer doing the copy was rebooted. The other two
times the copy completed, but fsck generated errors. Some of the errors were:

Starting pass2
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry . is out of range
Clearing .
Block # referenced by directory entry . is out of range
Clearing .

and

Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck
found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck
found (1034)
Inode 4427608 (0x438f58): Ondisk block count (525258) does not match what fsck
found (1034)
<--more deleted-->

and lots of messages similar to:

Ondisk and fsck bitmaps differ at block 10415231 (0x9eec7f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
<--Lots more deleted-->

and also :

RG #10362148 (0x9e1d24) free count inconsistent: is 18 should be 52991


Unless this kernel does not have the fix in it, there may still be a problem...

Does this problem need to be opened with a new bz number?

Comment 12 Steve Whitehouse 2006-10-19 05:08:10 EDT
Yes, please. This doesn't look like the same thing at all. It also looks rather
worrying to me. Can you reproduce on a single node, or does this only happen
when you are using multiple nodes at once?
