Description of problem:
GFS2 filesystem created and populated with directories/files. Unmounted and then ran gfs2_fsck on the filesystem, which reported that dirents were of an 'unknown' type. Upon inspection, it appears to be an endian problem (dirents were type 0x004 instead of 0x400).

How reproducible:
Always.

Steps to Reproduce:
1. Create GFS2 filesystem, create files and directories, unmount.
2. Run gfs2_fsck.

Additional info:
This is fixed in Steve's git tree.
Ryan, can you post the patch here? It also needs to be posted on rhkernel-list to be included in RHEL5.
Created attachment 138406 [details] Patch to fix endian bug in GFS2. This is already in the upstream kernel and has been tested.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering. This request is not yet committed for inclusion in release.
QE ack for RHEL5B2 for the reasons outlined in comment 4.
*** Bug 211044 has been marked as a duplicate of this bug. ***
Fixed in kernel-2.6.18-1.2728.el5.
Ummm, I assume kernel-2.6.18-1.2728.el5 is for the upcoming EL5 (based on FC6, I hear). I currently have FC6 test3 with updates installed. Is there an FC6 kernel with the fix included, or where can I get the EL5 kernel? Or, if most of the GFS2 testing is to be done on EL5 now, can I download the beta somewhere? Either is fine with me; all my GFS2 servers are in a test environment. Thanks...
Yes, you assume correctly. The same fix is in FC-6 as well, along with some other changes. So the upstream kernel (in my gfs2-2.6-fixes.git tree) is the most up-to-date source of GFS2, followed by (in descending order) Linus' kernel tree, FC-6, and RHEL5. All of them have a fix for this bug, but you'll need to fsck or remake the filesystem to eliminate existing directory entries with an unknown type, I'm afraid. The error only affected the . and .. entries created in new directories; all other directory entries were unaffected by it.
OK... Here is what I have done:

Installed kernel-2.6.18-1.2798.fc6. Mounted a clean (newly formatted) volume on a cluster of three machines. fsck said it was OK. Started a copy of 40GB of data to the new volume. Two times the copy process stopped (the 1st and 4th runs, presumably due to some sort of lock). I was unable to terminate the copy process, and trying to unmount the volume on another machine would hang until the computer doing the copy was rebooted. The other two times the copy completed, but an fsck would generate errors. Some of the errors were:

Starting pass2
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry . is out of range
Clearing .
Block # referenced by directory entry . is out of range
Clearing .

and

Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck found (1034)
Inode 4427608 (0x438f58): Ondisk block count (525258) does not match what fsck found (1034)
<--more deleted-->

and lots of messages similar to:

Ondisk and fsck bitmaps differ at block 10415231 (0x9eec7f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
<--Lots more deleted-->

and also:

RG #10362148 (0x9e1d24) free count inconsistent: is 18 should be 52991

Unless this kernel does not have the fix in it, there may still be a problem... Does this need to be opened under a new bz number?
Yes, please. This doesn't look like the same thing at all. It also looks rather worrying to me. Can you reproduce this on a single node, or does it only happen when you are using multiple nodes at once?