Bug 211465 - fsck errors on gfs2 volume
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: gfs2-utils
Version: 6
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Robert Peterson
Blocks: 204760 215809
Reported: 2006-10-19 11:32 EDT by Gary Lindstrom
Modified: 2008-08-02 19:40 EDT
Fixed In Version: FC6
Doc Type: Bug Fix
Last Closed: 2007-04-13 17:27:50 EDT


Attachments
backtrace after hang of copy on gfs2 (145.94 KB, application/octet-stream)
2006-10-20 12:03 EDT, Gary Lindstrom
gfs2 lock with data=writeback (95.73 KB, application/octet-stream)
2006-10-20 19:02 EDT, Gary Lindstrom
Newer/better backtrace for last problem (143.66 KB, application/octet-stream)
2006-10-20 19:21 EDT, Gary Lindstrom

Description Gary Lindstrom 2006-10-19 11:32:32 EDT
Description of problem:

This new bug was opened at the request of swhiteho@redhat.com; see bz 210493.
It covers fsck errors on a gfs2 volume and a possible lock problem when copying
files to a gfs2 volume.

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2798.fc6
gfs2-utils-0.1.7-1.fc6
cman-2.0.18-2.fc6

How reproducible:
Regular
  
Actual results:
fsck errors

Expected results:
clean filesystem

From post under 210493:
OK... Here is what I have done...  Installed kernel-2.6.18-1.2798.fc6.  Mounted a
clean (newly formatted) volume on a cluster of three machines.  Fsck says it is
OK.  Started a copy of 40GB of data to the new volume.  Twice the copy process
stopped (the 1st and 4th attempts, presumably due to some sort of lock), and I
was unable to terminate it.  Tried unmounting the volume on another machine, and
the unmount would hang until the computer doing the copy was rebooted.
The other two times the copy completed, but a fsck would generate errors.  Some
of the errors were:

Starting pass2
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry . is out of range
Clearing .
Block # referenced by directory entry . is out of range
Clearing .

and

Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck
found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck
found (1034)
Inode 4427608 (0x438f58): Ondisk block count (525258) does not match what fsck
found (1034)
<--more delete-->

and lots of messages similar to:

Ondisk and fsck bitmaps differ at block 10415231 (0x9eec7f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
<--Lots more deleted-->

and also :

RG #10362148 (0x9e1d24) free count inconsistent: is 18 should be 52991
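
For anyone sifting through a long gfs2_fsck log, the pass1 complaints quoted above can be tabulated rather than read line by line. The following is a minimal Python sketch, not part of gfs2-utils; the function name and the regular expression are assumptions built only from the message format shown in this report. The printed ratio is just an aid for spotting patterns and implies no diagnosis.

```python
import re

# Matches one pass1 complaint. \s+ tolerates the line wrap that splits the
# message in the pasted output (after "what" or after "fsck").
MISMATCH = re.compile(
    r"Inode (\d+) \(0x[0-9a-f]+\): Ondisk block count \((\d+)\)\s+"
    r"does not match what\s+fsck\s+found \((\d+)\)"
)

def block_count_mismatches(log_text):
    """Return (inode, ondisk_count, fsck_count) tuples found in fsck output."""
    return [tuple(int(g) for g in m.groups()) for m in MISMATCH.finditer(log_text)]

# Two entries copied from the description above.
sample = """Starting pass1
Inode 1872239 (0x1c916f): Ondisk block count (1050643) does not match what fsck
found (2067)
Inode 3902143 (0x3b8abf): Ondisk block count (525258) does not match what fsck
found (1034)
"""

for inode, ondisk, found in block_count_mismatches(sample):
    print(f"inode {inode}: on-disk {ondisk}, fsck {found}, ratio {ondisk / found:.1f}")
```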
Comment 2 Ryan O'Hara 2006-10-19 12:35:25 EDT
Can you specify the parameters used when creating the filesystem? Specifically,
which locking protocol did you use?

Comment 3 Gary Lindstrom 2006-10-19 12:47:47 EDT
OK...
mkfs.gfs2 -O -t fpcl01:vg00lv00 -p lock_dlm -j 8 /dev/fpcl01vg00/fpcl01vg00lv00

First test...  Nothing should be writing to the volume other than the copy.
The process was: mkfs a new gfs2 volume, mount it on all three nodes, copy 40GB
to the volume, unmount on all nodes, run gfs2_fsck.  Getting:

[root@spool5 /]# time gfs2_fsck -y /dev/fpcl01vg00/fpcl01vg00lv00
Initializing fsck
Clearing journals (this may take a while)....
Journals cleared.
Starting pass1
Inode 199724591 (0xbe78e2f): Ondisk block count (1050643) does not match what
fsck found (2067)
Inode 201754495 (0xc06877f): Ondisk block count (525258) does not match what
fsck found (1034)
Inode 202279960 (0xc0e8c18): Ondisk block count (525258) does not match what
fsck found (1034)
Inode 202805411 (0xc1690a3): Ondisk block count (525258) does not match what
fsck found (1034)
Inode 203330885 (0xc1e9545): Ondisk block count (525258) does not match what
fsck found (1034)
Inode 203856593 (0xc269ad1): Ondisk block count (929588) does not match what
fsck found (1828)
Inode 204889469 (0xc365d7d): Ondisk block count (580952) does not match what
fsck found (1144)
Inode 205470658 (0xc3f3bc2): Ondisk block count (1050643) does not match what
fsck found (2067)
Inode 206635092 (0xc510054): Ondisk block count (580952) does not match what
fsck found (1144)
Inode 207216557 (0xc59dfad): Ondisk block count (1050643) does not match what
fsck found (2067)
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry .. is out of range
Clearing ..
Block # referenced by directory entry . is out of range
Clearing .
<--the two lines above repeat 272 more times, 273 in total-->
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5

Way too much to bother sending you, but it got to about 30% or so and then I am
getting lots of:


Ondisk and fsck bitmaps differ at block 199746004 (0xbe7e1d4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746005 (0xbe7e1d5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746006 (0xbe7e1d6)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746007 (0xbe7e1d7)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746008 (0xbe7e1d8)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746009 (0xbe7e1d9)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746010 (0xbe7e1da)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746011 (0xbe7e1db)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746013 (0xbe7e1dd)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746014 (0xbe7e1de)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746015 (0xbe7e1df)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746016 (0xbe7e1e0)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746017 (0xbe7e1e1)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746018 (0xbe7e1e2)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746019 (0xbe7e1e3)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 199746020 (0xbe7e1e4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.


If history holds like another one of these, it will take 4+ hours to finish now
rather than the standard 30-45 minutes...
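
Since pass5 prints four lines per differing block, a long run is easier to judge once it is collapsed into block ranges. A rough Python sketch follows, assuming only the message format shown above; the helper name is made up for illustration and the sample is rebuilt from the block numbers in this comment.

```python
import re

DIFF = re.compile(r"Ondisk and fsck bitmaps differ at block (\d+)")

def differing_block_ranges(log_text):
    """Collapse the reported block numbers into inclusive (first, last) runs."""
    blocks = sorted({int(n) for n in DIFF.findall(log_text)})
    ranges = []
    for b in blocks:
        if ranges and b == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], b)  # extend the current run
        else:
            ranges.append((b, b))            # start a new run
    return ranges

# Rebuild the excerpt above: blocks 199746004-199746011 and 199746013-199746020.
sample = "\n".join(
    f"Ondisk and fsck bitmaps differ at block {n} (0x{n:x})\n"
    "Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)\n"
    "Metadata type is 0 (free)\n"
    "Succeeded."
    for n in list(range(199746004, 199746012)) + list(range(199746013, 199746021))
)
print(differing_block_ranges(sample))
```

On the excerpt in this comment the result is two runs, because block 199746012 is not reported.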
Comment 4 Gary Lindstrom 2006-10-19 13:05:59 EDT
In case you are interested or it makes a difference:

[root@spool5 ~]# time mkfs.gfs2 -O -t fpcl01:vg00lv00 -p lock_dlm -j 8 /dev/fpcl01vg00/fpcl01vg00lv00
Device:                    /dev/fpcl01vg00/fpcl01vg00lv00
Blocksize:                 4096
Device Size                3019.94 GB (791658496 blocks)
Filesystem Size:           3019.94 GB (791658495 blocks)
Journals:                  8
Resource Groups:           12080
Locking Protocol:          "lock_dlm"
Lock Table:                "fpcl01:vg00lv00"


real    0m36.959s
user    0m12.761s
sys     0m1.924s
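
As a sanity check on the figures mkfs.gfs2 printed, the arithmetic can be redone by hand. The short Python sketch below uses only numbers from the output above; the constant and function names are illustrative. It reproduces the 3019.94 figure, which suggests that "GB" here means 2^30 bytes, and it also prints the average number of blocks per resource group.

```python
BLOCK_SIZE = 4096          # "Blocksize" from the mkfs.gfs2 output
DEVICE_BLOCKS = 791658496  # block count on the "Device Size" line
FS_BLOCKS = 791658495      # block count on the "Filesystem Size" line
RESOURCE_GROUPS = 12080    # "Resource Groups"

def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to units of 2**30 bytes."""
    return blocks * block_size / 2**30

print(f"device size: {blocks_to_gib(DEVICE_BLOCKS):.2f} GiB")
print(f"average blocks per resource group: {FS_BLOCKS / RESOURCE_GROUPS:.1f}")
```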
Comment 5 Gary Lindstrom 2006-10-19 14:37:14 EDT
It finished ahead of expected time... here is the finish:

Ondisk and fsck bitmaps differ at block 208267582 (0xc69e93e)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 208267583 (0xc69e93f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
RG #208209294 (0xc69058e) free count inconsistent: is 18 should be 58187
Resource group counts updated
Pass5 complete
Writing changes to disk
gfs2_fsck complete

real    96m20.340s
user    52m46.562s
sys     9m26.135s
Comment 6 Gary Lindstrom 2006-10-19 14:51:30 EDT
And... just did a new mkfs on volume, only mounted the volume on the one node,
started a copy, watched io with vmstat...  All io has ceased for the copy, and
the copy command seems to be hung...  I can leave it in this state for a little
while if someone gets back to me and lets me know if there is anything they want
me to try or do...  It was about 3.5GB into the 40GB of data to copy...

ps ax |grep copy shows:

8558 pts/0    D+     2:08 cp -ar /mnt/fpcl01vg01lv00/copy2
/mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00....<too wide>
Comment 7 Steve Whitehouse 2006-10-20 03:49:54 EDT
If it's still possible, it would be very useful to know which of the gfs2 daemons
have hung and, if you have console access, to get backtraces with
Alt-SysRq-t (you might have to set the right value in /proc/sys/kernel/sysrq
to get that to work).

That should give us some pointers as to what is causing the hang.
Comment 8 Gary Lindstrom 2006-10-20 10:13:11 EDT
I didn't really see any indication of which daemon was hung (not obvious to me
anyway).  The computer has since been rebooted, but the hang does not appear to
be hard to replicate (it may just take a couple of tries), so I will try again.
I'm not really a developer, so I haven't used the SysRq interface.  I assume I put 1
into /proc/sys/kernel/sysrq.  Do I just press Alt-SysRq-t and it dumps stuff to
a file, or do I need to do/type something else?  If I have your IP, I would also
be willing to open a hole in the firewall for you to gain access to the machine...
Comment 9 Steve Whitehouse 2006-10-20 10:23:01 EDT
You can either do Alt-SysRq-t or echo 't' >/proc/sysrq-trigger, which does the
same thing. To see whether a daemon has hung, just run "ps aux" a few times; if
it's stuck in 'D' all the time, then it's probably hung. You should see 'R' or
'S' otherwise.

If you want to see more info about what you can get sysrq to do, then there is a
summary in the Documentation/sysrq.txt file from any recent set of kernel source.
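
For a scripted version of the two writes discussed here (enabling sysrq, then triggering the 't' task dump), a small Python sketch could look like the following. The function name and the proc_root parameter are assumptions for illustration; proc_root exists only so the function can be tried against a scratch directory. On a real system it needs root, and the backtraces go to the kernel log (readable with dmesg), not to a file.

```python
from pathlib import Path

def request_task_dump(proc_root="/proc", enable_first=True):
    """Ask the kernel to dump all task backtraces (the sysrq 't' action)."""
    proc = Path(proc_root)
    if enable_first:
        # Writing 1 enables all sysrq functions.
        (proc / "sys" / "kernel" / "sysrq").write_text("1\n")
    # Same effect as: echo t > /proc/sysrq-trigger
    (proc / "sysrq-trigger").write_text("t\n")

# Usage on the affected node, as root:
#   request_task_dump()
# then capture the kernel log, for example with dmesg, and attach it.
```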
Comment 10 Gary Lindstrom 2006-10-20 11:53:09 EDT
Got it...

[root@spool5 ~]# ps ax |grep D
  PID TTY      STAT   TIME COMMAND
  195 ?        D      0:05 [pdflush]
 4791 ?        D<     0:16 [lock_dlm1]
 4792 ?        D<     0:18 [lock_dlm2]
 4818 ?        D<     0:00 [gfs2_logd]
 4819 ?        D<     0:00 [gfs2_quotad]
 4821 pts/0    D+     4:17 cp -ar /mnt/fpcl01vg01lv00/copy2
/mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00/lost+found
/mnt/fpcl01vg01lv00/save /mnt/fpcl01vg01lv00/tmp /mnt/fpcl01vg01lv00/usr
/mnt/fpcl01vg01lv00/vmware .
 5592 pts/2    S+     0:00 grep D

I'll try getting back traces next...
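
Note that "ps ax | grep D" matches any line containing a capital D, which is why the header row and the grep itself show up above. A stricter filter looks only at the STAT column. The Python sketch below is one way to do that; the function name is hypothetical and the sample rows are taken from the output in this comment.

```python
def uninterruptible_tasks(ps_output):
    """Return (pid, stat, command) for `ps ax` rows whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        # Skip the header and wrapped continuation lines (no numeric PID).
        if len(fields) == 5 and fields[0].isdigit() and fields[2].startswith("D"):
            rows.append((int(fields[0]), fields[2], fields[4]))
    return rows

sample = """\
  PID TTY      STAT   TIME COMMAND
  195 ?        D      0:05 [pdflush]
 4791 ?        D<     0:16 [lock_dlm1]
 4818 ?        D<     0:00 [gfs2_logd]
 5592 pts/2    S+     0:00 grep D
"""

for pid, stat, command in uninterruptible_tasks(sample):
    print(pid, stat, command)
```

Running it a few times against fresh ps output, as suggested in comment 9, shows which tasks stay in 'D'.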
Comment 11 Gary Lindstrom 2006-10-20 12:03:07 EDT
Created attachment 138994 [details]
backtrace after hang of copy on gfs2

Here you go...  hope it helps....  let me know if it is useful...
Comment 12 Steve Whitehouse 2006-10-20 12:31:24 EDT
It does look very useful, thanks for sending us the trace. So far as the hang goes,
it looks like it's caused by a deadlock when a "droplocks" callback has been
received. When the lock subsystem thinks it's running out of memory due to having
a large number of locks cached, it sends one of these callbacks to the nodes.
The nodes are supposed to respond by writing out any cached data and dropping
the glocks on their least recently used inodes.

It looks like what happened is that this has then in turn caused a writeout of
dirty data (as it should) but that the transaction code has deadlocked with it
due to asking for a glock.

I have a suspicion that if you mount with data=writeback (rather than the
default data=journal) that you will not see this deadlock. I'll have a think and
see if I can figure out why this glock should get stuck.

It doesn't explain the messages from fsck though. I think they must have a
different cause.
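
To make the shape of the suspected deadlock concrete, here is a toy wait-for graph in Python. It is only an illustration of a circular wait: the node labels are hypothetical, they encode one possible reading of the hypothesis in this comment, and nothing here is taken from the GFS2 source.

```python
def find_wait_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start; return the cycle reached, or None."""
    order = {}
    path = []
    node = start
    while node is not None and node not in order:
        order[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None if node is None else path[order[node]:]

# Hypothetical labels: an assumed reading of the comment, not GFS2 internals.
waits_for = {
    "glock release (droplocks callback)": "writeout of dirty data",
    "writeout of dirty data": "journal transaction",
    "journal transaction": "glock release (droplocks callback)",
}

print(find_wait_cycle(waits_for, "writeout of dirty data"))
```

If any one of the three edges is absent, the function returns None, which corresponds to the case where the writeout can complete.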
Comment 13 Gary Lindstrom 2006-10-20 13:15:06 EDT
Hmmm...  I don't have the volume mounted on any other nodes at the time, just
the one (spool5 in this case) so there really should not be any reason to write
data or hold locks by the other nodes I would think, but I'll leave that to you
to decide before I make some stupid statement...  The other nodes do have clvm
enabled though, just no gfs2 volume mounted.  I have a few things to do, but I
might try it with clvm not loaded on the others and also the data=writeback a
little later today.

Do we need to split this bz into 2 separate bz cases, or should we leave it
combined for now?
Comment 14 Gary Lindstrom 2006-10-20 19:02:36 EDT
Created attachment 139038 [details]
gfs2 lock with data=writeback

Tried it with data=writeback (if I did it right) and it locked again...  At the
time I was doing a du to see how much had been written into certain directories,
but this may or may not have had anything to do with it...

ps ax |grep D:
  PID TTY      STAT   TIME COMMAND
 3024 ?        D<     0:00 [lock_dlm1]
 3064 pts/0    D+     0:36 cp -ar /mnt/fpcl01vg01lv00/copy2
/mnt/fpcl01vg01lv00/etc /mnt/fpcl01vg01lv00/home /mnt/fpcl01vg01lv00/lost+found
/mnt/fpcl01vg01lv00/save /mnt/fpcl01vg01lv00/tmp /mnt/fpcl01vg01lv00/usr
/mnt/fpcl01vg01lv00/vmware .
 3129 pts/2    D+     0:06 du -s copy2 etc home lost+found save
 3176 pts/3    S+     0:00 grep D

mount:
/dev/fpcl01vg00/fpcl01vg00lv00 on /mnt/fpcl01vg00lv00 type gfs2
(rw,hostdata=jid=0:id=196609:first=1,data=writeback)
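
One way to double-check that the option took effect is to parse the option list that mount prints. The Python sketch below works on the line quoted above, joined back into a single line; the function name is illustrative. It assumes option values contain no commas, which holds for this line.

```python
import re

def mount_options(mount_line):
    """Parse the parenthesised option list at the end of a `mount` output line."""
    match = re.search(r"\(([^()]*)\)\s*$", mount_line)
    if not match:
        return {}
    options = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition("=")  # split on the first '=' only
        options[key] = value
    return options

line = ("/dev/fpcl01vg00/fpcl01vg00lv00 on /mnt/fpcl01vg00lv00 type gfs2 "
        "(rw,hostdata=jid=0:id=196609:first=1,data=writeback)")
opts = mount_options(line)
print(opts["data"])
```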
Comment 15 Gary Lindstrom 2006-10-20 19:21:34 EDT
Created attachment 139040 [details]
Newer/better backtrace for last problem

Not exactly sure what happened on last attachment, but this one is more
complete...
Comment 17 Steve Whitehouse 2006-11-06 07:25:19 EST
We think we know what the deadlock is here - it's a conflict between truncating
an inode's pages and readpage. It's not related to the fsck errors and we should
have a fix fairly shortly now. I'll post a patch as soon as I have one.
Comment 22 Robert Peterson 2006-11-14 18:09:56 EST
I committed a fix to gfs2_fsck in the HEAD and RHEL5 branches of CVS
so that it will handle this file system condition correctly.
Therefore, I'm changing the status of this bugzilla to modified.

The hang mentioned in previous comments is a separate issue and if it
still needs attention, another bugzilla should be opened to track its
progress.
Comment 23 Steve Whitehouse 2006-11-17 07:22:35 EST
I've just pushed the kernel change for the fsck/dirent problem into my -nmw git
tree.
Comment 24 Matthew Miller 2007-04-06 15:33:41 EDT
Fedora Core 5 and Fedora Core 6 are, as we're sure you've noticed, no longer
test releases. We're cleaning up the bug database and making sure important bug
reports filed against these test releases don't get lost. It would be helpful if
you could test this issue with a released version of Fedora or with the latest
development / test release. Thanks for your help and for your patience.

[This is a bulk message for all open FC5/FC6 test release bugs. I'm adding
myself to the CC list for each bug, so I'll see any comments you make after this
and do my best to make sure every issue gets proper attention.]
Comment 25 Robert Peterson 2007-04-13 17:27:50 EDT
I verified that this is fixed on the "Gold" version of FC6 and 
the initial release of RHEL5.  Closing as CurrentRelease.
