Bug 471618 - GFS2: gfs2_convert is broken
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs2-utils
Platform: All Linux
Priority: high  Severity: high
Assigned To: Robert Peterson
QA Contact: Cluster QE
Duplicates: 472003 475488
Depends On: 475488
Reported: 2008-11-14 12:43 EST by Robert Peterson
Modified: 2010-01-11 22:41 EST (History)
CC: 7 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:51:52 EST

Attachments
Preliminary patch (33.39 KB, patch), 2008-11-14 23:52 EST, Robert Peterson
A working patch (107.47 KB, patch), 2008-11-26 10:15 EST, Robert Peterson
Revised patch (107.99 KB, patch), 2008-11-26 15:11 EST, Robert Peterson
Final patch (110.90 KB, patch), 2008-11-26 16:44 EST, Robert Peterson
addendum patch (3.10 KB, patch), 2008-12-09 18:01 EST, Robert Peterson
addendum patch 2 (1.66 KB, patch), 2008-12-10 15:07 EST, Robert Peterson
Addendum 3 patch (583 bytes, patch), 2008-12-11 14:53 EST, Robert Peterson

Description Robert Peterson 2008-11-14 12:43:28 EST
Description of problem:
gfs2_convert does not take into account the fact that GFS1 indirect
blocks use a starting offset for pointers different from GFS2.
That means file systems converted with gfs2_convert will have all their
data, but at the wrong offsets.  Also, recent changes to libgfs2 seem to
have broken gfs2_convert.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. mkfs.gfs1
2. create files with more than 2GB of data, or directories with lots of
   different files.
3. gfs2_convert the file system
4. compare the original files to the converted files
Actual results:

Expected results:

Additional info:
Comment 1 Robert Peterson 2008-11-14 23:52:00 EST
Created attachment 323683 [details]
Preliminary patch

This patch restores basic functionality to gfs2_convert, but more work
needs to be done.  This is not the final patch.  This patch does not
address the problem regarding gfs_indirect blocks versus indirect blocks
in GFS2 which do not have the extra 64-bytes of reserved space.

This patch may look big, but it's not as bad as it looks.
Much of the gfs1-specific code was local to gfs2_edit.  But now
gfs2_convert needs to use these functions as well, so I created
a new source file, gfs1.c, and moved all the gfs1-specific
functions into it.
Comment 2 Robert Peterson 2008-11-15 21:58:14 EST
I confirmed that the gfs_indirect structure, which doesn't exist in gfs2,
does cause a problem for file systems that have been converted by the
gfs2_convert tool.  Here's what I did:

[root@roth-01 ../bob/cluster/gfs2/convert]# mkfs.gfs -O -b 1024 -t bobs_roth:test_gfs -p lock_dlm -j 3 /dev/roth_vg/roth_lv
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 1024
Filesystem Size:           1073072060
Journals:                  3
Resource Groups:           4096
Locking Protocol:          lock_dlm
Lock Table:                bobs_roth:test_gfs

All Done
[root@roth-01 ../bob/cluster/gfs2/convert]# mount -tgfs /dev/roth_vg/roth_lv /mnt/gfs
[root@roth-01 ../bob/cluster/gfs2/convert]# dd if=/dev/urandom of=/tmp/somejunk bs=1024 count=3
3+0 records in
3+0 records out
3072 bytes (3.1 kB) copied, 0.000999 seconds, 3.1 MB/s
[root@roth-01 ../bob/cluster/gfs2/convert]# dd if=/tmp/somejunk of=/mnt/gfs/bigmama bs=1024 seek=496572346368
3+0 records in
3+0 records out
3072 bytes (3.1 kB) copied, 0.000235 seconds, 13.1 MB/s
[root@roth-01 ../bob/cluster/gfs2/convert]# sync
[root@roth-01 ../bob/cluster/gfs2/convert]# dd if=/mnt/gfs/bigmama of=/tmp/tocompare skip=496572346368 bs=1024 count=3
3+0 records in
3+0 records out
3072 bytes (3.1 kB) copied, 8.6e-05 seconds, 35.7 MB/s
[root@roth-01 ../bob/cluster/gfs2/convert]# diff /tmp/somejunk /tmp/tocompare
[root@roth-01 ../bob/cluster/gfs2/convert]# umount /mnt/gfs
[root@roth-01 ../bob/cluster/gfs2/convert]# ./gfs2_convert /dev/roth_vg/roth_lv 
gfs2_convert version 2 (built Nov 14 2008 17:51:38)
Copyright (C) Red Hat, Inc.  2004-2006  All rights reserved.

Examining file system........................................
This program will convert a gfs1 filesystem to a gfs2 filesystem.
WARNING: This can't be undone.  It is strongly advised that you:

   1. Back up your entire filesystem first.
   2. Run gfs_fsck first to ensure filesystem integrity.
   3. Make sure the filesystem is NOT mounted from any node.
   4. Make sure you have the latest software versions.
Convert /dev/roth_vg/roth_lv from GFS1 to GFS2? (y/n)y
Converting resource groups.........................................
Converting inodes.
7 inodes from 4096 rgs converted.
Fixing file and directory information.
1 directories, 3 dirents fixed.
Converting journals.
Converting journal space to rg space.
Building system structures.
Removing obsolete gfs1 structures.
Committing changes to disk.
/dev/roth_vg/roth_lv: filesystem converted successfully to gfs2.
[root@roth-01 ../bob/cluster/gfs2/convert]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/convert]# dd if=/mnt/gfs2/bigmama of=/tmp/tocompare skip=496572346368 bs=1024 count=3
3+0 records in
3+0 records out
3072 bytes (3.1 kB) copied, 8.2e-05 seconds, 37.5 MB/s
[root@roth-01 ../bob/cluster/gfs2/convert]# diff /tmp/somejunk /tmp/tocompare
Binary files /tmp/somejunk and /tmp/tocompare differ
[root@roth-01 ../bob/cluster/gfs2/convert]# /home/bob/hdump /tmp/tocompare 
0000  00000000 00000000 00000000 00000000 [................]
0010  00000000 00000000 00000000 00000000 [................]
0020  00000000 00000000 00000000 00000000 [................]
0030  00000000 00000000 00000000 00000000 [................]
0040  00000000 00000000 00000000 00000000 [................]
[root@roth-01 ../bob/cluster/gfs2/convert]# /home/bob/hdump /tmp/somejunk 
0000  5E22B876 28111176 A184E819 7F2418A8 [^".v(..v.....$..]
0010  7829A91C CC77D720 364B90B2 2E4D123B [x)...w. 6K...M.;]
0020  9673BE18 A826C0B8 5617C5A1 1CAC13E6 [.s...&..V.......]
0030  29F65171 A69787EF 62DA2825 1AF7DDC8 [).Qq....b.(%....]
0040  B13F5161 AED75A07 D9EC6E4D D707E6D3 [.?Qa..Z...nM....]

So now I've got to work on a reasonable way to consolidate all
gfs_indirect blocks into blocks with only a gfs2_metaheader on the
front, and that means shifting all the indirect pointers down and
moving block pointers from page to page.  Not an easy thing.
Comment 3 Robert Peterson 2008-11-17 16:29:15 EST
I was hoping for an easy way to fix this by copying block pointers from
block buffer to block buffer, but there's more to it than that.

At heights 0 and 1, things are relatively easy.

At height 2 things get complex.  GFS1 reserves an area of 64 (0x40) bytes
at the start of the indirect block, so for 4K blocks, you can fit 501
pointers.  GFS2 doesn't reserve that space, so you can fit 509 pointers.
For 1K blocks, it's 117 pointers in GFS1 and 125 in GFS2.
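
These per-block pointer counts can be checked with a short sketch.  The
24-byte metaheader and the 64-byte GFS1 reserved area are assumptions
stated for this illustration (they match the counts quoted above):

```c
#include <assert.h>

/* Assumed layout: both formats begin an indirect block with a 24-byte
 * metaheader; GFS1 additionally reserves 64 bytes after it.  Block
 * pointers are 8 bytes each. */
#define METAHEADER_SIZE 24
#define GFS1_RESERVED   64
#define PTR_SIZE         8

static unsigned gfs1_ptrs_per_block(unsigned bsize)
{
    return (bsize - METAHEADER_SIZE - GFS1_RESERVED) / PTR_SIZE;
}

static unsigned gfs2_ptrs_per_block(unsigned bsize)
{
    return (bsize - METAHEADER_SIZE) / PTR_SIZE;
}
```

With these sizes, 4K blocks give 501 vs 509 pointers and 1K blocks give
117 vs 125, as described above.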

That means, for example, that if you have 4K blocks, a 946MB file will
require a height of 3 for GFS, but require a height of 2 for GFS2.
There isn't a good way to shift the pointers around from one height to
another; you would overwrite good pointers and lose information.  So the
only way to do it is to rebuild the entire metadata structure for each file,
and that requires allocating blocks.  We can't just convert them in place.
It will also run very slowly.  I'm having trouble figuring out the
best way to do this.
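
A minimal sketch of the height arithmetic above, assuming a 232-byte
dinode header (shared by GFS and GFS2) and 8-byte pointers; under those
assumptions a 4K dinode holds 483 pointers and the 946MB example works
out as described:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration only. */
#define DINODE_SIZE 232
#define PTR_SIZE      8

/* Smallest metadata height whose capacity covers fsize.  Height 1 is
 * dinode pointers -> data; each extra level multiplies the capacity by
 * inptrs, the per-indirect-block pointer count (501 for GFS1 vs 509
 * for GFS2 with 4K blocks). */
static unsigned height_for_size(uint64_t fsize, unsigned bsize,
                                unsigned inptrs)
{
    uint64_t diptrs = (bsize - DINODE_SIZE) / PTR_SIZE;
    uint64_t cap = diptrs * bsize;
    unsigned height = 1;

    while (cap < fsize) {
        cap *= inptrs;
        height++;
    }
    return height;
}
```

For a 946MB file with 4K blocks this gives height 3 with 501 pointers
per block but height 2 with 509, matching the claim above.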
Comment 4 Steve Whitehouse 2008-11-18 10:00:00 EST
I suspect the way to do this is to start at the bottom level of the metadata tree, which you can treat as an array of pointers.  Fix that level up first, moving and packing the pointers as you go.  Having done that, you can reuse the existing blocks higher in the metadata tree to rebuild a new metadata tree.  Any unused blocks can then be freed, and the inode's block count updated.

The only questions then are how to keep track of where you've got to, in case a conversion needs to be restarted, and how much memory is required in the worst case.
Comment 5 Robert Peterson 2008-11-18 11:00:18 EST
I could be wrong but I don't think that method would work for all cases.
I'm not sure I can explain this well enough.

On very large files, pointers would have to be shifted around.  The dinode
has the same structure size between GFS and GFS2, so it doesn't need to be
packed.  But in order to ensure that no valid pointers were overwritten,
we would have to pack all height==1 blocks first, then height==2 blocks,
then all height==3 blocks, and so forth.  But due to the different number
of pointers that can be kept between GFS and GFS2, each pointer at
height==1 represents a different block offset for GFS and GFS2.

Take my example from comment #3: a file of 946MB and block size 4K.
The height for the file is 3 for GFS, but 2 for GFS2.  For GFS, the block
pointer for the last block would be near the start of level 3.  For GFS2
the same block needs to appear near end of the last level 2 block.

I think I figured out a way to reformat the blocks, but it requires
block allocations be done.
Comment 6 Nate Straz 2008-11-18 12:05:21 EST
*** Bug 472003 has been marked as a duplicate of this bug. ***
Comment 7 Nate Straz 2008-11-18 12:44:36 EST
After getting a patch from Bob to fix the busy loop from 472003 I hit a segfault when running gfs2_convert:

Program terminated with signal 11, Segmentation fault.
[New process 20507]
#0  0x000000000040e6f5 in gfs2_rgrp_free (rglist=0x6198d8, updated=updated)
    at rgrp.c:164
(gdb) list
159     {
160             struct rgrp_list *rgd;
162             while(!osi_list_empty(rglist->next)){
163                     rgd = osi_list_entry(rglist->next, struct rgrp_list, list);
164                     if (rgd->bh && rgd->bh[0] && /* if a buffer exists and       */
165                             rgd->bh[0]->b_count) /* the 1st buffer is allocated */
166                             gfs2_rgrp_relse(rgd, updated); /* free them all. */
167                     if(rgd->bits)
168                             free(rgd->bits);
(gdb) print rgd->bh[0]->b_count
Cannot access memory at address 0xe4d4190
Comment 8 Nate Straz 2008-11-19 11:45:20 EST
I limited the file system size to get around comment #7 and I was able to successfully convert an empty GFS to GFS2, but gfs2_fsck didn't like the file system.

 fsck 1.39 (29-May-2006)
 Initializing fsck
 Validating Resource Group index.
 Level 1 RG check.
 (level 1 passed)
 Starting pass1
 Pass1 complete      
 Invalid or missing root system inode.
 Cannot continue without valid root inode
 Starting pass1b
 Pass1b complete
 Starting pass1c
 Pass1c complete
 Starting pass2
 Directory entry '.' at block 25 (0x19) in dir inode 25 (0x19) has an invalid block type: 15.
 Directory entry to non-inode block remains
 Directory entry '..' at block 25 (0x19) in dir inode 25 (0x19) has an invalid block type: 15.
 Directory entry to non-inode block remains
 No '.' entry found for root directory.
 Adding '.' entry
 Pass2 complete      
 Starting pass3
 Pass3 complete      
 Starting pass4
 Pass4 complete      
 Link count inconsistent for inode 25 (0x19) has 0 but fsck found 1.
 Starting pass5
 Link count for inode 25 (0x19) still incorrect
 RG #17 (0x11) free count inconsistent: is 31626 should be 31627
 Inode count inconsistent: is 12 should be 11
 Resource group counts left inconsistent
 Pass5 complete      
 gfs2_fsck: bad write: Bad file descriptor on line 42 of file buf.c
Comment 10 Robert Peterson 2008-11-26 10:15:55 EST
Created attachment 324730 [details]
A working patch

Like its predecessor, this patch is a lot simpler than it looks.
There are two reasons for its huge size:

(1) A bunch of functions that are gfs1-specific were moved from
gfs2_edit's savemeta code to a gfs1-specific "gfs1.c" in libgfs2
so that they could be shared with gfs2_convert.c.  Similarly, a
bunch of gfs1-specific structures were moved into libgfs2.h.

(2) I changed all the functions in buf.c so that the caller can
keep a separate linked list of buffers rather than always working
on one list.  That has two advantages: First and most
importantly, the user space tools can keep a set of volatile and
a set of non-volatile buffers.  The non-volatile buffers are for
things like resource group bitmaps needed in memory for gfs2_fsck
and/or gfs2_convert.  The volatile buffers are for ordinary gfs2
structures that we fix and forget.  Since these functions are used
extensively throughout the gfs2 user space tools, the changes
affected all the calls.  The change was necessary because the fact
that RG buffers were volatile was causing them to be freed at
inopportune times, and the pointers to them accessed afterward,
causing occasional segfaults when the file system gets very big.
I would like to have rewritten how the whole interface works, but
this way minimizes the impact to all the tools.
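
The caller-supplied buffer list idea can be sketched roughly as below.
The structure and function names here are illustrative, not the actual
libgfs2 API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: instead of one global buffer list, each routine is handed the
 * list head its buffers should live on, so volatile buffers (fix and
 * forget) and non-volatile buffers (e.g. RG bitmaps that must stay in
 * memory) can be kept apart. */
struct buf_head {
    struct buf_head *next, *prev;
    uint64_t blkno;
};

static void list_init(struct buf_head *head)
{
    head->next = head->prev = head;
}

static void list_add(struct buf_head *head, struct buf_head *bh)
{
    bh->next = head->next;
    bh->prev = head;
    head->next->prev = bh;
    head->next = bh;
}
```

Because the volatile list can be trimmed without touching the
non-volatile one, RG buffers are no longer freed out from under code
that still holds pointers to them.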

Functional changes to the code are as follows:

1. Original problem is fixed

The original problem reported in the bug was caused because
there is a fundamental shift of indirect block pointers between
GFS1 and GFS2.  For indirect blocks, GFS1 added 64 bytes of
reserved space after the gfs meta header.  GFS2 did not.  That
meant GFS1 could hold fewer pointers than GFS2 on the same block.
That meant code had to be added to gfs2_convert to shuffle all
the pointers around to their corresponding gfs2 locations.
This was not a simple calculation because of sparse files.  For
example, with a 1K block size, if you do:

dd if=/mnt/gfs/big of=/tmp/tocompare skip=496572346368 bs=1024 count=1

the resulting metadata paths will look vastly different for the
single block of data:

height    0     1     2     3     4     5
GFS1:  0x16  0x46  0x70  0x11  0x5e  0x4a
GFS2:  0x10  0x21  0x78  0x05  0x14  0x78

It is relatively easy to calculate the new metapath, but you
can't just shuffle a pointer from its old location to its new
location because the destination slot might be used for a
different source slot.  In the example above, we couldn't just
move the pointer at height 3 from offset 0x11 to another block
at offset 0x05 because there might already be a pointer at 0x05
being used for something else.

To complicate matters, we could not just assign new blocks and
copy the data because we should be able to run on a "full" file
system.  So I had to write a complex new function called
adjust_indirect_blocks along with several support functions.
Someone else might be able to write a function to get the job
done simpler, but at least this one works for both sparse and
fully-packed metadata trees of several heights.  It basically
formulates in memory a metadata tree for all the data blocks,
clears out all the metadata buffers, and lays them all back down
again according to the new layout.
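
The remapping problem can be illustrated with a simplified metapath
calculation.  This assumes a uniform fanout at every level (in reality
the top level lives in the dinode and holds fewer pointers), so it shows
why slots shift rather than reproducing libgfs2's exact numbers:

```c
#include <assert.h>
#include <stdint.h>

/* Decompose a logical block number into per-level slot offsets (a
 * "metapath") for a tree of the given height, and recompose it.
 * "inptrs" is the pointers-per-indirect-block count that differs
 * between GFS1 and GFS2 (e.g. 117 vs 125 for 1K blocks). */
static void find_metapath(uint64_t lblock, unsigned height, unsigned inptrs,
                          unsigned *mp /* height + 1 entries */)
{
    for (int lvl = (int)height; lvl >= 0; lvl--) {
        mp[lvl] = (unsigned)(lblock % inptrs);
        lblock /= inptrs;
    }
}

static uint64_t metapath_to_block(const unsigned *mp, unsigned height,
                                  unsigned inptrs)
{
    uint64_t lblock = 0;
    for (unsigned lvl = 0; lvl <= height; lvl++)
        lblock = lblock * inptrs + mp[lvl];
    return lblock;
}
```

Because inptrs differs between the two formats, the same logical block
lands at different offsets at every level, which is why pointers cannot
simply be copied slot-for-slot from the GFS1 blocks to the GFS2 ones.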

2. Improved error checking

I added some badly needed error checking.  That's because I was
running into problems and wanted to eliminate the possibility
that errors were occurring, but not being reported.

3. Improved progress reporting

With a large number of resource groups, I saw long periods of
time where the program would just sit for five minutes.  I was
afraid that the tool had hung, so I kept breaking in with gdb
to check on the tool's progress.  But if I'm that concerned
about what it is doing when it's quiet, so too will be the
customers who are anxiously trying to watch its progress.  So I
added periodic reports of progress through the RGs when converting
inodes, and also an occasional "." when the RGs themselves are
being analyzed and/or manipulated.

Same goes for writing out the new gfs2 journals.  The journals in
gfs2 are quite different and we need to write them out entirely,
all 128MB.  That takes a fair amount of time going through the
functions in buf.c.  So I added a new message when each journal is
written, so that the code doesn't appear stuck for a long period
of time.  This could still use some improvement, but it's better.

I also modified the progress messages to make them more clear.
For example, the message "Removing obsolete gfs1 structures"
sounded to me like it could cause alarm or panic to a customer.
So I rephrased it to "Removing obsolete GFS1 file system

4. Journal size was not being preserved

In testing, I discovered that if the source file system had 32MB
journals, the reformatted GFS2 file system had 128MB journals.
That's not good, so I decided to fix it.  That is especially
important if the file system is "full" to begin with because we
can't add more blocks.

5. Fixed Nate's bug

I found and fixed the bug reported in comment #7.
Comment 11 Robert Peterson 2008-11-26 15:11:36 EST
Created attachment 324790 [details]
Revised patch

This version fixes some ugly problems I found during testing.
I'm not promising this one is the final version, but it's close.
Needs more testing.
Comment 12 Robert Peterson 2008-11-26 16:37:47 EST
Here are the basic functional tests I've run that have uncovered so
many bugs.  At this point I am hand-verifying the results of each
step to make sure there were no errors, and I'm just copy-and-pasting
the commands into a terminal session:

convert_test1:  sparse file with height of 6:

mkfs.gfs -J32 -O -b 1024 -t bobs_roth:test_gfs -p lock_dlm -j 3 /dev/roth_vg/bob
mount -tgfs /dev/roth_vg/bob /mnt/gfs
dd if=/tmp/somejunk of=/mnt/gfs/bigmama bs=1024 seek=496572346368
dd if=/mnt/gfs/bigmama of=/tmp/tocompare skip=496572346368 bs=1024 count=3
diff /tmp/somejunk /tmp/tocompare
umount /mnt/gfs
/home/bob/cluster/gfs2/convert/gfs2_convert -y /dev/roth_vg/bob
mount -tgfs2 /dev/roth_vg/bob /mnt/gfs
dd if=/mnt/gfs/bigmama of=/tmp/tocompare skip=496572346368 bs=1024 count=3
umount /mnt/gfs
diff /tmp/somejunk /tmp/tocompare
/home/bob/cluster/gfs2/fsck/fsck.gfs2 /dev/roth_vg/bob

convert_test2: contiguous 5.7MB file, height 2

mkfs.gfs -O -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
mount -tgfs /dev/roth_vg/roth_lv /mnt/gfs
cp -a /bob_music/Metal/Adagio/Sanctus\ Ignis/01\ -\ Second\ Sight.mp3 /mnt/gfs/
umount /mnt/gfs
/home/bob/cluster/gfs2/convert/gfs2_convert -y /dev/roth_vg/roth_lv
mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs
diff /bob_music/Metal/Adagio/Sanctus\ Ignis/01\ -\ Second\ Sight.mp3 /mnt/gfs/01\ -\ Second\ Sight.mp3
umount /mnt/gfs
/home/bob/cluster/gfs2/fsck/fsck.gfs2 /dev/roth_vg/roth_lv

convert_test3: metadata reduction test: contiguous 4MB file.

mkfs.gfs -O -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/bob
mount -tgfs /dev/roth_vg/bob /mnt/gfs
cp -a /bob_music/Metal/Aerosmith/Pump/01\ -\ Young\ Lust.mp3 /mnt/gfs/
umount /mnt/gfs
/home/bob/cluster/gfs2/convert/gfs2_convert -y /dev/roth_vg/bob
mount -tgfs2 /dev/roth_vg/bob /mnt/gfs
diff /bob_music/Metal/Aerosmith/Pump/01\ -\ Young\ Lust.mp3 /mnt/gfs/01\ -\ Young\ Lust.mp3
umount /mnt/gfs
/home/bob/cluster/gfs2/fsck/fsck.gfs2 /dev/roth_vg/bob

convert_test4: 15GB of misc mp3 data I've used for other tests:

mkfs.gfs -O -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
mount -tgfs /dev/roth_vg/roth_lv /mnt/gfs
cd /bob_music/Metal/
time cp -a * /mnt/gfs/
umount /mnt/gfs
/home/bob/cluster/gfs2/convert/gfs2_convert -y /dev/roth_vg/roth_lv
mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs
diff -r /mnt/gfs /bob_music/Metal
umount /mnt/gfs
diff /tmp/somejunk /tmp/tocompare
/home/bob/cluster/gfs2/fsck/fsck.gfs2 /dev/roth_vg/roth_lv
Comment 13 Robert Peterson 2008-11-26 16:44:13 EST
Created attachment 324803 [details]
Final patch

This patch fixes one last bug I found in testing:

This is best explained with an example:
In a normal 4K block file system a contiguous file that is 4152273
bytes long needs 1014 data blocks.  GFS1 needs 3 gfs_indirect metadata
blocks to keep track of them: (#1) data blocks 0-500, (#2) data blocks
501-1001, and (#3) data blocks 1002-1013.  However, GFS2 only needs 2
metadata blocks to keep track of them: (1) 0-508, (2) 509-1013.  So the
dinode's count of blocks used has to be one less, and that final block
needs to be freed.

So this patch adds the ability to free up metadata blocks that are no
longer needed, and adjust the dinode blocks counter accordingly.
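
The block arithmetic in this example reduces to ceiling divisions, using
the 501 vs 509 pointer counts from the earlier comments; this is an
illustrative check, not libgfs2 code:

```c
#include <assert.h>
#include <stdint.h>

/* Data blocks needed for a byte count, and indirect (metadata) blocks
 * needed to point at them, by ceiling division. */
static uint64_t blocks_for_bytes(uint64_t bytes, unsigned bsize)
{
    return (bytes + bsize - 1) / bsize;
}

static uint64_t indirect_blocks_needed(uint64_t datablocks, unsigned inptrs)
{
    return (datablocks + inptrs - 1) / inptrs;
}
```

A 4152273-byte file needs 1014 data blocks, which takes 3 indirect
blocks at 501 pointers each but only 2 at 509, so one metadata block
must be freed on conversion.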

This version has passed all four of the tests given above and will be
committed to git.
Comment 14 Robert Peterson 2008-12-01 14:07:48 EST
Patch tested on system roth-01 with the tests given in comment #12.
Patch was pushed to the master and STABLE2 branches of the cluster
git tree.  Now I'm waiting for approval before I push it to the
RHEL5 and RHEL53 branches.  I also gave a pre-release copy to the
QE people for testing.
Comment 16 Robert Peterson 2008-12-01 17:30:36 EST
I received the necessary ack flags, pushed the patch to the RHEL5
and RHEL53 branches of the cluster git tree.  Changing status to
Modified for inclusion into 5.3.
Comment 17 Steve Whitehouse 2008-12-02 05:47:28 EST
I think we should add a couple more tests:

1. Upgrade of a 100% full gfs1 filesystem (just to be sure that it works ok)
2. Kill an upgrade while it's halfway through, then restart it to see if it completes correctly.

It might also be a useful future addition for fsck to detect a partial upgrade, in case someone tries to fix an aborted upgrade with fsck.  At the least, we should document somewhere what someone should do if the conversion fails part way through.
Comment 20 Nate Straz 2008-12-04 14:07:48 EST
I'm running into complaints from gfs2_fsck after I convert a 1k block size GFS to GFS2.

Starting pass5
Ondisk and fsck bitmaps differ at block 137 (0x89) 
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Bitmap at block 137 (0x89) left inconsistent
Ondisk and fsck bitmaps differ at block 138 (0x8a) 
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Bitmap at block 138 (0x8a) left inconsistent
[ 4000+ lines of output ]

fsck -t gfs2 returns 0 after all of these warnings and the subsequent verify operation passes.  So there isn't any data corruption going on, but gfs2_fsck isn't happy with the converted file system.
Comment 21 Robert Peterson 2008-12-04 17:05:34 EST
There are two problems cited in comment #20:
(1) The blocks that were supposed to be freed but weren't, and
(2) The return code from gfs2_fsck does not follow the standards
listed in the fsck(8) man page. 

The first problem was caused because the rindex file was so big it
went to height 2, but libgfs2 function gfs2_freedi cannot currently
free blocks above height 1.  I consider this a separate bug.
It may be annoying that gfs2_convert should have freed those blocks
and did not, but IMHO, it's not worth re-spinning the package.  I
recommend opening a new bug and deferring this to 5.4.

The issue of gfs2_fsck not providing a standard return code as
listed in the fsck(8) man page is also a separate problem, also an
annoyance, and should also be fixed in 5.4.  The code was following
the lead of gfs1's gfs_fsck that also provides the wrong return code.
We need to determine if that's worth fixing in 5.4.  I suspect not
because customers may already expect gfs_fsck to return the "wrong"
return code, and may have coded scripts that rely upon that fact.
Comment 22 Robert Peterson 2008-12-04 17:36:59 EST
Bug #474707 was opened to address the first problem in comment #21.
Bug #474705 was opened to address the second problem in comment #21.
Comment 23 Nate Straz 2008-12-09 17:41:23 EST
Unfortunately the fix for this bug introduced a regression in mkfs.gfs2, bug 475488.  Can we get this in for snap6?
Comment 24 Robert Peterson 2008-12-09 18:01:00 EST
Created attachment 326421 [details]
addendum patch

This addendum patch fixes the bug Nate found in mkfs.gfs2, and it's
being tested now.
Comment 25 Robert Peterson 2008-12-09 18:11:19 EST
*** Bug 475488 has been marked as a duplicate of this bug. ***
Comment 27 Robert Peterson 2008-12-10 15:07:44 EST
Created attachment 326541 [details]
addendum patch 2

This patch is an add-on to the previous addendum patch to fix the
problem.  This second addendum consists of two parts: (1) to fix
the problem for mkfs, and (2) to prevent other pieces of user space
code from falling on their face if they make the same assumption.

In summary, it turns out that there were three problems with the
original patch:

(1) The first problem is where the offsets were getting skewed by
    bad calculations.  This allowed mkfs to start working for some
    cases, but only if the journal size was small.  This is fixed
    by the first addendum patch.
(2) The second problem is that the original patch inadvertently
    reversed the order in which buffers were freed off the block
    list in the event of an overflow.  That meant that the most
    recently used block was first to be freed, which makes it horribly
    inefficient, but likely to uncover bugs such as problem (3).  This is
    fixed in addendum patch 2.
(3) The third problem is due to the fact that the get_first_leaf
    and get_next_leaf code were making the faulty assumption that the
    leaf blocks they're searching will be in memory.  That's not
    guaranteed, but we've been getting away with it for all these
    years because of problem (2).  This is fixed in addendum patch 2.

For 5.4 we need to carefully scrutinize every place that function
bget is called in the gfs2 user space code.  We need to make sure
bget is only called in places where the buffer is newly allocated
or unallocated.  Otherwise we can get into problems like #3 again.
Comment 29 Robert Peterson 2008-12-11 14:53:03 EST
Created attachment 326662 [details]
Addendum 3 patch

Nate found one more bug here.  This one has been in the code forever,
but Nate found a test case that uncovers it.  This one-line patch
fixes the problem.  I suggest another respin for the next snap.
Comment 30 Robert Peterson 2008-12-11 15:06:02 EST
Changing status back to ASSIGNED so we can get the addendum 3 patch
included in the next build.
Comment 31 Robert Peterson 2008-12-12 15:16:27 EST
Addendum tested by Nate and pushed to these branches of the cluster
git tree: RHEL5, RHEL53, master, STABLE2.  Changing status back to
Comment 33 Nate Straz 2008-12-16 11:33:00 EST
Verified with gfs2-utils-0.1.53-1.el5.
Comment 35 errata-xmlrpc 2009-01-20 15:51:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 36 Fedora Update System 2009-01-23 21:36:03 EST
gfs2-utils-2.03.11-1.fc9, cman-2.03.11-1.fc9, rgmanager-2.03.11-1.fc9 have been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.
