Description of problem:
Our test suite for growfs on GFS doesn't work on GFS2 after updating the commands, because the file system size doesn't update until after the gfs2_grow command exits.

Version-Release number of selected component (if applicable):
gfs2-utils-0.1.49-1.el5

How reproducible:
100%

Steps to Reproduce:
1. df /mnt/gfs2
2. lvextend
3. gfs2_grow /mnt/gfs2; df /mnt/gfs2
4. Compare the output from steps 1 and 3

Actual results:
lvextend -l +50%FREE growfs/gfs2 on west-02
growing gfs2 on west-01
verifying grow
size of gfs /mnt/gfs2 did not increase, was: 79008, is now: 79008 after 1 seconds

Expected results:
The new size should be available immediately after gfs2_grow exits.

Additional info:
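A minimal stand-alone version of the steps above, for convenience (the VG/LV name and mount point are taken from the test output above and are assumptions; adjust for the local setup):

    # Hypothetical names: VG "growfs", LV "gfs2", mounted at /mnt/gfs2
    df /mnt/gfs2                          # record the size before the grow
    lvextend -l +50%FREE growfs/gfs2      # extend the underlying logical volume
    gfs2_grow /mnt/gfs2; df /mnt/gfs2     # the df should already show the new size
    # With the bug present, the second df still reports the old size for a while.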
Moving this out to RHEL 5.4. This could cause problems with management tools which expect the grow to work right away, but it's too late in the 5.3 cycle to get this in.
We also need to look into what happens when we add new journals to a live filesystem. Currently they seem to be ignored, so that if a node were to mount a newly created journal and then fail, its journal might not be recoverable by one of the previously existing nodes. I think this is a result of changing the jindex from a special file to a directory, since we no longer keep the shared lock on it all the time like we used to. I spotted this recently while looking at the recovery code.
While fixing this bug and testing the fix, I found another related nasty bug in gfs2_grow. It relates to alternate block sizes. Here is the symptom:

[root@roth-01 ../src/redhat/RPMS/x86_64]# lvcreate --name roth_lv -L 5G /dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../src/redhat/RPMS/x86_64]# mkfs.gfs2 -O -b1024 -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 1024
Device Size                5.00 GB (5242880 blocks)
Filesystem Size:           5.00 GB (5242878 blocks)
Journals:                  1
Resource Groups:           20
Locking Protocol:          "lock_dlm"
Lock Table:                "bobs_roth:test_gfs"
[root@roth-01 ../src/redhat/RPMS/x86_64]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]# /usr/sbin/lvresize -L +1T /dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 1.00 TB
  Logical volume roth_lv successfully resized
[root@roth-01 ../src/redhat/RPMS/x86_64]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      5.0G  131M  4.9G   3% /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]# gfs2_grow /mnt/gfs2 ; df -h
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 262140 (0x3fffc)
DEV: Size: 269746176 (0x10140000)
The file system grew by 258304MB.
gfs2_grow complete.
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      257G  131M  257G   1% /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]#

So I extended the partition size by 1TB, but gfs2_grow only allocated enough resource groups for one fourth of that, roughly 256G. I debugged that problem and will post a patch shortly. We will definitely want to z-stream this one for 5.3.z.
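A quick sanity check on the "one fourth" observation (my own back-of-the-envelope arithmetic, not part of the original report): the DEV size the buggy gfs2_grow printed above is exactly one quarter of the DEV size the patched gfs2_grow prints in the retest below, which is suggestive given the file system was made with -b1024 instead of the default 4096-byte block size.

    # Both values are in 1K blocks, taken from the two gfs2_grow outputs
    echo $(( 1078984704 / 269746176 ))   # prints 4 -- the 4K/1K block size ratio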
Created attachment 329604 [details]
Patch to fix the problem

This patch was tested on system roth-01.
The same commands/output from comment #3, but with the patch applied:

[root@roth-01 ../bob/cluster/gfs2/mkfs]# lvcreate --name roth_lv -L 5G /dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../bob/cluster/gfs2/mkfs]# mkfs.gfs2 -O -b1024 -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 1024
Device Size                5.00 GB (5242880 blocks)
Filesystem Size:           5.00 GB (5242878 blocks)
Journals:                  1
Resource Groups:           20
Locking Protocol:          "lock_dlm"
Lock Table:                "bobs_roth:test_gfs"
[root@roth-01 ../bob/cluster/gfs2/mkfs]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# /usr/sbin/lvresize -L +1T /dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 1.00 TB
  Logical volume roth_lv successfully resized
[root@roth-01 ../bob/cluster/gfs2/mkfs]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      5.0G  131M  4.9G   3% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# ./gfs2_grow /mnt/gfs2 ; df -h
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 262140 (0x3fffc)
DEV: Size: 1078984704 (0x40500000)
The file system grew by 1048576MB.
gfs2_grow complete.
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      1.1T  131M  1.1T   1% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      1.1T  131M  1.1T   1% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]#
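As a rough cross-check of the patched run (my arithmetic, assuming the 1K block size reported by mkfs.gfs2 above): the DEV size now reported by gfs2_grow works out to the original 5G volume plus the full 1T extension, and the reported growth is exactly 1 TiB.

    echo $(( 1078984704 / 1048576 ))   # 1029 GiB total = 5 GiB original + 1024 GiB added
    echo $(( 1048576 / 1024 ))         # 1024 GiB -- matches "The file system grew by 1048576MB."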
I re-ran our growfs test script and was able to reproduce this with gfs2-utils-0.1.53-1.el5. The test script does multiple file system grows in a row. In this case the second grow did not immediately return the new size.

Starting io load to filesystems
adding /dev/sdb10 to VG growfs on dash-03
lvextend -l +50%FREE growfs/gfs1 on dash-02
growing gfs1 on dash-03
verifying grow
lvextend -l +50%FREE growfs/gfs2 on dash-03
growing gfs2 on dash-01
verifying grow
size of gfs /mnt/gfs2 did not increase, was: 265702, is now: 265702

To Reproduce:
1. /usr/tests/sts-rhel5.3/gfs/bin/growfs -2 -i 1
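Roughly what that exercise boils down to, as a hand-rolled sketch (the node names, VG/LV names, and the ssh-based size check are assumptions inferred from the output above; the authoritative reproducer is the test script listed in the "To Reproduce" step):

    # Grow twice in a row; issue the grow on one node, verify the size on another.
    for i in 1 2; do
        before=$(ssh dash-03 df -P /mnt/gfs2 | awk 'NR==2 {print $2}')
        lvextend -l +50%FREE growfs/gfs2
        ssh dash-01 gfs2_grow /mnt/gfs2
        after=$(ssh dash-03 df -P /mnt/gfs2 | awk 'NR==2 {print $2}')
        echo "grow $i: was $before, is now $after"
    done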
Created attachment 329621 [details]
Patch for the block size problem

There are two problems to be fixed: (1) the non-default block size problem, and (2) the fact that OTHER NODES do not see changes made by gfs2_grow until some time after gfs2_grow ends, due to fast_statfs. The previously posted patch does not fix problem 2. This patch fixes problem 1 only.

After some discussion on irc, we decided that problem 2 should be fixed rather than just documented around, and that the solution may very well involve the gfs2 kernel module, so it's likely we'll need a kernel bug record as well. Steve's suggestion was to make statfs check the rindex to see if it has changed: "In the unlikely event that it has changed, we go back to slow statfs [code path] for just the one call."

Nate also discovered that gfs's fast_statfs feature has the same problem, but it's apparently worse: it never re-syncs on the other nodes. If fast_statfs is not used for gfs, the file system size is cluster coherent (i.e. the bug does not recreate on gfs1 unless fast_statfs is used). I think we've known this is broken for a very long time. I'm not sure it's easy to fix for gfs, and I'm not sure it's worth it. But we do need to fix gfs2.

When we come up with a solution for problem 2, I'll likely use this bugzilla to fix that, and open another for problem 1.
Even though the symptoms are the same, there is a user space problem and another problem that will likely be fixed in the gfs2 kernel code. My intent is now to fix problem #1 (user space) described in comment #7 using this bug record. I cloned this record to bug #482756 so we can do the kernel portion there. This fix can be shipped independently though.
Incidentally, the patch was pushed to the master branch of the gfs2-utils git repo, and the STABLE2 and STABLE3 branches of the cluster git repo.
This patch is now pushed to the RHEL5 branch of the cluster git repo for inclusion into 5.4. It was tested on roth-01. So I'm changing the status to MODIFIED. This problem is serious enough that I think we need to z-stream it. I'm bumping the priority and severity to reflect that. I'm also adding Ben Kahn and Chris Feist to the cc list toward that end.
How close should gfs2_grow get to filling the block device? In testing with gfs2-utils-0.1.53-1.el5_3.1 I was still about 680MB short of the end of the block device, and the RG size was 256MB.

growing gfs1 on z1
FS: Mount Point: /mnt/gfs1
FS: Device: /dev/mapper/growfs-gfs1
FS: Size: 7139327 (0x6cefff)
FS: RG size: 254973 (0x3e3fd)
DEV: Size: 14282752 (0xd9f000)
The file system grew by 6976MB.
gfs2_grow complete.
...
File system didn't grow to fill volume
fs = 13946, lv = 14625.54

The last two numbers are both in MB.
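For reference, a rough conversion of the numbers gfs2_grow itself printed above (my arithmetic, assuming the 1K block size confirmed with gfs2_edit in the comments below):

    echo $(( 14282752 / 1024 ))               # device as seen by gfs2_grow: 13948 MB
    echo $(( (14282752 - 7139327) / 1024 ))   # room added: 6976 MB -- matches "grew by 6976MB"

So the reported growth accounts for essentially all of the device gfs2_grow could see. Note that the lv figure in the test output (14625.54 MB) is larger than the DEV size gfs2_grow reported (13948 MB); the follow-up below examines the actual on-disk layout with gfs2_edit.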
Unlike gfs, gfs2_grow adds new space only on whole resource group (RG) boundaries. That has the advantage that the rindex file can be rebuilt in gfs2_fsck using simple block calculations. The disadvantage is that gfs2_grow may leave some space at the end of the device that is unusable unless/until the device is extended to the next RG boundary. The "free space" returned by df will show the space minus the blocks used by the new RGs and their bitmaps, so the only way to tell whether there's a problem is for me to examine the file system with gfs2_edit. Given the RG size shown above, if there really is 680MB of space unaccounted for in the file system, including the RG and bitmap space, then that would be a bug. I would expect less than 256MB after the last RG and its bitmaps. But again, I'd want to take a look to see how everything was laid out.
Regarding comment #13: I examined Nate's gfs2 file system with gfs2_edit and determined that gfs2_grow apparently did the right thing. Here is exactly what I did:

From gfs2_edit I first determined that the file system block size is 1K. Then I got the device size (in terms of the 1K block size):

[root@z1 tool]# gfs2_edit -p size /dev/growfs/gfs1 | head -1
Device size: 14282752 (0xd9f000)

So the actual device size is 0xd9f000 blocks of 1K. Next, I printed out the last two entries of the rindex file:

[root@z1 tool]# gfs2_edit -p rindex /dev/growfs/gfs1 | tail -13
RG #54
  ri_addr       13768626   0xd217b2
  ri_length     64         0x40
  ri_data0      13768690   0xd217f2
  ri_data       254908     0x3e3bc
  ri_bitbytes   63727      0xf8ef
RG #55
  ri_addr       14023599   0xd5fbaf
  ri_length     64         0x40
  ri_data0      14023663   0xd5fbef
  ri_data       254908     0x3e3bc
  ri_bitbytes   63727      0xf8ef

Then I did the math. The spacing between RGs is 0xd5fbaf - 0xd217b2, which equals 0x3e3fd. So if gfs2_grow wanted to add another RG to the file system, it would start at 0xd5fbaf + 0x3e3fd = 0xd9dfac and would need to extend to 0xd9dfac + 0x3e3fd = 0xddc3a9. That value is beyond the end of the device, 0xd9f000, from step 1. Therefore, gfs2_grow could not possibly have added another full RG after the last one.

Note that in this particular case, the bitmaps take up 64 blocks of 1K each (as shown in ri_length), which means the free space in df will be missing that many blocks for each RG due to the space reserved for bitmaps.
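The same arithmetic, spelled out as a quick shell check (nothing here beyond the numbers already quoted above):

    echo $(( 0xd5fbaf - 0xd217b2 ))   # RG spacing: 254973 (0x3e3fd) 1K blocks, ~249 MB
    echo $(( 0xd5fbaf + 0x3e3fd ))    # next RG would start at 14278572 (0xd9dfac)
    echo $(( 0xd9dfac + 0x3e3fd ))    # ...and end at 14533545 (0xddc3a9)
    echo $(( 0xd9f000 ))              # device end: 14282752 -- another full RG would not fit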
*** Bug 492932 has been marked as a duplicate of this bug. ***
I'm changing the summary of this bugzilla. In reality, the miscalculations cause one of two symptoms: (1) the file system grows by too little, or (2) the file system can't grow at all when it should. The symptoms are more likely to occur when the file system has a block size smaller than the default of 4K, but the error can occur even with 4K blocks if the file system is small enough.
*** Bug 491951 has been marked as a duplicate of this bug. ***
I have not seen this issue during GFS2 growfs testing. Verified against gfs2-utils-0.1.58-1.el5 and kernel-2.6.18-154.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1337.html