+++ This bug was initially created as a clone of Bug #469773 +++

Since the symptom is nearly the same, I decided to use the original bz record, bug #469773, for the user space fix. This new bug record may be used for the gfs2 kernel changes needed to fix the problem. See the bottom of this description text for a breakdown of the two problems and how we plan to fix them.

Original text follows:

Description of problem:
Our test suite for growfs on GFS doesn't work on GFS2 after updating the commands because the file system size doesn't update until after the gfs2_grow command exits.

Version-Release number of selected component (if applicable):
gfs2-utils-0.1.49-1.el5

How reproducible:
100%

Steps to Reproduce:
1. df /mnt/gfs2
2. lvextend
3. gfs2_grow /mnt/gfs2; df /mnt/gfs2
4. Compare output from 1 and 3

Actual results:
lvextend -l +50%FREE growfs/gfs2 on west-02
growing gfs2 on west-01
verifying grow
size of gfs /mnt/gfs2 did not increase, was: 79008, is now: 79008 after 1 seconds

Expected results:
The new size should be available immediately after gfs2_grow exits.

Additional info:

--- Additional comment from nstraz on 2008-11-04 16:51:03 EDT ---

Moving this out to RHEL 5.4. This could cause problems with management tools which expect the grow to work right away, but it's too late in the 5.3 cycle to get this in.

--- Additional comment from swhiteho on 2008-12-03 05:18:30 EDT ---

We also need to look into what happens when we add new journals to a live filesystem. Currently they seem to be ignored, so that if a node were to mount the newly created journal and then fail, its journal might not be recoverable by one of the previously existing nodes. This is a result of changing the jindex from a special file to a directory, I think, as we no longer keep the shared lock on it all the time like we used to. I spotted this recently when looking at the recovery code.

--- Additional comment from rpeterso on 2009-01-21 08:52:02 EDT ---

While fixing this bug and testing the fix, I found another related nasty bug in gfs2_grow. It relates to alternate block sizes. Here is the symptom:

[root@roth-01 ../src/redhat/RPMS/x86_64]# lvcreate --name roth_lv -L 5G /dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../src/redhat/RPMS/x86_64]# mkfs.gfs2 -O -b1024 -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 1024
Device Size                5.00 GB (5242880 blocks)
Filesystem Size:           5.00 GB (5242878 blocks)
Journals:                  1
Resource Groups:           20
Locking Protocol:          "lock_dlm"
Lock Table:                "bobs_roth:test_gfs"
[root@roth-01 ../src/redhat/RPMS/x86_64]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]# /usr/sbin/lvresize -L +1T /dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 1.00 TB
  Logical volume roth_lv successfully resized
[root@roth-01 ../src/redhat/RPMS/x86_64]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      5.0G  131M  4.9G   3% /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]# gfs2_grow /mnt/gfs2 ; df -h
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 262140 (0x3fffc)
DEV: Size: 269746176 (0x10140000)
The file system grew by 258304MB.
gfs2_grow complete.
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      257G  131M  257G   1% /mnt/gfs2
[root@roth-01 ../src/redhat/RPMS/x86_64]#

So I extended the partition size by 1TB, but gfs2_grow only allocated enough resource groups for one fourth of that, or 256G. I debugged that problem and will post a patch shortly. We will definitely want to z-stream this one for 5.3.z.

--- Additional comment from rpeterso on 2009-01-21 08:54:54 EDT ---

Created an attachment (id=329604)
Patch to fix the problem

This patch was tested on system roth-01.

--- Additional comment from rpeterso on 2009-01-21 08:57:32 EDT ---

The same commands/output from comment #3, but with the patch applied:

[root@roth-01 ../bob/cluster/gfs2/mkfs]# lvcreate --name roth_lv -L 5G /dev/roth_vg
  Logical volume "roth_lv" created
[root@roth-01 ../bob/cluster/gfs2/mkfs]# mkfs.gfs2 -O -b1024 -t bobs_roth:test_gfs -p lock_dlm -j 1 /dev/roth_vg/roth_lv
Device:                    /dev/roth_vg/roth_lv
Blocksize:                 1024
Device Size                5.00 GB (5242880 blocks)
Filesystem Size:           5.00 GB (5242878 blocks)
Journals:                  1
Resource Groups:           20
Locking Protocol:          "lock_dlm"
Lock Table:                "bobs_roth:test_gfs"
[root@roth-01 ../bob/cluster/gfs2/mkfs]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# /usr/sbin/lvresize -L +1T /dev/roth_vg/roth_lv
  Extending logical volume roth_lv to 1.00 TB
  Logical volume roth_lv successfully resized
[root@roth-01 ../bob/cluster/gfs2/mkfs]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      5.0G  131M  4.9G   3% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# ./gfs2_grow /mnt/gfs2 ; df -h
FS: Mount Point: /mnt/gfs2
FS: Device: /dev/mapper/roth_vg-roth_lv
FS: Size: 5242878 (0x4ffffe)
FS: RG size: 262140 (0x3fffc)
DEV: Size: 1078984704 (0x40500000)
The file system grew by 1048576MB.
gfs2_grow complete.
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      1.1T  131M  1.1T   1% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00   71G   58G  9.1G  87% /
/dev/sda1                         99M   94M  220K 100% /boot
tmpfs                            279M     0  279M   0% /dev/shm
/dev/mapper/roth_vg-roth_lv      1.1T  131M  1.1T   1% /mnt/gfs2
[root@roth-01 ../bob/cluster/gfs2/mkfs]#

--- Additional comment from nstraz on 2009-01-21 09:50:06 EDT ---

I re-ran our growfs test script and was able to reproduce this with gfs2-utils-0.1.53-1.el5. The test script does multiple file system grows in a row. In this case the second grow did not immediately return the new size.

Starting io load to filesystems
adding /dev/sdb10 to VG growfs on dash-03
lvextend -l +50%FREE growfs/gfs1 on dash-02
growing gfs1 on dash-03
verifying grow
lvextend -l +50%FREE growfs/gfs2 on dash-03
growing gfs2 on dash-01
verifying grow
size of gfs /mnt/gfs2 did not increase, was: 265702, is now: 265702

To Reproduce:
1. /usr/tests/sts-rhel5.3/gfs/bin/growfs -2 -i 1

--- Additional comment from rpeterso on 2009-01-21 11:11:37 EDT ---

Created an attachment (id=329621)
Patch for the block size problem

There are two problems to be fixed: (1) the non-default block size problem, and (2) the fact that OTHER NODES do not see changes made by gfs2_grow until some time after gfs2_grow ends, due to fast_statfs.

The previously posted patch does not fix problem 2. This patch fixes problem 1 only.

After some discussion on irc, we decided that problem 2 should be fixed rather than documented around, and that the solution may very well involve the gfs2 kernel module. So it's likely we'll need a kernel bug record as well. Steve's suggestion was to make statfs check the rindex to see if it has changed. "In the unlikely event that it has changed, we go back to slow statfs [code path] for just the one call."

Nate also discovered that gfs's fast_statfs feature has the same problem, but it's apparently worse: it never re-syncs on the other nodes. If fast_statfs is not used for gfs, the file system size is cluster coherent (i.e. the bug does not recreate on gfs1 unless fast_statfs is used). I think we've known this is broken for a very long time. I'm not sure it's easy to fix for gfs, and I'm not sure it's worth it. But we do need to fix gfs2.

When we come up with a solution for problem 2, I'll likely use this bugzilla to fix that, and open another for problem 1.
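To make the suggested statfs change concrete, here is a minimal userspace C sketch of the idea, using hypothetical types and helper names that do not correspond to the actual GFS2 kernel code: cache the rindex state seen at the last statfs, and fall back to the slow (walk-all-resource-groups) path for a single call whenever the rindex has changed, e.g. after a grow.

#include <stdint.h>

/* Hypothetical stand-ins for the real GFS2 structures and code paths. */
struct fs_state {
    uint64_t rindex_version;        /* bumped whenever the rindex changes */
    uint64_t cached_rindex_version; /* rindex state seen at the last statfs */
    uint64_t cached_total_blocks;   /* totals kept for the fast path */
};

/* Slow path: re-read the resource group index and recount everything. */
static uint64_t slow_statfs(struct fs_state *fs)
{
    /* ... walk every resource group and rebuild the totals ... */
    fs->cached_rindex_version = fs->rindex_version;
    return fs->cached_total_blocks;
}

/* Fast path with the suggested safety check: if the rindex changed
 * (e.g. gfs2_grow ran on some node), take the slow path just once. */
static uint64_t statfs_blocks(struct fs_state *fs)
{
    if (fs->rindex_version != fs->cached_rindex_version)
        return slow_statfs(fs);     /* re-sync for this one call */
    return fs->cached_total_blocks; /* normal fast_statfs behavior */
}

int main(void)
{
    struct fs_state fs = { 1, 1, 1000 };
    fs.rindex_version = 2;          /* simulate gfs2_grow on another node */
    (void)statfs_blocks(&fs);       /* this call re-syncs via the slow path */
    return 0;
}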
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 367528 [details]
Fix to reinitialize the resource group index after growing the filesystem

This problem actually exists on both single-node and cluster setups.

The first problem, which caused it to fail on cluster setups, is that the rindex list was supposed to get invalidated when nodes dropped their rindex glock, but the code to do that was in meta_go_inval() instead of inode_go_inval(). I can't see any reason why that code was in meta_go_inval(). It never got called during my testing, and I can't see any way that it could get called, but I dislike removing code that I don't understand (and like I said, I have no idea why that code was there). So if there's a reason for that meta_go_inval() code, someone please let me know, and I'll add it back.

The second problem is that on single-node setups, the node never needs to drop the rindex glock. There are multiple ways to solve this. I could have added code that manually updated the rindex list when you grew the filesystem. Instead, I just forced the node to actually drop its rindex glock, which invalidates the rindex list. The next time the node needs to allocate blocks, it will pick the glock back up and reinitialize the list. This is not the fastest way to do things, but it does mean that all nodes in a cluster do the same thing to invalidate and reinitialize their rindex list, and since growing a filesystem is a pretty rare event, the additional overhead seems acceptable.
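As a rough illustration of the mechanism described above (hypothetical names only; this is not the actual kernel patch), the shape of the fix is: the invalidation hook lives on the inode glock operations so it actually fires when the rindex glock is dropped, the grow path forces that drop, and the next block allocation rebuilds the list lazily.

#include <stdint.h>

/* Hypothetical stand-in for the in-memory resource group list. */
struct rindex_cache {
    int      valid;     /* nonzero once the rgrp list has been read in */
    uint64_t num_rgrps; /* entries parsed from the on-disk rindex */
};

/* Invalidation callback -- conceptually on the inode glock ops
 * (inode_go_inval) rather than the meta ops, so it runs whenever a
 * node drops its rindex glock. */
static void rindex_inval(struct rindex_cache *rc)
{
    rc->valid = 0;  /* force a re-read of the rindex */
}

/* Grow path: rather than patching the in-memory list by hand, drop
 * the rindex glock; every node, including this one, then invalidates
 * and rebuilds the same way. */
static void grow_done(struct rindex_cache *rc)
{
    rindex_inval(rc);  /* what dropping the glock triggers locally */
}

/* The next allocation reacquires the glock and reinitializes the list. */
static void alloc_blocks(struct rindex_cache *rc)
{
    if (!rc->valid) {
        /* ... re-read the resource group index from disk ... */
        rc->valid = 1;
    }
    /* ... choose a resource group and allocate from it ... */
}

int main(void)
{
    struct rindex_cache rc = { 1, 20 };
    grow_done(&rc);    /* simulate gfs2_grow finishing */
    alloc_blocks(&rc); /* rindex list is rebuilt lazily here */
    return 0;
}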
Posted in kernel-2.6.18-174.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
NOTE: From the customer:

The hotfix given to me by Jeremy West and Linda did not fix the customer issue. Because it was a grow issue, I did have them install the hotfix kernel on both nodes. They attempted to grow a gfs2 volume, and were still unable to use the new space immediately. New sosreports and stack traces for the grow have been attached to this ticket.

Will be attaching the following:
sosreport-mageshkumar.gajapathy.804042671-17973-811dea.tar.bz2
sosreport-mageshkumar.gajapathy.804042671-8183-a23186.tar.bz2
gfs2_grow.strace1
Created attachment 388382 [details] gfs2_grow strace with hotfix installed
Event posted on 02-02-2010 04:14pm EST by dejohnso

Verified from sosreport that hotfix is installed.

[dejohnso@dhcp242-193 mageshkumar.gajapathy.804042671-17973]$ cat uname
Linux sbici 2.6.18-174.el5 #1 SMP Mon Nov 16 22:54:31 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[dejohnso@dhcp242-193 mageshkumar.gajapathy.804042671-17973]$

This event sent from IssueTracker by dejohnso
issue 336608
NOTE: Verified that hotfix has the code by extracting the src rpm and checking it. I went over linux-2.6-gfs2-drop-rindex-glock-on-grows.patch line by line and it is all there. So why are they not seeing the grow?
Are they reproducing this the same way as before? Do they still have to wait for the filesystem to be remounted to see the space, or does it appear if they wait a little while?
Would it be possible to get a copy of all the commands that they run, and the output of all of them, including running lvdisplay and vgdisplay both at the start and the end of the testing?
Created attachment 388868 [details] vgdisplay of the customer's system
Thanks, but I would really like this in the context of running all of the commands. I'd also like to see what they used when they created the filesystem.

Also, looking at the vgdisplay command, it looks like they don't have clvmd running. However, they do have two nodes, right? Or are they testing with just a single node now? If they are running in a cluster with two nodes accessing the storage, they need to have clvmd running, or things can go very wrong. I'm not saying that this is the cause of their issue, but live-growing a shared volume in a cluster without clvmd running is a bad idea.
If the customer isn't running IO on both nodes (assuming that they are actually using both nodes), can they try doing some IO on the node that they didn't grow the filesystem on, after the grow completes, and see if that makes them able to see the new space? This shouldn't be necessary to see the new space, but if it clears up the problem, that narrows down where the issue could be. Also, are they mounting the filesystem with any mount options?
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here by March 3rd, 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
If this problem is still reproducible, I need the information from the debug kernel to have a chance at solving it, since I am unable to reproduce it myself.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
From the information in the last two comments, this doesn't look like the original bug. I trust that the output from Comment #57 is from the command that caused the error in Comment #56, meaning the filesystem didn't grow the full size that it was supposed to.

After this happened, did the customer unmount and remount the filesystem? If so, did it fix the problem? If unmounting and remounting didn't fix the problem, then this is a completely different bug than was originally reported.

This actually sounds a lot like bz #469773, which was a problem in the gfs2 utils that caused filesystems to grow less than they should. It was fixed in gfs2-utils-0.1.58-1.el5. According to the sosreports from the time of the original bug, the customer was using gfs2-utils-0.1.53-1.el5_3.3-x86_64. Can you check if they are currently using an updated gfs2-utils package? If they are not, could they try using gfs2-utils-0.1.58-1.el5 or newer, and see if that solves their problem?

If they saw this while using gfs2-utils-0.1.58-1.el5 or a newer version, and the problem did not fix itself when they unmounted and remounted the filesystem, can you please either open a new bug or reopen #469773. If remounting the filesystem did fix the problem, then we can probably keep the discussion under this bugzilla for now. In that case, I'd really like them to run my debug kernel, so I can see what happened to the resource group index.
The only entry I saw was:

Apr 20 15:54:46 sbidb kernel: GFS2: fsid=pbi_prd:ora_pbi_saporg.0: File system extended by 256160 blocks.

This can be found in the file messages.debugkernel.
Created attachment 409306 [details] messages from gfs2_grow with the debug kernel