+++ This bug was initially created as a clone of Bug #660661 +++

Cloned so we can crosswrite the patch to RHEL6.

Description of problem:
Filesystem error on fsck.gfs2 right after gfs2_grow. It seems gfs2_grow failed silently.

Version-Release number of selected component (if applicable):
gfs2-utils-0.1.62-20.el5

How reproducible:
Always

Steps to Reproduce:
# fdisk -l /dev/sda

Disk /dev/sda: 8912 MB, 8912896000 bytes
64 heads, 32 sectors/track, 8500 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot      Start    End     Blocks   Id  System
/dev/sda1            1   2862    2930672   83  Linux
/dev/sda2         2863   4770    1953792   83  Linux

# pvcreate /dev/sda1 ; vgcreate engvg /dev/sda1 ; lvcreate -l 100%FREE -n englv engvg
Physical volume "/dev/sda1" successfully created
Volume group "engvg" successfully created
Logical volume "englv" created

# vgchange -cy engvg ; lvmconf --enable-cluster
Volume group "engvg" successfully changed

# /etc/init.d/clvmd start
Starting clvmd: [ OK ]
Activating VGs: 2 logical volume(s) in volume group "VolGroup00" now active
1 logical volume(s) in volume group "engvg" now active
[ OK ]

# vgs
VG          #PV  #LV  #SN  Attr    VSize  VFree
VolGroup00    1    2    0  wz--n-  5.25G      0
engvg         1    1    0  wz--nc  2.79G      0

# mkfs.gfs2 -t domxen:mygfs2 -p lock_dlm -j 2 /dev/engvg/englv
This will destroy any data on /dev/engvg/englv.
Are you sure you want to proceed? [y/n] y
Device: /dev/engvg/englv
Blocksize: 4096
Device Size 2.79 GB (732160 blocks)
Filesystem Size: 2.79 GB (732157 blocks)
Journals: 2
Resource Groups: 12
Locking Protocol: "lock_dlm"
Lock Table: "domxen:mygfs2"
UUID: 5075EFB0-8D05-2E07-B737-EE1F1AA6919A

# fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
[root@dhcp209-170 ~]#

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2
Filesystem               Size  Used  Avail  Use%  Mounted on
/dev/mapper/engvg-englv  2.8G  259M   2.6G   10%  /gfs2

# pvcreate /dev/sda2 ; vgextend engvg /dev/sda2 ; lvextend -L 4G /dev/engvg/englv
Physical volume "/dev/sda2" successfully created
Volume group "engvg" successfully extended
Extending logical volume englv to 4.00 GB
Logical volume englv successfully resized

# lvdisplay /dev/engvg/englv
--- Logical volume ---
LV Name /dev/engvg/englv
VG Name engvg
LV UUID Y0XJ55-iOGW-b8eJ-gbee-T2z8-jGwD-4guRAO
LV Write Access read/write
LV Status available
# open 1
LV Size 4.00 GB
Current LE 1024
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3

# gfs2_grow /gfs2
FS: Mount Point: /gfs2
FS: Device: /dev/mapper/engvg-englv
FS: Size: 732157 (0xb2bfd)
FS: RG size: 61011 (0xee53)
DEV: Size: 1048576 (0x100000)
The file system grew by 1236MB.
gfs2_grow complete.

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2
Filesystem               Size  Used  Avail  Use%  Mounted on
/dev/mapper/engvg-englv  3.8G  259M   3.5G    7%  /gfs2

# umount /gfs2 ; fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks: 976076 (0xee4cc)
free: 909882 (0xde23a)
dinodes: 16 (0x10)

Calculated statfs values:
blocks: 1037080 (0xfd318)
free: 970886 (0xed086)
dinodes: 16 (0x10)

Okay to fix the master statfs file? (y/n)n
The statfs file was not fixed.
gfs2_fsck complete

Actual results:
fsck.gfs2 reported filesystem corruption even though gfs2_grow returned success.

Expected results:
The filesystem check should return a clean status.

- dominic

--- Additional comment from swhiteho@redhat.com on 2010-12-07 09:59:36 EST ---

Looks like the new fs size is being correctly cached locally, but probably not written back to disk on umount like it ought to be.

--- Additional comment from rpeterso@redhat.com on 2010-12-07 11:46:11 EST ---

I recreated the problem on roth-01. The statfs file is definitely being updated by adjust_fs_space, from the dmesg:
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 244016 blocks.

Before:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
sc_total 732060 0xb2b9c
sc_free 665866 0xa290a
sc_dinodes 16 0x10

After:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
sc_total 976076 0xee4cc
sc_free 909882 0xde23a
sc_dinodes 16 0x10

The math is right: 665866 + 244016 = 909882
So it looks more like the file is getting written back, but the new value is wrong. But the new value is calculated by function adjust_fs_space as:

fs_total = gfs2_ri_total(sdp);
...
new_free = fs_total - (m_sc->sc_total + l_sc->sc_total);

And that's the value printed in the "File system extended by" message.

--- Additional comment from rpeterso@redhat.com on 2010-12-07 13:14:58 EST ---

During the gfs2_grow, five new rgrps were added to the rindex. Each of them was 61004 blocks. So the correct value for the amount of new free space is 5 * 61004 = 305020, which is what fsck.gfs2 calculated. On the other hand, 4 * 61004 = 244016, which is what the gfs2 kernel code decided. Therefore, this is an off-by-one error. The kernel code did not take one of the new rgrps into account for some reason.

--- Additional comment from rpeterso@redhat.com on 2010-12-07 13:26:42 EST ---

Created attachment 466435 [details]
Patch to fix the problem

Solved. Here is a patch to fix the problem.
--- Additional comment from rpeterso@redhat.com on 2010-12-07 13:28:36 EST ---

Requesting ack flags.

--- Additional comment from rpeterso@redhat.com on 2010-12-07 13:32:54 EST ---

Tested on roth-01 where I could recreate the problem.

[root@roth-01 ../gfs-kernel/src/gfs]# dmesg | tail -3
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 305020 blocks.
dlm: roth_lv: leaving the lockspace group...
dlm: roth_lv: group event done 0 0
[root@roth-01 ../gfs-kernel/src/gfs]# fsck.gfs2 /dev/roth_vg/roth_lv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
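The off-by-one analysis in the comments above can be rechecked with a few lines of arithmetic. The sketch below only replays numbers quoted in this report (the gfs2_edit statfs values, the fsck.gfs2 calculated values, and the 61004-block rgrp size); the variable and function names are illustrative and are not taken from the kernel or gfs2-utils source.

```python
# Values copied from the gfs2_edit and fsck.gfs2 output quoted in this report.
RGRP_BLOCKS = 61004   # size of each rgrp added by gfs2_grow
NEW_RGRPS = 5         # number of rgrps added to the rindex

before = {"sc_total": 732060, "sc_free": 665866}          # statfs before the grow
kernel_after = {"sc_total": 976076, "sc_free": 909882}    # what the kernel wrote
fsck_expected = {"sc_total": 1037080, "sc_free": 970886}  # what fsck.gfs2 calculated


def growth(after, start=before):
    """Return how many blocks were added to each statfs counter."""
    return {key: after[key] - start[key] for key in start}


kernel_growth = growth(kernel_after)
fsck_growth = growth(fsck_expected)

# Blocks the kernel failed to account for, and how many rgrps that equals.
missing_blocks = fsck_growth["sc_free"] - kernel_growth["sc_free"]
missing_rgrps = missing_blocks // RGRP_BLOCKS

print("kernel added:", kernel_growth)
print("fsck expects:", fsck_growth)
print("missing blocks:", missing_blocks, "=", missing_rgrps, "rgrp(s)")
```

Both counters differ by exactly one rgrp's worth of blocks, which matches the conclusion that the kernel skipped one of the five new rgrps.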
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 467279 [details] Patch to fix the problem An upstream patch was sent. Here is the RHEL6 patch.
I POSTed this patch to rhkernel-list for inclusion into RHEL6.1. Changing status to POST.
Patch(es) available on kernel-2.6.32-92.el6
Verified test case with RHEL 6.0.
Verified fix with kernel-2.6.32-131.0.9.el6.x86_64.

SCENARIO - [after_grow] Check that fsck is clean after growfs
Creating 2G LV aftergrow on dash-01
Creating file system on /dev/fsck/aftergrow with options '-p lock_dlm -j 1 -t dash:aftergrow' on dash-01
Device: /dev/fsck/aftergrow
Blocksize: 4096
Device Size 2.00 GB (524288 blocks)
Filesystem Size: 2.00 GB (524288 blocks)
Journals: 1
Resource Groups: 8
Locking Protocol: "lock_dlm"
Lock Table: "dash:aftergrow"
UUID: 4A6C2AC1-4C0F-5154-700D-BB1CC349516D

Mounting gfs2 /dev/fsck/aftergrow on dash-01 with opts ''
Extending LV aftergrow by +2G on dash-01
Growing /dev/fsck/aftergrow on dash-01
FS: Mount Point: /mnt/fsck
FS: Device: /dev/dm-3
FS: Size: 524288 (0x80000)
FS: RG size: 65533 (0xfffd)
DEV: Size: 1048576 (0x100000)
The file system grew by 2048MB.
gfs2_grow complete.
Unmounting /mnt/fsck on dash-01
Starting fsck of /dev/fsck/aftergrow on dash-01
fsck output in /tmp/gfs_fsck_stress.9688/1.after_grow/1.fsck-dash-01.log
Removing LV aftergrow on dash-01
2 disk(s) to be used:
dash-01=/dev/sdb /dev/sdc
dash-02=/dev/sdb /dev/sdc
dash-03=/dev/sdb /dev/sdc
removing VG fsck on dash-02
removing PV /dev/sdb1 on dash-02
removing PV /dev/sdc1 on dash-02

[nstraz@try sts-root]$ cat /tmp/gfs_fsck_stress.9688/1.after_grow/1.fsck-dash-01.log
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
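For anyone scripting a similar after-grow check, here is a minimal sketch of how a saved fsck.gfs2 log could be scanned for the statfs complaint seen in the original report. It assumes the message wording shown in this bug ("The statfs file is wrong", "Current statfs values", "Calculated statfs values"); other fsck.gfs2 versions may print something different. The function names are hypothetical and are not part of gfs2-utils or the test harness used above.

```python
import re

# Matches one "Current"/"Calculated" statfs block, tolerant of line breaks.
_STATFS = re.compile(
    r"(Current|Calculated) statfs values:\s*"
    r"blocks:\s*(\d+)\s*\(0x[0-9a-fA-F]+\)\s*"
    r"free:\s*(\d+)\s*\(0x[0-9a-fA-F]+\)\s*"
    r"dinodes:\s*(\d+)"
)


def parse_statfs_report(fsck_output):
    """Return {'Current': {...}, 'Calculated': {...}}, or {} if no report is present."""
    report = {}
    for kind, blocks, free, dinodes in _STATFS.findall(fsck_output):
        report[kind] = {
            "blocks": int(blocks),
            "free": int(free),
            "dinodes": int(dinodes),
        }
    return report


def statfs_is_clean(fsck_output):
    """True when the log contains no statfs mismatch message."""
    return ("The statfs file is wrong" not in fsck_output
            and not parse_statfs_report(fsck_output))


# Sample text copied from the failing and passing runs quoted in this report.
bad_log = """Pass5 complete
The statfs file is wrong:
Current statfs values:
blocks: 976076 (0xee4cc)
free: 909882 (0xde23a)
dinodes: 16 (0x10)
Calculated statfs values:
blocks: 1037080 (0xfd318)
free: 970886 (0xed086)
dinodes: 16 (0x10)
"""
good_log = "Starting pass5\nPass5 complete\ngfs2_fsck complete\n"

report = parse_statfs_report(bad_log)
free_gap = report["Calculated"]["free"] - report["Current"]["free"]
print("clean before fix:", statfs_is_clean(bad_log))
print("clean after fix:", statfs_is_clean(good_log))
print("free blocks unaccounted for:", free_gap)
```

Matching on the text of the log keeps the sketch independent of fsck.gfs2 exit codes, which this report does not show.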
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html