Bug 660661

Summary: fsck.gfs2 reported statfs error after gfs2_grow
Product: Red Hat Enterprise Linux 5
Reporter: Dominic Geevarghese <dgeevarg>
Component: kernel
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 5.5
CC: adas, anton, bmarzins, cww, dhoward, edamato, jpirko, jwest, liko, mmahudha, pyaduvan, ssaha, swhiteho, syeghiay
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Due to an off-by-one error, gfs2_grow failed to take the very last "rgrp" parameter into account when adding up the new free space. With this update, the GFS2 kernel properly counts all the new resource groups and fixes the "statfs" file correctly.
Story Points: ---
Clone Of:
Clones: 661048, 719762
Environment:
Last Closed: 2011-07-21 09:57:09 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 661048, 666792, 719762    
Attachments:
Patch to fix the problem (flags: none)

Description Dominic Geevarghese 2010-12-07 14:40:59 UTC
Description of problem:

fsck.gfs2 reports a filesystem error right after gfs2_grow. It seems gfs2_grow failed silently.

Version-Release number of selected component (if applicable):

gfs2-utils-0.1.62-20.el5

How reproducible:

Always

Steps to Reproduce:

# fdisk -l /dev/sda

Disk /dev/sda: 8912 MB, 8912896000 bytes
64 heads, 32 sectors/track, 8500 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 2862 2930672 83 Linux
/dev/sda2 2863 4770 1953792 83 Linux

# pvcreate /dev/sda1 ; vgcreate engvg /dev/sda1 ; lvcreate -l 100%FREE -n englv engvg
Physical volume "/dev/sda1" successfully created
Volume group "engvg" successfully created
Logical volume "englv" created

# vgchange -cy engvg ; lvmconf --enable-cluster
Volume group "engvg" successfully changed

# /etc/init.d/clvmd start
Starting clvmd: [ OK ]
Activating VGs: 2 logical volume(s) in volume group "VolGroup00" now active
1 logical volume(s) in volume group "engvg" now active  [ OK ]

# vgs
VG #PV #LV #SN Attr VSize VFree
VolGroup00 1 2 0 wz--n- 5.25G 0
engvg 1 1 0 wz--nc 2.79G 0

# mkfs.gfs2 -t domxen:mygfs2 -p lock_dlm -j 2 /dev/engvg/englv
This will destroy any data on /dev/engvg/englv.

Are you sure you want to proceed? [y/n] y

Device: /dev/engvg/englv
Blocksize: 4096
Device Size 2.79 GB (732160 blocks)
Filesystem Size: 2.79 GB (732157 blocks)
Journals: 2
Resource Groups: 12
Locking Protocol: "lock_dlm"
Lock Table: "domxen:mygfs2"
UUID: 5075EFB0-8D05-2E07-B737-EE1F1AA6919A

# fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
[root@dhcp209-170 ~]#

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/engvg-englv
2.8G 259M 2.6G 10% /gfs2

# pvcreate /dev/sda2 ; vgextend engvg /dev/sda2 ; lvextend -L 4G /dev/engvg/englv
Physical volume "/dev/sda2" successfully created
Volume group "engvg" successfully extended
Extending logical volume englv to 4.00 GB
Logical volume englv successfully resized

# lvdisplay /dev/engvg/englv 
  --- Logical volume ---
  LV Name                /dev/engvg/englv
  VG Name                engvg
  LV UUID                Y0XJ55-iOGW-b8eJ-gbee-T2z8-jGwD-4guRAO
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 GB
  Current LE             1024
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

# gfs2_grow /gfs2
FS: Mount Point: /gfs2
FS: Device: /dev/mapper/engvg-englv
FS: Size: 732157 (0xb2bfd)
FS: RG size: 61011 (0xee53)
DEV: Size: 1048576 (0x100000)
The file system grew by 1236MB.
gfs2_grow complete.

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2 
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/engvg-englv
                      3.8G  259M  3.5G   7% /gfs2

# umount /gfs2 ; fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks: 976076 (0xee4cc)
free: 909882 (0xde23a)
dinodes: 16 (0x10)

Calculated statfs values:
blocks: 1037080 (0xfd318)
free: 970886 (0xed086)
dinodes: 16 (0x10)
Okay to fix the master statfs file? (y/n)n
The statfs file was not fixed.
gfs2_fsck complete

  
Actual results:

fsck.gfs2 reported filesystem corruption even though gfs2_grow returned success.

Expected results:

The filesystem check should return a clean status.
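
To make the size of the discrepancy explicit, the two statfs blocks printed by fsck.gfs2 above can be compared field by field. The following is only a sketch: parse_statfs is a hypothetical helper written for this report, not part of gfs2-utils, and it assumes the output layout shown above.

```python
import re

# Excerpt of the fsck.gfs2 output quoted in the steps above.
FSCK_OUTPUT = """\
Current statfs values:
blocks: 976076 (0xee4cc)
free: 909882 (0xde23a)
dinodes: 16 (0x10)

Calculated statfs values:
blocks: 1037080 (0xfd318)
free: 970886 (0xed086)
dinodes: 16 (0x10)
"""

def parse_statfs(text):
    """Hypothetical helper: return {'Current': {...}, 'Calculated': {...}}."""
    result = {}
    section = None
    for line in text.splitlines():
        header = re.match(r"(Current|Calculated) statfs values:", line)
        if header:
            section = header.group(1)
            result[section] = {}
            continue
        field = re.match(r"(blocks|free|dinodes):\s+(\d+)\s+\(0x([0-9a-f]+)\)", line)
        if field and section:
            name, dec, hexval = field.groups()
            # The decimal and hex renderings of each value should agree.
            assert int(dec) == int(hexval, 16)
            result[section][name] = int(dec)
    return result

values = parse_statfs(FSCK_OUTPUT)
# Difference between what fsck.gfs2 calculated and what is on disk.
diff = {k: values["Calculated"][k] - values["Current"][k] for k in values["Current"]}
print(diff)  # {'blocks': 61004, 'free': 61004, 'dinodes': 0}
```

Both the total and the free count are short by the same 61004 blocks, while the dinode count agrees.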

- dominic

Comment 1 Steve Whitehouse 2010-12-07 14:59:36 UTC
Looks like the new fs size is being correctly cached locally, but probably not written back to disk on umount like it ought to be.

Comment 2 Robert Peterson 2010-12-07 16:46:11 UTC
I recreated the problem on roth-01.  The statfs file is
definitely being updated by adjust_fs_space, from the dmesg:
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 244016 blocks.

Before:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
  sc_total              732060              0xb2b9c
  sc_free               665866              0xa290a
  sc_dinodes            16                  0x10
After:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
  sc_total              976076              0xee4cc
  sc_free               909882              0xde23a
  sc_dinodes            16                  0x10
The math is right: 665866 + 244016 = 909882

So it looks as though the file is getting written back, but
the new value is wrong.  The new value is calculated by
function adjust_fs_space as:
fs_total = gfs2_ri_total(sdp);
...
new_free = fs_total - (m_sc->sc_total + l_sc->sc_total);
And that's the value printed in the "File system extended by"
message.
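
The calculation quoted above can be replayed with the numbers from this comment. This is a sketch under two assumptions that the thread does not confirm: that the local statfs total (l_sc->sc_total) was zero at the time, and that gfs2_ri_total() returned the total that later appeared on disk.

```python
# Values taken from the gfs2_edit output in this comment.
m_sc_total_before = 732060   # master sc_total before gfs2_grow
m_sc_free_before = 665866    # master sc_free before gfs2_grow
m_sc_total_after = 976076    # master sc_total after gfs2_grow

# Assumption: the local statfs total is zero (not shown in the thread).
l_sc_total = 0

# Assumption: gfs2_ri_total() returned the total that ended up on disk.
fs_total = m_sc_total_after

# Mirrors: new_free = fs_total - (m_sc->sc_total + l_sc->sc_total)
new_free = fs_total - (m_sc_total_before + l_sc_total)

print(new_free)                     # 244016, as in "File system extended by"
print(m_sc_free_before + new_free)  # 909882, the sc_free seen afterwards
```

Under those assumptions the arithmetic is self-consistent, which suggests the input (fs_total) is the suspect rather than the subtraction.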

Comment 3 Robert Peterson 2010-12-07 18:14:58 UTC
During the gfs2_grow, five new rgrps were added to the rindex.
Each of them was 61004 blocks.  So the correct value for the
amount of new free space is 5 * 61004 = 305020, which is what
fsck.gfs2 calculated.  On the other hand, 4 * 61004 = 244016,
which is what the gfs2 kernel code calculated.  Therefore, this
is an off-by-one error.  The kernel code did not take one of the
new rgrps into account for some reason.
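
The arithmetic above can be illustrated with a small sketch. The function total_skipping_last is hypothetical: it shows one generic way an off-by-one loop drops a single rgrp, and is not a reconstruction of the actual kernel code, whose exact cause is not given in this comment.

```python
RGRP_BLOCKS = 61004             # size of each new rgrp, from this comment
new_rgrps = [RGRP_BLOCKS] * 5   # five rgrps were added to the rindex

def total_all(rgrps):
    """Sum every resource group."""
    return sum(rgrps)

def total_skipping_last(rgrps):
    """Hypothetical buggy loop: stops one entry early."""
    total = 0
    for i in range(len(rgrps) - 1):
        total += rgrps[i]
    return total

correct = total_all(new_rgrps)
buggy = total_skipping_last(new_rgrps)
print(correct, buggy, correct - buggy)  # 305020 244016 61004
```

The difference of 61004 blocks is exactly one rgrp, which matches the gap between the current and calculated statfs values reported by fsck.gfs2.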

Comment 4 Robert Peterson 2010-12-07 18:26:42 UTC
Created attachment 466435 [details]
Patch to fix the problem

Solved.  Here is a patch to fix the problem.

Comment 5 Robert Peterson 2010-12-07 18:28:36 UTC
Requesting ack flags.

Comment 6 Robert Peterson 2010-12-07 18:32:54 UTC
Tested on roth-01 where I could recreate the problem.

[root@roth-01 ../gfs-kernel/src/gfs]# dmesg | tail -3
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 305020 blocks.
dlm: roth_lv: leaving the lockspace group...
dlm: roth_lv: group event done 0 0
[root@roth-01 ../gfs-kernel/src/gfs]# fsck.gfs2 /dev/roth_vg/roth_lv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
gfs2_fsck complete

Comment 7 Robert Peterson 2010-12-07 20:45:31 UTC
Cloned as bug #661048 for crosswriting to RHEL6.x.

Comment 8 Robert Peterson 2010-12-07 21:10:51 UTC
The patch was accepted into the upstream -nmw git tree.
I posted the patch for inclusion into RHEL5.7.  Changing
status to POST.

Comment 10 Robert Peterson 2010-12-08 15:01:57 UTC
An experimental version of the module may temporarily be downloaded here:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/gfs2.660661.ko

This module has not gone through the Red Hat Quality Engineering
process, so as always, use at your own risk.

Comment 13 Robert Peterson 2010-12-22 17:13:00 UTC
Changing component to kernel as it should be.

Comment 17 Dominic Geevarghese 2011-01-10 16:07:52 UTC
Hi,

I used the hotfix kernel per bz 660661, comment #16. Unfortunately, my test environment reported the same "statfs" error while testing the patch.

[root@dhcp210-53 ~]# uname -a 
Linux dhcp210-53.gsslab.pnq.redhat.com 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp210-53 ~]# lsmod | grep gfs
gfs2                  524204  1 lock_dlm
configfs               62045  2 dlm

At first, a plain "modinfo gfs2" returned nothing:

[root@dhcp210-53 ~]# modinfo gfs2

[root@dhcp210-53 ~]# 
[root@dhcp210-53 ~]# modinfo /lib/modules/2.6.18-238.1.1.el5/kernel/fs/gfs2/gfs2.ko 
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/fs/gfs2/gfs2.ko
license:        GPL
author:         Red Hat, Inc.
description:    Global File System
srcversion:     39378B8C32BD3F6A7DDDBBA
depends:        
vermagic:       2.6.18-238.1.1.el5 SMP mod_unload gcc-4.1
module_sig:	883f3504d236b93286a51799a18fc80112986f09f5db76d36a58112b15063494c37c41de94e84ab3709f5c29552feab9d1aed9c0af898527bfb574de92b3
[root@dhcp210-53 ~]# 
[root@dhcp210-53 ~]# mkfs -t gfs2 -p lock_dlm -t domxen:gfs2 -j 2 /dev/domvg/domlv 
This will destroy any data on /dev/domvg/domlv.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/domvg/domlv
Blocksize:                 4096
Device Size                1.87 GB (489472 blocks)
Filesystem Size:           1.87 GB (489471 blocks)
Journals:                  2
Resource Groups:           8
Locking Protocol:          "lock_dlm"
Lock Table:                "domxen:gfs2"
UUID:                      8CF32642-97A5-B434-57F7-15580F846192

[root@dhcp210-53 ~]# gfs2_edit -p statfs /dev/domvg/domlv | tail -4
  sc_total              489416              0x777c8
  sc_free               423222              0x67536
  sc_dinodes            16                  0x10
------------------------------------------------------

[root@dhcp210-53 ~]# pvcreate /dev/sda2 ; vgextend domvg /dev/sda2 ; lvextend -L 3G /dev/domvg/domlv 
  Physical volume "/dev/sda2" successfully created
  Volume group "domvg" successfully extended
  Extending logical volume domlv to 3.00 GB
  Logical volume domlv successfully resized
[root@dhcp210-53 ~]# gfs2_grow /gfs 
FS: Mount Point: /gfs
FS: Device:      /dev/mapper/domvg-domlv
FS: Size:        489471 (0x777ff)
FS: RG size:     61181 (0xeefd)
DEV: Size:       786432 (0xc0000)
The file system grew by 1160MB.
gfs2_grow complete.

[root@dhcp210-53 ~]# df -h
...
/dev/mapper/domvg-domlv   2.6G  259M  2.4G  10% /gfs
...

[root@dhcp210-53 ~]# umount /gfs ; fsck.gfs2 /dev/domvg/domlv 
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
The statfs file is wrong:

Current statfs values:
blocks:  672944 (0xa44b0)
free:    606750 (0x9421e)
dinodes: 16 (0x10)

Calculated statfs values:
blocks:  734120 (0xb33a8)
free:    667926 (0xa3116)
dinodes: 16 (0x10)
Okay to fix the master statfs file? (y/n)n
The statfs file was not fixed.
gfs2_fsck complete 

[root@dhcp210-53 ~]# gfs2_edit -p statfs /dev/domvg/domlv | tail -4
  sc_total              672944              0xa44b0
  sc_free               606750              0x9421e
  sc_dinodes            16                  0x10
------------------------------------------------------

Thanks, Dominic
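
The numbers in this comment can be checked for the same pattern as the original report. This is a sketch that assumes every new rgrp contributes the same number of blocks, so the shortfall corresponds to a whole number of rgrps.

```python
# Values from the gfs2_edit and fsck.gfs2 output in this comment.
total_before = 489416       # sc_total before gfs2_grow
total_on_disk = 672944      # sc_total after gfs2_grow ("Current")
total_calculated = 734120   # what fsck.gfs2 calculated

shortfall = total_calculated - total_on_disk   # blocks missing on disk
counted = total_on_disk - total_before         # blocks actually added
expected = total_calculated - total_before     # blocks fsck.gfs2 expects

# Assumption: all new rgrps are equal in size, so the shortfall is one rgrp.
print(shortfall, counted // shortfall, expected // shortfall)  # 61176 3 4
```

If that assumption holds, three of four new rgrps were counted, which is the same one-rgrp shortfall seen in the description.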

Comment 18 Robert Peterson 2011-01-10 16:10:37 UTC
Dominic had some problems testing the hotfix, but it turns
out that the gfs2 overlay module was still loaded.

Be sure before trying the patch that you (1) remove the
overlay from the disk (2) remove the overlay from memory,
and (3) make sure the new version is running in memory
with dmesg | grep GFS2

[root@dhcp210-53 ~]# dmesg | grep built | grep GFS2
GFS2 Overlay (built May 29 2008 16:48:00) installed
[root@dhcp210-53 ~]# rpm -e kmod-gfs2
[root@dhcp210-53 ~]# rmmod lock_dlm gfs2
[root@dhcp210-53 ~]#
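
The dmesg check above can also be scripted. The following is a minimal sketch: find_overlay_lines is a hypothetical helper, and the pattern is taken only from the single "GFS2 Overlay" line shown in this comment, so other builds may print something different.

```python
import re

def find_overlay_lines(dmesg_text):
    """Return dmesg lines that announce a GFS2 overlay module."""
    pattern = re.compile(r"GFS2 Overlay \(built .*\) installed")
    return [line for line in dmesg_text.splitlines() if pattern.search(line)]

# Sample text: the overlay line from this comment plus an unrelated line.
sample = (
    "GFS2 Overlay (built May 29 2008 16:48:00) installed\n"
    "dlm: roth_lv: group event done 0 0\n"
)
overlay = find_overlay_lines(sample)
print(overlay)
```

An empty result only says that no overlay line was found in the text given; it does not by itself prove which gfs2 module is running.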

Comment 19 Jarod Wilson 2011-01-26 21:08:28 UTC
The patch is in kernel-2.6.18-241.el5.
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 21 Nate Straz 2011-05-11 15:31:48 UTC
SCENARIO - [after_grow]
Check that fsck is clean after growfs
Creating 2G LV aftergrow on dash-01
Creating file system on /dev/fsck/aftergrow with options '-p lock_dlm -j 1 -t dash:aftergrow' on dash-01
Device:                    /dev/fsck/aftergrow
Blocksize:                 4096
Device Size                2.00 GB (524288 blocks)
Filesystem Size:           2.00 GB (524288 blocks)
Journals:                  1
Resource Groups:           8
Locking Protocol:          "lock_dlm"
Lock Table:                "dash:aftergrow"
UUID:                      8FAD9CAD-A522-3FCA-D80C-4166288756C4

Mounting gfs2 /dev/fsck/aftergrow on dash-01 with opts ''
Extending LV aftergrow by +2G on dash-01
Growing /dev/fsck/aftergrow on dash-01
FS: Mount Point: /mnt/fsck
FS: Device:      /dev/mapper/fsck-aftergrow
FS: Size:        524288 (0x80000)
FS: RG size:     65533 (0xfffd)
DEV: Size:       1048576 (0x100000)
The file system grew by 2048MB.
gfs2_grow complete.
Unmounting /mnt/fsck on dash-01
Starting fsck.gfs2 of /dev/fsck/aftergrow on dash-01
fsck.gfs2 output in /tmp/gfs_fsck_stress.24262/3.after_grow/1.fsck-dash-01.log
Removing LV aftergrow on dash-01

$ cat /tmp/gfs_fsck_stress.24262/3.after_grow/1.fsck-dash-01.log
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
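
As a cross-check of the "grew by" figures, the three gfs2_grow runs in this report can be compared against a simple formula. This is a sketch that assumes the figure is the difference between the device size and the old file system size, in 4096-byte blocks, rounded down to whole MB; the thread does not say how gfs2_grow actually computes it.

```python
def grown_mb(fs_blocks, dev_blocks, block_size=4096):
    """Whole MB between the old file system size and the new device size."""
    return (dev_blocks - fs_blocks) * block_size // (1024 * 1024)

# (FS size, DEV size) pairs as printed by gfs2_grow in this report.
runs = {
    "description": (732157, 1048576),
    "comment 17": (489471, 786432),
    "comment 21": (524288, 1048576),
}
for name, (fs_blocks, dev_blocks) in runs.items():
    print(name, grown_mb(fs_blocks, dev_blocks))
```

The results (1236, 1160 and 2048 MB) match the three "The file system grew by" lines, so the user-space tool reported the expected growth in every run, including the ones where fsck.gfs2 later flagged the statfs file.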

Comment 22 Martin Prpič 2011-07-13 20:28:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Due to an off-by-one error, gfs2_grow failed to take the very last "rgrp" parameter into account when adding up the new free space. With this update, the GFS2 kernel properly counts all the new resource groups and fixes the "statfs" file correctly.

Comment 23 errata-xmlrpc 2011-07-21 09:57:09 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html