Bug 661048

Summary: fsck.gfs2 reported statfs error after gfs2_grow
Product: Red Hat Enterprise Linux 6
Reporter: Robert Peterson <rpeterso>
Component: kernel
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: low
Version: 6.0
CC: adas, bmarzins, dgeevarg, edamato, swhiteho
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Fixed In Version: kernel-2.6.32-92.el6
Doc Type: Bug Fix
Clone Of: 660661
Last Closed: 2011-05-23 20:30:33 UTC
Bug Depends On: 660661
Attachments: Patch to fix the problem

Description Robert Peterson 2010-12-07 18:48:39 UTC
+++ This bug was initially created as a clone of Bug #660661 +++
Cloned so we can cross-write the patch to RHEL6.

Description of problem:

Filesystem error reported by fsck.gfs2 right after gfs2_grow. It seems gfs2_grow failed silently.

Version-Release number of selected component (if applicable):

gfs2-utils-0.1.62-20.el5

How reproducible:

Always

Steps to Reproduce:

# fdisk -l /dev/sda

Disk /dev/sda: 8912 MB, 8912896000 bytes
64 heads, 32 sectors/track, 8500 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 2862 2930672 83 Linux
/dev/sda2 2863 4770 1953792 83 Linux

# pvcreate /dev/sda1 ; vgcreate engvg /dev/sda1 ; lvcreate -l 100%FREE -n englv engvg
Physical volume "/dev/sda1" successfully created
Volume group "engvg" successfully created
Logical volume "englv" created

# vgchange -cy engvg ; lvmconf --enable-cluster
Volume group "engvg" successfully changed

# /etc/init.d/clvmd start
Starting clvmd: [ OK ]
Activating VGs: 2 logical volume(s) in volume group "VolGroup00" now active
1 logical volume(s) in volume group "engvg" now active  [ OK ]

# vgs
VG #PV #LV #SN Attr VSize VFree
VolGroup00 1 2 0 wz--n- 5.25G 0
engvg 1 1 0 wz--nc 2.79G 0

# mkfs.gfs2 -t domxen:mygfs2 -p lock_dlm -j 2 /dev/engvg/englv
This will destroy any data on /dev/engvg/englv.

Are you sure you want to proceed? [y/n] y

Device: /dev/engvg/englv
Blocksize: 4096
Device Size 2.79 GB (732160 blocks)
Filesystem Size: 2.79 GB (732157 blocks)
Journals: 2
Resource Groups: 12
Locking Protocol: "lock_dlm"
Lock Table: "domxen:mygfs2"
UUID: 5075EFB0-8D05-2E07-B737-EE1F1AA6919A

# fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
[root@dhcp209-170 ~]#

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/engvg-englv
2.8G 259M 2.6G 10% /gfs2

# pvcreate /dev/sda2 ; vgextend engvg /dev/sda2 ; lvextend -L 4G /dev/engvg/englv
Physical volume "/dev/sda2" successfully created
Volume group "engvg" successfully extended
Extending logical volume englv to 4.00 GB
Logical volume englv successfully resized

# lvdisplay /dev/engvg/englv 
  --- Logical volume ---
  LV Name                /dev/engvg/englv
  VG Name                engvg
  LV UUID                Y0XJ55-iOGW-b8eJ-gbee-T2z8-jGwD-4guRAO
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                4.00 GB
  Current LE             1024
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

# gfs2_grow /gfs2
FS: Mount Point: /gfs2
FS: Device: /dev/mapper/engvg-englv
FS: Size: 732157 (0xb2bfd)
FS: RG size: 61011 (0xee53)
DEV: Size: 1048576 (0x100000)
The file system grew by 1236MB.
gfs2_grow complete.
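
(For reference on the numbers above: the device is 1048576 blocks and the old
filesystem was 732157 blocks, so gfs2_grow is adding 1048576 - 732157 = 316419
blocks, which at the 4096-byte block size is roughly 1236 MB, matching the
message.)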

# mount /dev/engvg/englv /gfs2/ ; df -h /gfs2 
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/engvg-englv
                      3.8G  259M  3.5G   7% /gfs2

# umount /gfs2 ; fsck.gfs2 /dev/engvg/englv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks: 976076 (0xee4cc)
free: 909882 (0xde23a)
dinodes: 16 (0x10)

Calculated statfs values:
blocks: 1037080 (0xfd318)
free: 970886 (0xed086)
dinodes: 16 (0x10)
Okay to fix the master statfs file? (y/n)n
The statfs file was not fixed.
gfs2_fsck complete

  
Actual results:

fsck.gfs2 reported filesystem corruption even though gfs2_grow returned success.

Expected results:

The filesystem check should return a clean status.

- dominic

--- Additional comment from swhiteho on 2010-12-07 09:59:36 EST ---

Looks like the new fs size is being correctly cached locally, but probably not written back to disk on umount like it ought to be.

--- Additional comment from rpeterso on 2010-12-07 11:46:11 EST ---

I recreated the problem on roth-01.  The statfs file is
definitely being updated by adjust_fs_space, from the dmesg:
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 244016 blocks.

Before:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
  sc_total              732060              0xb2b9c
  sc_free               665866              0xa290a
  sc_dinodes            16                  0x10
After:
[root@roth-01 ~]# gfs2_edit -p statfs /dev/roth_vg/roth_lv | tail -4
  sc_total              976076              0xee4cc
  sc_free               909882              0xde23a
  sc_dinodes            16                  0x10
The math is right: 665866 + 244016 = 909882

So it looks more like the file is getting written back, but
the new value is wrong.  The new value is calculated by
function adjust_fs_space as:
fs_total = gfs2_ri_total(sdp);
...
new_free = fs_total - (m_sc->sc_total + l_sc->sc_total);
And that's the value printed in the "File system extended by" message.
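
To make the arithmetic concrete, here is a small standalone sketch (not the
kernel code itself; the variable names are local to this example) that plugs
the gfs2_edit numbers above into that new_free calculation.  With the rindex
total of 976076 that the kernel apparently used, it reproduces both the
244016-block extension from dmesg and the 909882 sc_free written to disk;
with the full total of 1037080 it would have given 305020 instead.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* statfs values before the grow, from the gfs2_edit output above */
	uint64_t old_total = 732060;           /* sc_total */
	uint64_t old_free  = 665866;           /* sc_free  */

	/* what gfs2_ri_total() apparently returned (one rgrp short);
	   the correct total would have been 1037080 */
	uint64_t fs_total  = 976076;

	/* same formula as quoted from adjust_fs_space, with the local
	   statfs contribution taken as zero for this sketch */
	uint64_t new_free = fs_total - old_total;

	printf("extended by %llu blocks, new sc_free = %llu\n",
	       (unsigned long long)new_free,               /* 244016 */
	       (unsigned long long)(old_free + new_free)); /* 909882 */
	return 0;
}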

--- Additional comment from rpeterso on 2010-12-07 13:14:58 EST ---

During the gfs2_grow, five new rgrps were added to the rindex.
Each of them was 61004 blocks.  So the correct value for the
amount of new free space is 5 * 61004 = 305020, which is what
fsck.gfs2 calculated.  On the other hand, 4 * 61004 = 244016,
which is what the gfs2 kernel decided.  Therefore, this is an
off-by-one error: the kernel code did not take one of the new
rgrps into account for some reason.
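
As a minimal illustration (not the verbatim kernel code) of how an off-by-one
like this can drop exactly the last rgrp: if the code walks fixed-size rindex
entries and stops when the next entry would reach the end of the rindex file
using ">=" rather than ">", it reads one entry too few, because the entries
always exactly fill the file.  The entry size and count below are example
values chosen for the sketch, not taken from this bug.

#include <stdio.h>

#define ENTRY_SIZE   96   /* assumed on-disk rindex entry size for this sketch */
#define NUM_ENTRIES  17   /* example: rgrp count after a grow */

int main(void)
{
	long i_size = (long)NUM_ENTRIES * ENTRY_SIZE;  /* rindex is exactly full */
	long pos;
	int counted = 0;

	for (pos = 0; ; pos += ENTRY_SIZE) {
		if (pos + ENTRY_SIZE >= i_size)   /* off-by-one: '>' would be correct */
			break;
		counted++;                        /* "read" one rindex entry */
	}
	printf("counted %d of %d entries\n", counted, NUM_ENTRIES);
	return 0;
}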

--- Additional comment from rpeterso on 2010-12-07 13:26:42 EST ---

Created attachment 466435 [details]
Patch to fix the problem

Solved.  Here is a patch to fix the problem.

--- Additional comment from rpeterso on 2010-12-07 13:28:36 EST ---

Requesting ack flags.

--- Additional comment from rpeterso on 2010-12-07 13:32:54 EST ---

Tested on roth-01 where I could recreate the problem.

[root@roth-01 ../gfs-kernel/src/gfs]# dmesg | tail -3
GFS2: fsid=bobs_roth:roth_lv.0: File system extended by 305020 blocks.
dlm: roth_lv: leaving the lockspace group...
dlm: roth_lv: group event done 0 0
[root@roth-01 ../gfs-kernel/src/gfs]# fsck.gfs2 /dev/roth_vg/roth_lv
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
gfs2_fsck complete

Comment 1 RHEL Program Management 2010-12-07 19:00:06 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 2 Robert Peterson 2010-12-07 19:05:02 UTC
Created attachment 467279 [details]
Patch to fix the problem

An upstream patch was sent.  Here is the RHEL6 patch.

Comment 3 Robert Peterson 2010-12-07 21:01:35 UTC
I POSTed this patch to rhkernel-list for inclusion into RHEL6.1.
Changing status to POST.

Comment 4 Aristeu Rozanski 2010-12-16 20:54:12 UTC
Patch(es) available on kernel-2.6.32-92.el6

Comment 7 Nate Straz 2011-04-27 19:07:29 UTC
Verified test case with RHEL 6.0.

Verified fix with kernel-2.6.32-131.0.9.el6.x86_64.

SCENARIO - [after_grow]
Check that fsck is clean after growfs
Creating 2G LV aftergrow on dash-01
Creating file system on /dev/fsck/aftergrow with options '-p lock_dlm -j 1 -t dash:aftergrow' on dash-01
Device:                    /dev/fsck/aftergrow
Blocksize:                 4096
Device Size                2.00 GB (524288 blocks)
Filesystem Size:           2.00 GB (524288 blocks)
Journals:                  1
Resource Groups:           8
Locking Protocol:          "lock_dlm"
Lock Table:                "dash:aftergrow"
UUID:                      4A6C2AC1-4C0F-5154-700D-BB1CC349516D

Mounting gfs2 /dev/fsck/aftergrow on dash-01 with opts ''
Extending LV aftergrow by +2G on dash-01
Growing /dev/fsck/aftergrow on dash-01
FS: Mount Point: /mnt/fsck
FS: Device:      /dev/dm-3
FS: Size:        524288 (0x80000)
FS: RG size:     65533 (0xfffd)
DEV: Size:       1048576 (0x100000)
The file system grew by 2048MB.
gfs2_grow complete.
Unmounting /mnt/fsck on dash-01
Starting fsck of /dev/fsck/aftergrow on dash-01
fsck output in /tmp/gfs_fsck_stress.9688/1.after_grow/1.fsck-dash-01.log
Removing LV aftergrow on dash-01


2 disk(s) to be used:
        dash-01=/dev/sdb /dev/sdc
        dash-02=/dev/sdb /dev/sdc
        dash-03=/dev/sdb /dev/sdc
removing VG fsck on dash-02
removing PV /dev/sdb1 on dash-02
removing PV /dev/sdc1 on dash-02
[nstraz@try sts-root]$ cat /tmp/gfs_fsck_stress.9688/1.after_grow/1.fsck-dash-01.log
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete

Comment 8 errata-xmlrpc 2011-05-23 20:30:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html