Bug 482756
| Summary: | GFS2: After gfs2_grow, new size is not seen immediately | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Robert Peterson <rpeterso> |
| Component: | kernel | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.3 | CC: | bturner, ctatman, cward, dejohnso, dzickus, edamato, jtluka, qcai, rpeterso, swhiteho, tao, tdunnon |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 469773 | Environment: | |
| Last Closed: | 2010-03-30 07:10:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 526947, 533192 | | |
| Attachments: | | | |
Description
Robert Peterson
2009-01-27 21:39:59 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 367528 [details]
Fix to reinitialize the resource group index after growing the filesystem
This problem actually exists on both single node and cluster setups.
The first problem, which caused the failure on cluster setups, is that the rindex list was supposed to be invalidated when nodes dropped their rindex glock, but the code to do that was in meta_go_inval() instead of inode_go_inval(). I can't see any reason why that code was in meta_go_inval(): it never got called during my testing, and I can't see any way that it could get called. Still, I dislike removing code that I don't understand, so if there is a reason for that meta_go_inval() code, someone please let me know, and I'll add it back.
The second problem is that on single-node setups, the node never needs to drop the rindex glock. There are multiple ways to solve this. I could have added code that manually updated the rindex list when you grew the filesystem. Instead, I just forced the node to actually drop its rindex glock, which invalidates the rindex list. The next time the node needs to allocate space, it will pick the glock back up and reinitialize the list. This is not the fastest way to do things, but it does mean that all nodes in a cluster do the same thing to invalidate and reinitialize their rindex list, and since growing a filesystem is a pretty rare event, the additional overhead seems acceptable.
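The invalidate-on-lock-drop pattern described above can be sketched in a small userspace model (this is not the actual GFS2 code; all names here are invented for illustration): dropping the lock marks the cached resource-group list stale, and the next acquisition re-reads the "on-disk" index before any allocation proceeds.

```c
#include <stdbool.h>

/* Invented userspace model of the fix described above: a cached index
 * protected by a lock. Dropping the lock invalidates the cache; the
 * next acquisition rebuilds it from the simulated on-disk rindex. */

static int on_disk_rgrps = 8;   /* simulated on-disk resource group count */

struct rindex_cache {
    bool lock_held;             /* stands in for holding the rindex glock */
    bool list_valid;            /* is the cached list still current? */
    int  rgrp_count;            /* cached copy of the rgrp count */
};

/* Dropping the lock invalidates the cached list (the inode_go_inval()
 * side of the fix in the model). */
void drop_lock(struct rindex_cache *c)
{
    c->lock_held = false;
    c->list_valid = false;
}

/* Re-acquiring the lock rebuilds the list before it is used again. */
void acquire_lock(struct rindex_cache *c)
{
    c->lock_held = true;
    if (!c->list_valid) {
        c->rgrp_count = on_disk_rgrps;  /* re-read from "disk" */
        c->list_valid = true;
    }
}

/* gfs2_grow analogue: update the on-disk index, then force the growing
 * node to drop its lock so its own stale cache is discarded too. */
void grow(struct rindex_cache *c, int new_rgrps)
{
    on_disk_rgrps = new_rgrps;
    drop_lock(c);
}
```

The point of the design choice is uniformity: every node, including the one that ran the grow, takes the same invalidate-and-rebuild path on its next allocation, rather than the grower patching its own list in place.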
Posted in kernel-2.6.18-174.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.

NOTE: From the customer: The hotfix given to me by Jeremy West and Linda did not fix the customer issue. Because it was a grow issue, I did have them install the hotfix kernel on both nodes. They attempted to grow a gfs2 volume, and were still unable to use the new space immediately. New sosreports and stack traces for the grow have been attached to this ticket. Will be attaching the following:

sosreport-mageshkumar.gajapathy.804042671-17973-811dea.tar.bz2
sosreport-mageshkumar.gajapathy.804042671-8183-a23186.tar.bz2
gfs2_grow.strace1

Created attachment 388382 [details]
gfs2_grow strace with hotfix installed
Event posted on 02-02-2010 04:14pm EST by dejohnso

Verified from sosreport that hotfix is installed:

[dejohnso@dhcp242-193 mageshkumar.gajapathy.804042671-17973]$ cat uname
Linux sbici 2.6.18-174.el5 #1 SMP Mon Nov 16 22:54:31 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

This event sent from IssueTracker by dejohnso, issue 336608

NOTE: Verified that the hotfix has the code by extracting the src rpm and checking it. I went over linux-2.6-gfs2-drop-rindex-glock-on-grows.patch line by line and it is all there.

So why are they not seeing the grow? Are they reproducing this the same way as before? Do they still have to wait for the filesystem to be remounted to see the space, or does it appear if they wait a little bit? Would it be possible to get a copy of all the commands that they run, and the output of all of them, including running lvdisplay and vgdisplay both at the start and the end of the testing?

Created attachment 388868 [details]
vgdisplay of the customer's system
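For reference against the requested command output, the expected grow sequence on a clustered setup would look roughly like the following. The volume group, logical volume, and mount point names here are hypothetical, not taken from the customer's system:

```shell
# clvmd must be running on every node before touching shared LVM metadata.
service clvmd status

# Capture the starting state (as requested above).
vgdisplay vg_shared
lvdisplay /dev/vg_shared/lv_gfs2

# Extend the logical volume, then grow the mounted GFS2 filesystem.
lvextend -L +50G /dev/vg_shared/lv_gfs2
gfs2_grow /mnt/gfs2

# The new space should be visible on all nodes without a remount.
df -h /mnt/gfs2
```

Note that gfs2_grow is run against the mount point of a mounted filesystem, and the extra space should appear cluster-wide once the grow completes.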
Thanks, but I would really like this in the context of running all of the commands. I'd also like to see what they used when they created the filesystem.

Also, looking at the vgdisplay output, it looks like they don't have clvmd running. However, they do have two nodes, right? Or are they testing with just a single node now? If they are running in a cluster with two nodes accessing the storage, they need to have clvmd running, or things can go very wrong. I'm not saying that this is the cause of their issue, but live-growing a shared volume in a cluster without clvmd running is a bad idea.

If the customer isn't running IO on both nodes (assuming that they are actually using both nodes), can they try doing some IO on the node that they didn't grow the filesystem on, after the grow completes, and see if that makes them able to see the new space? This shouldn't be necessary to see the new space, but if it clears up the problem, that narrows down where the bug could be. Also, are they mounting the filesystem with any mount options?

~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.

If this problem is still reproducible, I need the information from the debug kernel to have a chance at solving it, since I am unable to reproduce it myself.

An advisory has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html

From the information in the last two comments, this doesn't look like the original bug. I trust that the output from Comment #57 is from the command that caused the error in Comment #56, meaning the filesystem didn't grow the full size that it was supposed to. After this happened, did the customer unmount and remount the filesystem? If so, did it fix the problem?

If unmounting and remounting didn't fix the problem, then this is a completely different bug than was originally reported. This actually sounds a lot like bz #469773, which was a problem in the gfs2 utils that caused filesystems to grow less than they should. It was fixed in gfs2-utils-0.1.58-1.el5. According to the sosreports from the time of the original bug, the customer was using gfs2-utils-0.1.53-1.el5_3.3-x86_64. Can you check if they are currently using an updated gfs2-utils package? If they are not, could they try using gfs2-utils-0.1.58-1.el5 or newer, and see if that solves their problem?

If they saw this while using gfs2-utils-0.1.58-1.el5 or a newer version, and the problem did not fix itself when they unmounted and remounted the filesystem, can you please either open a new bug or reopen #469773. If remounting the filesystem did fix the problem, then we can probably keep the discussion under this bugzilla for now. In that case, I'd really like them to run my debug kernel, so I can see what happened to the resource group index.

The only entry I saw was:

Apr 20 15:54:46 sbidb kernel: GFS2: fsid=pbi_prd:ora_pbi_saporg.0: File system extended by 256160 blocks.

This can be found in the file messages.debugkernel.

Created attachment 409306 [details]
messages from gfs2_grow with the debug kernel