Bug 1017381

Summary: gfs2_grow uses wrong resource group size
Product: Red Hat Enterprise Linux 7 Reporter: Nate Straz <nstraz>
Component: gfs2-utils Assignee: Andrew Price <anprice>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: cluster-maint, swhiteho
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: gfs2-utils-3.1.7-1.el7 Doc Type: Bug Fix
Doc Text:
Cause: gfs2_grow made assumptions about the existing resource group size which were no longer valid after recent mkfs.gfs2 changes. Consequence: gfs2_grow could create resource groups which were too small and unaligned, which could affect file system efficiency. Fix: gfs2_grow was updated to use the new resource group placement and alignment strategy. Result: gfs2_grow now places new resource groups consistent with the existing resource group size and the impact on fs efficiency is avoided.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 09:26:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1112342    
Bug Blocks:    
Attachments:
Description Flags
Script for experimenting with gfs2_grow none

Description Nate Straz 2013-10-09 18:22:58 UTC
Description of problem:

When growing a GFS2 file system, gfs2_grow uses the size of the last RG, even though that RG may not be as large as the rest of the RGs in the file system.

Here is a file system that started with 128MB RGs

RG index entries found: 1729.
RG #0
  ri_addr               65                  0x41
  ri_length             33                  0x21
  ri_data0              98                  0x62
  ri_data               131036              0x1ffdc
  ri_bitbytes           32759               0x7ff7
RG #1
  ri_addr               131137              0x20041
  ri_length             33                  0x21
  ri_data0              131170              0x20062
  ri_data               131036              0x1ffdc
  ri_bitbytes           32759               0x7ff7
...
RG #663
  ri_addr               86900801            0x52e0041
  ri_length             33                  0x21
  ri_data0              86900834            0x52e0062
  ri_data               131036              0x1ffdc
  ri_bitbytes           32759               0x7ff7
RG #664
  ri_addr               87031873            0x5300041
  ri_length             21                  0x15
  ri_data0              87031894            0x5300056
  ri_data               81832               0x13fa8
  ri_bitbytes           20458               0x4fea

The last original RG was a partial one.

RG #665
  ri_addr               87113726            0x5313ffe
  ri_length             21                  0x15
  ri_data0              87113747            0x5314013
  ri_data               81832               0x13fa8
  ri_bitbytes           20458               0x4fea
...
RG #1728
  ri_addr               174123465           0xa60e9c9
  ri_length             21                  0x15
  ri_data0              174123486           0xa60e9de
  ri_data               81832               0x13fa8
  ri_bitbytes           20458               0x4fea

The new RGs ended up being the same small size.
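
The intended size is visible in the dump above: consecutive full RGs are 131072 blocks apart (e.g. ri_addr 131137 - 65), which is 128MB at what appears to be a 1KB block size, while each new RG copied the partial last RG and only maps 81832 data blocks, roughly 80MB.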

Version-Release number of selected component (if applicable):
gfs2-utils-3.1.6-5.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. mkfs -t gfs2 $dev
2. gfs2_grow $dev
3. gfs2_edit -p rindex $dev

Actual results:
See above

Expected results:
New RGs should match the originally intended RG size.

Additional info:

Comment 2 Andrew Price 2013-10-12 21:25:39 UTC
This is going to be a little tricky as we don't store the original requested rgrp size and with the latest mkfs work we might even expand some rgrps to accommodate single-extent journals. So it's always going to be best-guess (use the most common rgrp size ignoring ones containing journals?) unless we add an option to gfs2_grow to specify the rg size explicitly.
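
A minimal sketch of that best-guess idea, using a hypothetical rindex entry type (this is not the libgfs2 API, just an illustration):

#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified rindex entry; the real on-disk type is struct gfs2_rindex. */
struct ri_entry {
        uint64_t ri_addr;      /* first block of the resource group */
        uint32_t ri_data;      /* number of data blocks */
        int      has_journal;  /* rgrp contains a journal; skip when guessing */
};

/*
 * Guess the "intended" rgrp size as the most common ri_data value,
 * ignoring rgrps that hold journals since mkfs.gfs2 may have expanded
 * those to fit single-extent journals. Returns 0 if nothing usable.
 */
static uint32_t guess_rgrp_size(const struct ri_entry *ri, size_t count)
{
        uint32_t best = 0;
        size_t best_n = 0;

        for (size_t i = 0; i < count; i++) {
                if (ri[i].has_journal)
                        continue;
                size_t n = 0;
                for (size_t j = 0; j < count; j++)
                        if (!ri[j].has_journal && ri[j].ri_data == ri[i].ri_data)
                                n++;
                if (n > best_n) {
                        best_n = n;
                        best = ri[i].ri_data;
                }
        }
        return best;
}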

Comment 3 Steve Whitehouse 2013-10-14 11:58:54 UTC
Well I hope that you can use the same functions as are being used in mkfs in order to size and locate the new rgrps at appropriate points. That way it doesn't need to depend upon what is already on disk, and we can lay things out in the most efficient way.

Comment 4 Andrew Price 2013-10-14 12:41:32 UTC
We can use the same code but the problem is that, in mkfs.gfs2, we have the -r <rgsize> option to specify the base size of the rgrps and once mkfs.gfs2 is done, the rgrp size the user specified is forgotten because we don't store it (e.g. in the superblock like xfs does). We need that value to accurately work out the size of the resource groups before applying alignment and adjustment, and since we don't have it we'll always have to guess it in gfs2_grow based on what's already on disk.
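
(For a concrete example, assuming the common 4K block size: mkfs.gfs2 -r 256 asks for roughly 256MB resource groups, i.e. about 65536 blocks, but after alignment and journal accommodation the rgrps actually written to disk can each differ slightly from that, so the original 65536 can't simply be read back out of the rindex.)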

Comment 5 Andrew Price 2014-01-26 00:47:27 UTC
I've been thinking about this some more and experimenting with some ideas, and I've just about convinced myself that the new mkfs.gfs2 strategy of resizing the last couple of resource groups to fit on the device needs a rethink.

I wonder if it would be better to have a consistent resource group size (beyond the eventual specially-sized initial resource groups containing journals) such that, when the final resource group is placed, it would have the same number of bitmap blocks (i.e. the same ri_length) as the others but different ri_data, ri_bitbytes and rg_free fields, restricting gfs2's use of the bitmap blocks to only map the remaining device blocks. (I have a mkfs.gfs2 patch which does this.)
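
A rough sketch of what I mean, with hypothetical names (not the actual mkfs.gfs2/libgfs2 code): keep ri_length the same as a full rgrp and derive ri_data/ri_bitbytes from however many device blocks remain.

#include <stdint.h>

#define GFS2_NBBY 4  /* blocks mapped per bitmap byte (2 bits per block) */

/* Hypothetical fields for the final, truncated resource group. */
struct final_rg {
        uint64_t ri_addr;
        uint32_t ri_length;    /* same number of header/bitmap blocks as a full rgrp */
        uint64_t ri_data0;
        uint32_t ri_data;      /* only covers blocks that actually exist on the device */
        uint32_t ri_bitbytes;
};

/*
 * Place the last rgrp at 'addr' with the same ri_length as a full rgrp,
 * but restrict ri_data to the 'remaining' device blocks after 'addr'.
 * Assumes remaining > full_ri_length and that the result fits within
 * what the bitmap blocks can map; real code would check both.
 */
static void place_final_rg(struct final_rg *rg, uint64_t addr,
                           uint32_t full_ri_length, uint64_t remaining)
{
        uint64_t data = remaining - full_ri_length;

        data -= data % GFS2_NBBY;          /* ri_data is a multiple of GFS2_NBBY */
        rg->ri_addr = addr;
        rg->ri_length = full_ri_length;
        rg->ri_data0 = addr + full_ri_length;
        rg->ri_data = (uint32_t)data;
        rg->ri_bitbytes = rg->ri_data / GFS2_NBBY;
}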

gfs2_grow could then rely on the last-but-one resource group having a representative size, and it could expand the final resource group to the same size due to there being enough bitmap blocks placed by mkfs.gfs2.

However I'm not certain that gfs2 would be able to deal with a) the possibly unusual mismatch between these fields and the number of bitmap blocks, and b) the growth of the final resource group in gfs2_grow. I'd like to get some feedback on those points. I guess the relevant question is, is it ok for gfs2_grow to increase the size of the last resource group while the fs is mounted, or is gfs2_grow limited to only adding new resource groups after the last one?

Comment 6 Andrew Price 2014-04-03 15:34:51 UTC
Patches posted upstream: https://www.redhat.com/archives/cluster-devel/2014-April/msg00053.html

There are 14 patches but some of them are quite small. They'll all be required as the fix depends on the libgfs2 rgrp API fixes and improvements which came before.

Comment 7 Ludek Smid 2014-06-26 10:48:17 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 8 Andrew Price 2014-06-26 10:56:06 UTC
(In reply to Ludek Smid from comment #7)
> This request was resolved in Red Hat Enterprise Linux 7.0.

Incorrect. If it was resolved it wouldn't still be in ASSIGNED state. Requesting the rhel-7.1.0 flag.

Comment 12 Nate Straz 2014-10-14 19:18:01 UTC
Created attachment 947006 [details]
Script for experimenting with gfs2_grow

I did some experimenting with different grow patterns and gfs2_grow appears to be doing the right thing.  As long as it can create an RG, it will, and it uses all of the added space it can.  None of the old RGs are modified.  The new RG sizes are based only on the amount of space added.
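For example, in the 5G grow near the end of the log below, the device gains 1310720 4K blocks (2910208 - 1599488), and gfs2_grow splits that into three new RGs of about 1310720 / 3 ≈ 436907 blocks each, matching the reported "New resource group size: 436907 (0x6aaab)".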

Verified with gfs2-utils-3.1.7-1.el7.x86_64

[root@host-079 ~]# sh grow_ri_diff.sh nate 5G 50M 50M 1G 5G
--- create 5G LV ---
  Wiping gfs2 signature on /dev/nate/grow.1840.
  Logical volume "grow.1840" created
/dev/nate/grow.1840 is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Device:                    /dev/nate/grow.1840
Block size:                4096
Device size:               5.00 GB (1310720 blocks)
Filesystem size:           5.00 GB (1310716 blocks)
Journals:                  1
Resource groups:           4
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      4ec2452c-6c0f-2160-7040-a446d7101d91
=== Original rindex ===
RG index entries found: 4.
RG #0
  ri_data0              20                  0x14
  ri_data               32836               0x8044
RG #1
  ri_data0              32883               0x8073
  ri_data               425928              0x67fc8
RG #2
  ri_data0              458838              0x70056
  ri_data               425924              0x67fc4
RG #3
  ri_data0              884792              0xd8038
  ri_data               425924              0x67fc4
--- grow by 50M ---
  Rounding size to boundary between physical extents: 52.00 MiB
  Size of logical volume nate/grow.1840 changed from 5.00 GiB (1280 extents) to 5.05 GiB (1293 extents).
  Logical volume grow.1840 successfully resized
FS: Mount point:             /mnt/grow.1840
FS: Device:                  /dev/mapper/nate-grow.1840
FS: Size:                    1310717 (0x13fffd)
FS: New resource group size: 13315 (0x3403)
DEV: Length:                 1324032 (0x143400)
The file system will grow by 52MB.
gfs2_grow complete.
=== rindex diff 1 ===
--- /tmp/grow_rindex.dpOb/rindex.0      2014-10-14 14:14:46.058959121 -0500
+++ /tmp/grow_rindex.dpOb/rindex.1      2014-10-14 14:14:46.287959114 -0500
@@ -1 +1 @@
-RG index entries found: 4.
+RG index entries found: 5.
@@ -13,0 +14,3 @@
+RG #4
+  ri_data0              1310718             0x13fffe
+  ri_data               13312               0x3400
--- grow by 50M ---
  Rounding size to boundary between physical extents: 52.00 MiB
  Size of logical volume nate/grow.1840 changed from 5.05 GiB (1293 extents) to 5.10 GiB (1306 extents).
  Logical volume grow.1840 successfully resized
FS: Mount point:             /mnt/grow.1840
FS: Device:                  /dev/mapper/nate-grow.1840
FS: Size:                    1324031 (0x1433ff)
FS: New resource group size: 13313 (0x3401)
DEV: Length:                 1337344 (0x146800)
The file system will grow by 52MB.
gfs2_grow complete.
=== rindex diff 2 ===
--- /tmp/grow_rindex.dpOb/rindex.1      2014-10-14 14:14:46.287959114 -0500
+++ /tmp/grow_rindex.dpOb/rindex.2      2014-10-14 14:14:46.515959106 -0500
@@ -1 +1 @@
-RG index entries found: 5.
+RG index entries found: 6.
@@ -16,0 +17,3 @@
+RG #5
+  ri_data0              1324032             0x143400
+  ri_data               13308               0x33fc
--- grow by 1G ---
  Size of logical volume nate/grow.1840 changed from 5.10 GiB (1306 extents) to 6.10 GiB (1562 extents).
  Logical volume grow.1840 successfully resized
FS: Mount point:             /mnt/grow.1840
FS: Device:                  /dev/mapper/nate-grow.1840
FS: Size:                    1337341 (0x1467fd)
FS: New resource group size: 262147 (0x40003)
DEV: Length:                 1599488 (0x186800)
The file system will grow by 1024MB.
gfs2_grow complete.
=== rindex diff 3 ===
--- /tmp/grow_rindex.dpOb/rindex.2      2014-10-14 14:14:46.515959106 -0500
+++ /tmp/grow_rindex.dpOb/rindex.3      2014-10-14 14:14:46.735959099 -0500
@@ -1 +1 @@
-RG index entries found: 6.
+RG index entries found: 7.
@@ -19,0 +20,3 @@
+RG #6
+  ri_data0              1337358             0x14680e
+  ri_data               262128              0x3fff0
--- grow by 5G ---
  Size of logical volume nate/grow.1840 changed from 6.10 GiB (1562 extents) to 11.10 GiB (2842 extents).
  Logical volume grow.1840 successfully resized
FS: Mount point:             /mnt/grow.1840
FS: Device:                  /dev/mapper/nate-grow.1840
FS: Size:                    1599487 (0x1867ff)
FS: New resource group size: 436907 (0x6aaab)
DEV: Length:                 2910208 (0x2c6800)
The file system will grow by 5120MB.
gfs2_grow complete.
=== rindex diff 4 ===
--- /tmp/grow_rindex.dpOb/rindex.3      2014-10-14 14:14:46.735959099 -0500
+++ /tmp/grow_rindex.dpOb/rindex.4      2014-10-14 14:14:46.969959092 -0500
@@ -1 +1 @@
-RG index entries found: 7.
+RG index entries found: 10.
@@ -22,0 +23,9 @@
+RG #7
+  ri_data0              1599514             0x18681a
+  ri_data               436880              0x6aa90
+RG #8
+  ri_data0              2036421             0x1f12c5
+  ri_data               436880              0x6aa90
+RG #9
+  ri_data0              2473328             0x25bd70
+  ri_data               436876              0x6aa8c
  Logical volume "grow.1840" successfully removed

Comment 14 errata-xmlrpc 2015-03-05 09:26:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0428.html