Bug 1162216 - GFS2: improve performance of gfs2_edit savemeta
Summary: GFS2: improve performance of gfs2_edit savemeta
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gfs2-utils
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andrew Price
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1184482
Blocks: 1111393 1497636
 
Reported: 2014-11-10 14:43 UTC by Nate Straz
Modified: 2017-10-02 09:59 UTC (History)
CC List: 5 users

Fixed In Version: gfs2-utils-3.1.8-6.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1266604 (view as bug list)
Environment:
Last Closed: 2015-11-19 03:52:54 UTC
Target Upstream Version:
Embargoed:


Attachments
iowatcher graph of gfs2_edit restoremeta on 100TB file system (8.37 MB, image/svg+xml), 2014-11-19 21:28 UTC, Nate Straz
Patch submitted upstream (6.20 KB, patch), 2015-02-26 14:39 UTC, Andrew Price
Patch to speed up is_block_in_per_node() (3.67 KB, patch), 2015-09-03 15:03 UTC, Andrew Price


Links
Red Hat Product Errata RHBA-2015:2178 (SHIPPED_LIVE): gfs2-utils bug fix and enhancement update, last updated 2015-11-19 07:52:21 UTC

Description Nate Straz 2014-11-10 14:43:31 UTC
Description of problem:

gfs2_edit savemeta takes a long time to capture very large file systems.  I captured the metadata from an 80% full 100TB file system and it took just over 141 hours (~6 days).  According to the storage array, 1464GB were used on the sparse LUN.  This comes out to a ~3MB/s transfer rate.
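
As a rough sanity check on that figure (back-of-the-envelope only, assuming the 1464GB reported by the array is what savemeta actually read):

    # Rough throughput estimate for the run described above.
    bytes_read = 1464 * 1024**3        # ~1464 GiB used on the sparse LUN
    elapsed_s  = 141 * 3600            # ~141 hours elapsed
    print("%.1f MB/s" % (bytes_read / elapsed_s / 1024**2))   # prints ~3.0 MB/s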

Version-Release number of selected component (if applicable):
gfs2-utils-3.1.7-1.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Create a large file system with lots of files
2. time gfs2_edit savemeta /dev/foo


Actual results:

[root@buzz-02 fsck]# /usr/bin/time gfs2_edit savemeta -z9 /dev/buzzez/big buzzez-100TB.savemeta
There are 26843545600 blocks of 4096 bytes in the destination device.
Reading resource groups...Done. File system size: 99.1022TB

26843185520 blocks processed, 183199987 saved (100%)

Metadata saved to file buzzez-100TB.savemeta (gzipped, level 9).
250128.84user 7283.33system 141:01:53elapsed 50%CPU (0avgtext+0avgdata 6987092maxresident)k
7765026392inputs+106332592outputs (2major+183697275minor)pagefaults 0swaps
[root@buzz-02 fsck]# ls -lh
total 51G
-rw-r--r--. 1 root root 51G Nov  9 11:27 buzzez-100TB.savemeta

Expected results:

savemeta should be able to read metadata at close to the top speed of the storage.

Additional info:

Comment 3 Nate Straz 2014-11-19 21:28:18 UTC
Created attachment 959177 [details]
iowatcher graph of gfs2_edit restoremeta on 100TB file system

The performance of the restoremeta operation could probably be improved too, but it's not nearly as bad as the savemeta operation.

Comment 4 Andrew Price 2015-02-26 14:39:29 UTC
Created attachment 995650 [details]
Patch submitted upstream

From the commit log:

    By the time savemeta() is called the rindex has already been read into
    memory, and before savemeta() processes each resource group it calls
    gfs2_rgrp_read() to read in the rgrp header. It frees the rgrp once it's
    done with it.
    
    Strangely, before processing the resource groups, savemeta() reads the
    superblock a second time and calls ri_update() which reads the rindex
    again, as well as reading in every rgrp header to be kept in memory for
    the duration. This caused high memory usage and a noticeable performance
    reduction when saving metadata of large file systems.
    
    To solve these problems, this patch removes the code which re-reads the
    superblock, rindex and the rgrps. The code which reads the rindex has
    been reorganised for clarity and the sbd.fssize field is now set
    properly at that point.
    
    With this patch, using a large fs, I'm seeing improvements similar to:
    
    Before: 43:36.12elapsed 10822268maxresident k
    After:  28:44.67elapsed   226980maxresident k
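
The change boils down to the usual read-once pattern: keep the rindex/rgrp data that was already read instead of re-reading it per pass. A minimal sketch of that pattern (illustrative Python only, not the gfs2-utils code; the helper below is a hypothetical stand-in for the superblock/rindex/rgrp reads):

    # Illustrative read-once/caching sketch -- NOT gfs2-utils code.
    def load_rgrp_index(device):
        """Hypothetical stand-in for reading the superblock, rindex and rgrp headers."""
        print("expensive read of rindex/rgrp headers from", device)
        return [("rgrp", i) for i in range(4)]

    class Savemeta:
        def __init__(self, device):
            self.device = device
            self._rgrp_index = None

        def rgrp_index(self):
            # Read once and cache; the old code effectively re-read the
            # superblock, rindex and every rgrp header a second time.
            if self._rgrp_index is None:
                self._rgrp_index = load_rgrp_index(self.device)
            return self._rgrp_index

    sm = Savemeta("/dev/foo")
    sm.rgrp_index()   # reads from the device
    sm.rgrp_index()   # reuses the cached index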

Comment 8 Nate Straz 2015-08-24 00:30:09 UTC
I ran gfs2_edit savemeta on several file systems of increasing size, all filled by our mockup script.  I did see an improvement in memory usage, but I did not see a significant speed improvement.


                gfs2-utils-3.1.7-6.el7          gfs2-utils-3.1.8-4.el7.x86_64
Size            Time            Memory          Time            Memory
10G             2:54.20         11880k          2:57.47         11140k
100G            29:30.37        19648k          30:02.53        11300k
1000G           5:04:58         95812k          4:59:14         12808k
100T            23:30:17        8670492k        23:18:11        183832k

Comment 9 Andrew Price 2015-09-01 09:53:50 UTC
Hm ok, I'll take another look at this. If you can provide a perf report and/or the mockup script it would be useful. I would still like to keep the memory usage patch in 7.2 though, so if time runs out perhaps we can re-title this bz to reflect the memory usage improvement and open a new one to improve performance in 7.3.

Comment 10 Andrew Price 2015-09-03 15:03:54 UTC
Created attachment 1069920 [details]
Patch to speed up is_block_in_per_node()

I've submitted this patch upstream. savemeta was spending a surprising amount of time in is_block_in_per_node() and it was at the top of my perf reports by a mile. With this patch I've seen improvements similar to:

Before:
13672.51user 183.71system 4:24:43elapsed 87%CPU (0avgtext+0avgdata 12488maxresident)k
152634312inputs+2194408outputs (7major+181883minor)pagefaults 0swaps

13530.38user 81.83system 4:20:39elapsed 87%CPU (0avgtext+0avgdata 12488maxresident)k
152634432inputs+2194408outputs (7major+142380minor)pagefaults 0swaps


After:
3160.85user 64.13system 1:28:56elapsed 60%CPU (0avgtext+0avgdata 12492maxresident)k
152555736inputs+2194408outputs (0major+16440minor)pagefaults 0swaps

3159.36user 66.71system 1:29:00elapsed 60%CPU (0avgtext+0avgdata 12492maxresident)k
152559760inputs+2194408outputs (7major+16620minor)pagefaults 0swaps
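
The kind of change that typically produces this sort of win is precomputing the per_node block ranges once and answering membership queries with a lookup, rather than rescanning metadata for every block. A hedged illustration of that general technique (plain Python, not the actual patch; the block ranges below are invented):

    # Illustration of the general speedup: build a sorted range table once,
    # then binary-search it per block.  NOT the actual gfs2-utils change.
    import bisect

    per_node_ranges = sorted([(1000, 1100), (5000, 5200), (9000, 9050)])
    range_starts = [start for start, _end in per_node_ranges]

    def is_block_in_per_node(block):
        i = bisect.bisect_right(range_starts, block) - 1
        if i < 0:
            return False
        start, end = per_node_ranges[i]
        return start <= block < end

    print(is_block_in_per_node(5100))   # True
    print(is_block_in_per_node(4999))   # False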

Comment 12 Nate Straz 2015-09-10 20:25:40 UTC
I'm seeing some improvement, but it's still taking over 4 hours to run savemeta on a 1TB file system that's 80% full.  That's equivalent to ~7MB/s if I read the entire device with dd.  That's better than the 3MB/s I estimated in the original bug, but still not very good.


gfs2-utils-3.1.8-4.el7.x86_64 (current 7.2 candidate)

Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/fsck-perf-nodata  1.0T  820G  205G  81% /mnt/perf
Filesystem                     Inodes   IUsed    IFree IUse% Mounted on
/dev/mapper/fsck-perf-nodata 58064682 4437952 53626730    8% /mnt/perf
=== savemeta ===
There are 268437428 blocks of 4096 bytes in the filesystem.
Filesystem size: 1024.7GB
268437428 blocks processed, 9172069 saved (100%)

Metadata saved to file /home/test/savemeta (uncompressed).
2756.90user 189.99system 6:47:31elapsed 12%CPU (0avgtext+0avgdata 12672maxresident)k
442288320inputs+7618736outputs (2major+202083minor)pagefaults 0swaps


gfs2-utils-3.1.8-6.el7.x86_64 (test package from Andy)

Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/fsck-perf3-nodata  1.0T  820G  205G  81% /mnt/perf
Filesystem                      Inodes   IUsed    IFree IUse% Mounted on
/dev/mapper/fsck-perf3-nodata 58006936 4441920 53565016    8% /mnt/perf
=== savemeta ===
There are 268437428 blocks of 4096 bytes in the filesystem.
Filesystem size: 1024.7GB
268437428 blocks processed, 9180621 saved (100%)

Metadata saved to file /home/test/savemeta (uncompressed).
68.24user 205.07system 4:08:37elapsed 1%CPU (0avgtext+0avgdata 12676maxresident)k
219760008inputs+7618728outputs (2major+27760minor)pagefaults 0swaps

Comment 13 Andrew Price 2015-09-11 14:03:08 UTC
I'm not convinced that it's bottlenecked on the read side any more. My perf report after applying the patch showed most of the time was spent in zlib, even though I was saving to /dev/null, so it must have been plain compression time. What timings do you get with -z 0?
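
One quick way to gauge how much of the remaining wall time is pure compression cost is to time zlib at a few levels over a representative chunk of data (a generic sketch, not tied to gfs2_edit's internals; feeding it a slice of a real savemeta file instead of the synthetic buffer would give more meaningful numbers):

    # Rough zlib cost comparison across compression levels.
    import time
    import zlib

    data = bytes(range(256)) * (64 * 1024)   # ~16 MB of synthetic test data
    for level in (0, 1, 6, 9):
        t0 = time.perf_counter()
        out = zlib.compress(data, level)
        print("level %d: %.2fs, %d -> %d bytes"
              % (level, time.perf_counter() - t0, len(data), len(out)))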

Comment 14 Nate Straz 2015-09-11 14:51:47 UTC
My runs from comment 12 are with -z 0.

I'm still waiting on the 10TB tests to complete.  I ran perf trace on them and I'm seeing each block being read twice.

   597.322 ( 0.004 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb770, vlen: 1, pos_l: 8197245685760, pos_h: 1) = 4096
   597.342 ( 0.012 ms): write(fd: 4</home/test/savemeta>, buf: 0x2172390, count: 1386         ) = 1386
   597.353 ( 0.004 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb6c0, vlen: 1, pos_l: 8197245685760, pos_h: 1) = 4096
   597.364 ( 0.004 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb6c0, vlen: 1, pos_l: 8197245689856, pos_h: 1) = 4096
   597.371 ( 0.004 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb630, vlen: 1, pos_l: 8197245689856, pos_h: 1) = 4096
   597.384 ( 0.004 ms): write(fd: 4</home/test/savemeta>, buf: 0x2174520, count: 93           ) = 93
   597.603 ( 0.216 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb770, vlen: 1, pos_l: 8197246279680, pos_h: 1) = 4096
   597.615 ( 0.004 ms): write(fd: 4</home/test/savemeta>, buf: 0x2172390, count: 402          ) = 402
   597.619 ( 0.002 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb6c0, vlen: 1, pos_l: 8197246279680, pos_h: 1) = 4096
   597.867 ( 0.246 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb6c0, vlen: 1, pos_l: 8197246283776, pos_h: 1) = 4096
   597.875 ( 0.003 ms): preadv(fd: 3</dev/dm-17>, vec: 0x7fffa66bb630, vlen: 1, pos_l: 8197246283776, pos_h: 1) = 4096
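
A quick way to confirm the double-read pattern from a saved copy of that output is to count how many preadv offsets show up more than once (a sketch assuming the trace format shown above, written to a hypothetically named file):

    # Count repeated preadv offsets in saved "perf trace" output.
    import re
    from collections import Counter

    offsets = Counter()
    with open("trace.txt") as trace:          # hypothetical file name
        for line in trace:
            if "preadv" not in line:
                continue
            m = re.search(r"pos_l:\s*(\d+)", line)
            if m:
                offsets[int(m.group(1))] += 1

    dupes = sum(1 for count in offsets.values() if count > 1)
    print("%d of %d offsets read more than once" % (dupes, len(offsets)))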

Comment 18 Nate Straz 2015-09-25 19:57:35 UTC
Verified against gfs2-utils-3.1.8-6.el7.

File system filled to 80% before savemeta was run.

 FS   |   gfs2-utils-3.1.7-6.el7 |    gfs2-utils-3.1.8-6.el7
Size  |  elapsed   CPU%   Memory |  elapsed   CPU%   Memory
=============================================================
 10GB |    0:57.72  46%   11852k |    0:15.30  10%   11012k
100GB |   16:48.19  27%   19504k |    9:30.99   3%   11140k
  1TB | 3:27:41     22%   97700k | 2:32:31      2%   12672k
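
Converting those elapsed times into speedup factors (simple arithmetic over the table above):

    # Speedup factors from the elapsed times in the table above.
    def seconds(t):
        total = 0.0
        for part in t.split(":"):
            total = total * 60 + float(part)
        return total

    for size, old, new in [("10GB", "0:57.72", "0:15.30"),
                           ("100GB", "16:48.19", "9:30.99"),
                           ("1TB", "3:27:41", "2:32:31")]:
        print("%s: %.1fx faster" % (size, seconds(old) / seconds(new)))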

Comment 20 errata-xmlrpc 2015-11-19 03:52:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2178.html

