Bug 1194446
| Summary: | GFS2: mkfs.gfs2 scalability issue on large devices | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nate Straz <nstraz> |
| Component: | gfs2-utils | Assignee: | Andrew Price <anprice> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Priority: | unspecified |
| Version: | 7.1 | CC: | cluster-maint, gfs2-maint |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | gfs2-utils-3.1.8-1.el7 | Doc Type: | Bug Fix |
| Last Closed: | 2015-11-19 03:53:50 UTC | Type: | Bug |
| Bug Depends On: | 1184482 | Bug Blocks: | 1111393, 1497636 |
Created attachment 993801 [details]
Patch submitted upstream
With this patch I'm seeing much better performance with a 250T volume, and far lower CPU usage:

```
Before: 13034.77user 41.25system 3:47:21elapsed 95%CPU (0avgtext+0avgdata 416248maxresident)k
        2840inputs+41337136outputs (0major+449613minor)pagefaults 0swaps
After:  7.07user 32.58system 29:16.12elapsed 2%CPU (0avgtext+0avgdata 416308maxresident)k
        3368inputs+41337136outputs (1major+105705minor)pagefaults 0swaps
```
Patch is now upstream and will land in RHEL 7 with the gfs2-utils rebase.

BEFORE, with gfs2-utils-3.1.7-6.el7.x86_64:

```
[root@dash-02 ~]# /usr/bin/time blkdiscard /dev/fsck/large
0.00user 66.44system 5:10.35elapsed 21%CPU (0avgtext+0avgdata 632maxresident)k
40inputs+0outputs (1major+198minor)pagefaults 0swaps
[root@dash-02 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      ec8d8197-816a-fafc-330f-1d2e3543dfb3
20854.75user 21.14system 5:48:04elapsed 99%CPU (0avgtext+0avgdata 416772maxresident)k
2960inputs+41387248outputs (0major+576524minor)pagefaults 0swaps
[root@dash-02 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      8e15a54a-dc63-acb5-c92b-1a16059f7218
21262.27user 20.98system 5:54:51elapsed 99%CPU (0avgtext+0avgdata 416812maxresident)k
2952inputs+41387248outputs (0major+633204minor)pagefaults 0swaps
```

AFTER, with gfs2-utils-3.1.8-4.el7.x86_64:

```
[root@dash-03 ~]# blkdiscard /dev/fsck/large
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      feb57a73-338d-70bf-eacc-212d200340b1
2.39user 24.40system 3:07.09elapsed 14%CPU (0avgtext+0avgdata 416800maxresident)k
2960inputs+41387248outputs (0major+170243minor)pagefaults 0swaps
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      9d17c0bc-57b7-dcca-fe5a-667ae8e98ced
2.31user 24.08system 1:13.30elapsed 36%CPU (0avgtext+0avgdata 416816maxresident)k
2952inputs+41387248outputs (0major+193953minor)pagefaults 0swaps
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      9f8bb1f9-7a75-13ab-8df9-dadfefa59650
2.47user 21.97system 1:06.49elapsed 36%CPU (0avgtext+0avgdata 416808maxresident)k
2952inputs+41387248outputs (0major+204423minor)pagefaults 0swaps
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2178.html
Description of problem:

A mkfs.gfs2 on a 250TB device took 7 hours last night and used a lot of CPU time.

```
# /usr/bin/time mkfs -t gfs2 -p lock_nolock -O /dev/XL/gfs2
24723.05user 59.37system 7:08:46elapsed 96%CPU (0avgtext+0avgdata 417080maxresident)k
```

I collected perf data on file systems from 100GB to 100TB and found that lgfs2_rgrps_append quickly dominates all other functions.

```
==> perf-100G.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
     3.15%  mkfs.gfs2  [.] gfs2_disk_hash
     0.30%  mkfs.gfs2  [.] gfs2_meta_header_out_bh

==> perf-1T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    17.48%  mkfs.gfs2  [.] lgfs2_rgrps_append
     1.29%  mkfs.gfs2  [.] gfs2_disk_hash
     0.12%  mkfs.gfs2  [.] lgfs2_rgrp_write

==> perf-10T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    84.46%  mkfs.gfs2  [.] lgfs2_rgrps_append
     0.03%  mkfs.gfs2  [.] lgfs2_rgrp_bitbuf_alloc

==> perf-100T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    98.85%  mkfs.gfs2  [.] lgfs2_rgrps_append
     0.00%  mkfs.gfs2  [.] __errno_location@plt
```

perf annotate:

```
Sorted summary for file /usr/sbin/mkfs.gfs2
----------------------------------------------
   97.15 /usr/src/debug/gfs2-utils-3.1.7/gfs2/libgfs2/../../gfs2/include/osi_tree.h:320
    2.81 /usr/src/debug/gfs2-utils-3.1.7/gfs2/libgfs2/../../gfs2/include/osi_tree.h:321

313 static inline struct osi_node *osi_last(struct osi_root *root)
...
320         while (n->osi_right)
321                 n = n->osi_right;
322         return n;
```

It looks like the tree being used isn't balanced and has probably degenerated into a list, which is traversed every time an RG is added. That ends up being a list 400k entries deep on a 100TB file system with 256MB RGs.

Version-Release number of selected component (if applicable):
gfs2-utils-3.1.7-6.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. perf record mkfs.gfs2 -O -p lock_nolock -j 1 /dev/foo
2. perf report --stdio -d mkfs.gfs2 | head -n 30