1257625 – fsck.gfs2 pass1c time scalability issue

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	gfs2-utils
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Robert Peterson
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:	1271674
Blocks:	1111393 1497636

Reported:	2015-08-27 13:31 UTC by Nate Straz
Modified:	2017-10-02 09:59 UTC
CC List:	4 users
Fixed In Version:	gfs2-utils-3.1.9-1.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-04 06:30:11 UTC
Target Upstream Version:
Embargoed:

Attachments
Early prototype #1 (10.39 KB, patch) 2015-09-09 16:21 UTC, Robert Peterson	no flags
Set of 20 RHEL7 patches - posted and tested (20.71 KB, application/octet-stream) 2015-09-28 15:08 UTC, Robert Peterson	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:2438	0	normal	SHIPPED_LIVE	gfs2-utils bug fix and enhancement update	2016-11-03 14:01:57 UTC

Description Nate Straz 2015-08-27 13:31:08 UTC

Description of problem:

While testing file systems filled to 80%, I found that fsck.gfs2 pass1c time increased dramatically between the 512GB and 1TB file system sizes.

128GB pass1c 1.184s
256GB pass1c 2.513s
512GB pass1c 5.510s
1TB pass1c 1h25m1.088s
2TB pass1c 4h50m5.258s
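
The size of the jump is easier to judge as a growth factor per doubling of file system size. The following is a minimal sketch that uses only the five timings listed above; the helper names `parse_duration` and `growth_factors` are hypothetical and are not part of gfs2-utils.

```python
import re

# pass1c timings copied from the list above (size label, duration string).
PASS1C = [
    ("128GB", "1.184s"),
    ("256GB", "2.513s"),
    ("512GB", "5.510s"),
    ("1TB", "1h25m1.088s"),
    ("2TB", "4h50m5.258s"),
]


def parse_duration(text):
    """Convert a duration such as '1h25m1.088s' or '5.510s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def growth_factors(samples):
    """Return (smaller, larger, ratio) for each consecutive pair of sizes."""
    secs = [(label, parse_duration(duration)) for label, duration in samples]
    return [(a[0], b[0], b[1] / a[1]) for a, b in zip(secs, secs[1:])]


if __name__ == "__main__":
    for small, large, ratio in growth_factors(PASS1C):
        print(f"{small} -> {large}: {ratio:.1f}x")
```

Up to 512GB each doubling roughly doubles the pass1c time, while the step from 512GB to 1TB is more than 900 times slower.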


Version-Release number of selected component (if applicable):
gfs2-utils-3.1.8-4.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. create 1TB GFS2 file system, lock_nolock is okay
2. mockup -p 4 -F 80 -a fill -n 5000000 /mnt/gfs2
3. umount /mnt/gfs2
4. fsck.gfs2 /dev/foo

Actual results:
  Using default stripesize 64.00 KiB.
  Rounding size (262144 extents) up to stripe boundary size (262146 extents).
  Logical volume "perf" created.
=== mkfs.gfs2 1T ===
/dev/fsck/perf is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/perf
Block size:                4096
Device size:               1024.01 GB (268437504 blocks)
Filesystem size:           1024.01 GB (268437428 blocks)
Journals:                  1
Resource groups:           4094
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      aa55dafa-87d6-59ec-2fdf-728dd4197c99
0.01user 0.24system 0:00.87elapsed 29%CPU (0avgtext+0avgdata 2592maxresident)k
2248inputs+429456outputs (1major+816minor)pagefaults 0swaps
=== fsck.gfs2 0% full ===
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
pass1 completed in 0.031s
Starting pass1b
pass1b completed in 0.000s
Starting pass1c
pass1c completed in 0.000s
Starting pass2
pass2 completed in 1.309s
Starting pass3
pass3 completed in 0.000s
Starting pass4
pass4 completed in 0.000s
Starting pass5
pass5 completed in 1.457s
Starting check_statfs
check_statfs completed in 0.000s
gfs2_fsck complete
2.97user 0.23system 0:03.82elapsed 83%CPU (0avgtext+0avgdata 150064maxresident)k
432112inputs+272outputs (0major+43745minor)pagefaults 0swaps
=== mockup ===
Creating pool with 4 processes
4432439 files created
80% full
221.73user 2967.60system 21:12.31elapsed 250%CPU (0avgtext+0avgdata 83076maxresident)k
186296inputs+1646006424outputs (0major+4601262minor)pagefaults 0swaps
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/fsck-perf-nodata  1.0T  820G  205G  81% /mnt/perf
Filesystem                     Inodes   IUsed    IFree IUse% Mounted on
/dev/mapper/fsck-perf-nodata 58023007 4447075 53575932    8% /mnt/perf
=== fsck.gfs2 80% full ===
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
pass1 completed in 51m53.889s
Starting pass1b
pass1b completed in 0.000s
Starting pass1c
pass1c completed in 1h25m1.088s
Starting pass2
pass2 completed in 11.174s
Starting pass3
pass3 completed in 0.004s
Starting pass4
pass4 completed in 0.224s
Starting pass5
pass5 completed in 1.461s
Starting check_statfs
check_statfs completed in 0.000s
gfs2_fsck complete
179.73user 225.50system 2:17:17elapsed 4%CPU (0avgtext+0avgdata 566280maxresident)k
411350688inputs+272outputs (2major+301511minor)pagefaults 0swaps


Expected results:


Additional info:

Comment 2 Robert Peterson 2015-08-27 14:04:51 UTC

Pass1c is responsible for analyzing extended attributes, and
it works off a linked list of items built in pass1. I see no
reason why we can't eliminate the linked list and have pass1
do the analysis while it's checking each dinode. It seems logical
to assume that it was moved to follow pass1b (duplicate reference
processing) for a reason, so we need to determine if that's still
necessary, given today's improved pass1 logic.
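
As a rough illustration of the two designs discussed here, the sketch below contrasts a deferred list that is walked in a later pass with a check performed while each dinode is visited. It is a toy model only, not fsck.gfs2 code: the dict layout, the `check_eattr` rule, and all function names are assumptions made for the example.

```python
# Toy model: each "dinode" is a dict, and "eattr" stands in for an
# extended-attribute block pointer (None when the dinode has no EAs).


def check_eattr(dinode):
    """Hypothetical EA check: flag a pointer that lies past the end of the fs."""
    return dinode["eattr"] is not None and dinode["eattr"] >= dinode["fs_blocks"]


def two_pass(dinodes):
    """Deferred design: pass1 queues dinodes with EAs, pass1c walks the queue."""
    pending = []                      # grows with the number of dinodes that have EAs
    for d in dinodes:                 # "pass1"
        if d["eattr"] is not None:
            pending.append(d)
    return [d["num"] for d in pending if check_eattr(d)]   # "pass1c"


def single_pass(dinodes):
    """Inline design: run the EA check while pass1 is visiting the dinode."""
    return [d["num"] for d in dinodes if check_eattr(d)]


if __name__ == "__main__":
    sample = [
        {"num": 1, "eattr": None, "fs_blocks": 100},
        {"num": 2, "eattr": 500, "fs_blocks": 100},
        {"num": 3, "eattr": 42, "fs_blocks": 100},
    ]
    print(two_pass(sample), single_pass(sample))
```

Both functions report the same dinodes in this model; the inline version simply never holds the pending list in memory. Whether the real checks can be reordered this way depends on the pass1b ordering question raised above.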

Comment 4 Robert Peterson 2015-09-08 15:33:13 UTC

I'm going to investigate whether we can eliminate pass1c altogether.
Between pass1 and pass1b, I think we do almost everything necessary
from pass1c with regard to extended attributes.

I plan to do a detailed examination of pass1c to see if it does
anything important, and if so, whether that can be safely moved
to pass1.

Reassigning to myself.

Comment 5 Robert Peterson 2015-09-09 16:19:55 UTC

I've done a thorough examination of pass1c. In my opinion, there's
no reason not to perform its checks inside of pass1. That saves
us from building the linked list, chewing up all that memory and
time to make another pass through the file system.

I've coded a patch that completely eliminates pass1c from fsck
and moves its checks to the appropriate places in pass1.c.
I'm testing it now against my latest collection of metadata.

Comment 6 Robert Peterson 2015-09-09 16:21:22 UTC

Created attachment 1071846 [details]
Early prototype #1

This prototype eliminates pass1c in favor of simple checks
added to pass1.

Comment 7 Robert Peterson 2015-09-17 14:48:40 UTC

I'm making good progress, but this turns out to be a bigger project
than I anticipated. I'm finding some nasty bugs in fsck.gfs2 related
to extended attributes and how we deal with them. I'm testing a new
prototype now. I'll attach it when it gets further along in testing.

Comment 8 Robert Peterson 2015-09-28 15:08:29 UTC

Created attachment 1077949 [details]
Set of 20 RHEL7 patches - posted and tested

This tarball contains a set of 20 patches that I posted today:

0000-cover-letter.patch
0001-libgfs2-Check-block-range-when-inserting-into-rgrp-t.patch
0002-libgfs2-Check-rgd-bits-before-referencing-it.patch
0003-fsck.gfs2-Add-check-for-gfs1-invalid-inode-refs-in-d.patch
0004-fsck.gfs2-Make-debug-messages-more-succinct-wrt-exte.patch
0005-fsck.gfs2-Break-up-funtion-handle_dup_blk.patch
0006-fsck.gfs2-Only-preserve-the-_first_-acceptable-inode.patch
0007-fsck.gfs2-Don-t-just-assume-the-remaining-EA-referen.patch
0008-fsck.gfs2-Don-t-delete-inode-for-duplicate-reference.patch
0009-fsck.gfs2-Don-t-traverse-EAs-that-belong-to-another-.patch
0010-fsck.gfs2-Refactor-function-check_indirect_eattr.patch
0011-fsck.gfs2-Once-an-indirect-ea-error-is-found-flag-al.patch
0012-fsck.gfs2-Always-restore-saved-value-for-di_eattr.patch
0013-fsck.gfs2-Remove-redundancy-in-add_duplicate_ref.patch
0014-fsck.gfs2-Don-t-remove-duplicate-eattr-blocks.patch
0015-fsck.gfs2-Refactor-check_eattr_entries-and-add-error.patch
0016-fsck.gfs2-remove-bad-EAs-at-the-end-not-as-you-go.patch
0017-fsck.gfs2-Combine-remove_inode_eattr-with-its-only-c.patch
0018-fsck.gfs2-Print-debug-message-to-dilineate-metadata-.patch
0019-fsck.gfs2-Remove-pass1c-in-favor-of-processing-in-pa.patch
0020-fsck.gfs2-Clone-duplicate-data-block-pointers.patch

Comment 9 Robert Peterson 2015-10-06 15:27:23 UTC

Bumping to 7.3.

Comment 10 Robert Peterson 2015-10-14 13:22:34 UTC

Changing status to POST. These patches were tested against my entire
metadata collection. They were posted to cluster-devel, and then
pushed to the gfs2-utils master branch. Our current plan is to
rebase RHEL7.3 from master, as per a discussion I had with Andy Price
this morning. Therefore I'm setting this to POST and we'll make it
dependent on the rebase bz.

Comment 12 Mike McCune 2016-03-28 23:30:47 UTC

This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 13 Andrew Price 2016-06-07 15:32:15 UTC

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=498444

Comment 15 Nate Straz 2016-08-10 15:19:51 UTC

pass1c was removed, and testing on gfs2-utils-3.1.9-3.el7.x86_64 shows that pass1 scales nearly linearly from 256GB to 8TB.

256G fsck80: elapsed =  0:29:24.42, pass1 =   1760, pass1b = 0, pass2 =   2, pass3 = 0, pass4 = 0, maxresident = 0.02GB
512G fsck80: elapsed =  0:56:01.83, pass1 =   3353, pass1b = 0, pass2 =   4, pass3 = 0, pass4 = 0, maxresident = 0.04GB
  1T fsck80: elapsed =  2:31:13.00, pass1 =   9055, pass1b = 0, pass2 =  11, pass3 = 0, pass4 = 0, maxresident = 0.09GB
  2T fsck80: elapsed =  6:25:06.00, pass1 =  23068, pass1b = 0, pass2 =  22, pass3 = 0, pass4 = 1, maxresident = 0.18GB
  4T fsck80: elapsed = 14:52:42.00, pass1 =  53481, pass1b = 0, pass2 =  53, pass3 = 0, pass4 = 2, maxresident = 0.35GB
  8T fsck80: elapsed = 29:20:49.00, pass1 = 105445, pass1b = 0, pass2 = 169, pass3 = 0, pass4 = 5, maxresident = 0.68GB
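
One way to check the scaling claim against the table above is to compute the pass1 growth factor for each doubling of size, where 2.0 would be exactly linear. The sketch below assumes the per-pass figures are in seconds, which the table does not state; the `pass1_share` helper tests that assumption by comparing pass1 with the elapsed time. All function names are hypothetical.

```python
# Figures copied from the table above: (size in GB, elapsed "H:MM:SS.ss", pass1).
RESULTS = [
    (256, "0:29:24.42", 1760),
    (512, "0:56:01.83", 3353),
    (1024, "2:31:13.00", 9055),
    (2048, "6:25:06.00", 23068),
    (4096, "14:52:42.00", 53481),
    (8192, "29:20:49.00", 105445),
]


def elapsed_seconds(text):
    """Convert an elapsed time of the form H:MM:SS.ss to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def pass1_share(results):
    """Fraction of elapsed time spent in pass1, assuming pass1 is in seconds."""
    return [(size, p1 / elapsed_seconds(elapsed)) for size, elapsed, p1 in results]


def doubling_ratios(results):
    """pass1 growth factor between consecutive sizes (2.0 means linear)."""
    return [(a[0], b[0], b[2] / a[2]) for a, b in zip(results, results[1:])]


if __name__ == "__main__":
    for size, share in pass1_share(RESULTS):
        print(f"{size:>5} GB: pass1 is {share:.1%} of elapsed time")
    for small, large, ratio in doubling_ratios(RESULTS):
        print(f"{small} GB -> {large} GB: {ratio:.2f}x")
```

With these figures pass1 accounts for more than 99% of the elapsed time at every size, which is consistent with the seconds assumption, and the growth factor per doubling stays between about 1.9 and 2.7.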

Comment 17 errata-xmlrpc 2016-11-04 06:30:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html
