Bug 1257625 - fsck.gfs2 pass1c time scalability issue
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gfs2-utils
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Robert Peterson
QA Contact: cluster-qe@redhat.com
Depends On: 1271674
Blocks: 1111393 1497636
Reported: 2015-08-27 09:31 EDT by Nate Straz
Modified: 2017-10-02 05:59 EDT

See Also:
Fixed In Version: gfs2-utils-3.1.9-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 02:30:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Early prototype #1 (10.39 KB, patch)
2015-09-09 12:21 EDT, Robert Peterson
Set of 20 RHEL7 patches - posted and tested (20.71 KB, application/octet-stream)
2015-09-28 11:08 EDT, Robert Peterson

Description Nate Straz 2015-08-27 09:31:08 EDT
Description of problem:

While testing file systems filled to 80%, I found that fsck.gfs2 pass1c time increased dramatically between the 512GB and 1TB file system sizes.

128GB pass1c 1.184s
256GB pass1c 2.513s
512GB pass1c 5.510s
1TB pass1c 1h25m1.088s
2TB pass1c 4h50m5.258s
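
The jump is easier to see as a growth ratio per doubling of file system size. The following is only a sketch over the timings listed above; the helper names `to_seconds` and `growth_ratios` are made up for illustration and are not part of gfs2-utils.

```python
import re

# pass1c timings copied from the report above (size -> duration string)
PASS1C = [
    ("128GB", "1.184s"),
    ("256GB", "2.513s"),
    ("512GB", "5.510s"),
    ("1TB", "1h25m1.088s"),
    ("2TB", "4h50m5.258s"),
]


def to_seconds(text):
    """Convert a duration such as '1h25m1.088s' or '2.513s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?([\d.]+)s", text)
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def growth_ratios(rows):
    """Return (size, seconds, ratio to the previous size) for rows 2..n."""
    secs = [(size, to_seconds(t)) for size, t in rows]
    return [(size, s, s / prev) for (_, prev), (size, s) in zip(secs, secs[1:])]


if __name__ == "__main__":
    for size, s, ratio in growth_ratios(PASS1C):
        print(f"{size:>5}: {s:10.3f}s  ({ratio:.1f}x the previous size)")
```

With these numbers, each doubling up to 512GB costs a little over 2x, while the step from 512GB to 1TB costs several hundred times more.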


Version-Release number of selected component (if applicable):
gfs2-utils-3.1.8-4.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Create a 1TB GFS2 file system (lock_nolock is okay)
2. mockup -p 4 -F 80 -a fill -n 5000000 /mnt/gfs2
3. umount /mnt/gfs2
4. fsck.gfs2 /dev/foo

Actual results:
  Using default stripesize 64.00 KiB.
  Rounding size (262144 extents) up to stripe boundary size (262146 extents).
  Logical volume "perf" created.
=== mkfs.gfs2 1T ===
/dev/fsck/perf is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/perf
Block size:                4096
Device size:               1024.01 GB (268437504 blocks)
Filesystem size:           1024.01 GB (268437428 blocks)
Journals:                  1
Resource groups:           4094
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      aa55dafa-87d6-59ec-2fdf-728dd4197c99
0.01user 0.24system 0:00.87elapsed 29%CPU (0avgtext+0avgdata 2592maxresident)k
2248inputs+429456outputs (1major+816minor)pagefaults 0swaps
=== fsck.gfs2 0% full ===
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
pass1 completed in 0.031s
Starting pass1b
pass1b completed in 0.000s
Starting pass1c
pass1c completed in 0.000s
Starting pass2
pass2 completed in 1.309s
Starting pass3
pass3 completed in 0.000s
Starting pass4
pass4 completed in 0.000s
Starting pass5
pass5 completed in 1.457s
Starting check_statfs
check_statfs completed in 0.000s
gfs2_fsck complete
2.97user 0.23system 0:03.82elapsed 83%CPU (0avgtext+0avgdata 150064maxresident)k
432112inputs+272outputs (0major+43745minor)pagefaults 0swaps
=== mockup ===
Creating pool with 4 processes
4432439 files created
80% full
221.73user 2967.60system 21:12.31elapsed 250%CPU (0avgtext+0avgdata 83076maxresident)k
186296inputs+1646006424outputs (0major+4601262minor)pagefaults 0swaps
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/fsck-perf-nodata  1.0T  820G  205G  81% /mnt/perf
Filesystem                     Inodes   IUsed    IFree IUse% Mounted on
/dev/mapper/fsck-perf-nodata 58023007 4447075 53575932    8% /mnt/perf
=== fsck.gfs2 80% full ===
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
pass1 completed in 51m53.889s
Starting pass1b
pass1b completed in 0.000s
Starting pass1c
pass1c completed in 1h25m1.088s
Starting pass2
pass2 completed in 11.174s
Starting pass3
pass3 completed in 0.004s
Starting pass4
pass4 completed in 0.224s
Starting pass5
pass5 completed in 1.461s
Starting check_statfs
check_statfs completed in 0.000s
gfs2_fsck complete
179.73user 225.50system 2:17:17elapsed 4%CPU (0avgtext+0avgdata 566280maxresident)k
411350688inputs+272outputs (2major+301511minor)pagefaults 0swaps


Expected results:


Additional info:
Comment 2 Robert Peterson 2015-08-27 10:04:51 EDT
Pass1c is responsible for analyzing extended attributes, and
it works off a linked list of items built in pass1. I see no
reason why we can't eliminate the linked list and have pass1
do the analysis while it's checking each dinode. It seems logical
to assume that it was moved to follow pass1b (duplicate reference
processing) for a reason, so we need to determine if that's still
necessary, given today's improved pass1 logic.
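
The restructuring proposed here can be illustrated with a toy model. This is a sketch only, not fsck.gfs2 code: `Dinode`, `check_eattr`, `two_pass`, and `single_pass` are hypothetical names, the check itself is a placeholder, and a Python list stands in for the linked list. It shows the same check producing the same result whether it runs from a deferred list or inline while each inode is being examined.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Dinode:
    """Toy stand-in for an on-disk inode; not the real gfs2 structure."""
    addr: int
    eattr: Optional[int]  # block address of the extended attributes, if any


def check_eattr(inode: Dinode, fs_blocks: int) -> bool:
    """Placeholder check: the EA pointer must fall inside the file system."""
    return inode.eattr is not None and 0 < inode.eattr < fs_blocks


def two_pass(inodes: Iterable[Dinode], fs_blocks: int) -> List[int]:
    """Old shape: pass1 queues inodes that have EAs, a later pass walks the queue."""
    deferred = [ino for ino in inodes if ino.eattr is not None]  # built in "pass1"
    # "pass1c": revisit every queued inode and run the check
    return [ino.addr for ino in deferred if not check_eattr(ino, fs_blocks)]


def single_pass(inodes: Iterable[Dinode], fs_blocks: int) -> List[int]:
    """Proposed shape: run the check while pass1 already has the inode in hand."""
    bad = []
    for ino in inodes:
        if ino.eattr is not None and not check_eattr(ino, fs_blocks):
            bad.append(ino.addr)
    return bad
```

In this model the single-pass version never builds the intermediate list and never revisits an inode. Whether the real checks can be moved without depending on pass1b results is exactly the open question raised above.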
Comment 4 Robert Peterson 2015-09-08 11:33:13 EDT
I'm going to investigate whether we can eliminate pass1c altogether.
Between pass1 and pass1b, I think we do almost everything necessary
from pass1c with regard to extended attributes.

I plan to do a detailed examination of pass1c to see if it does
anything important, and if so, whether that can be safely moved
to pass1.

Reassigning to myself.
Comment 5 Robert Peterson 2015-09-09 12:19:55 EDT
I've done a thorough examination of pass1c. In my opinion, there's
no reason not to perform its checks inside of pass1. That saves
us from building the linked list, chewing up all that memory and
time to make another pass through the file system.

I've coded a patch that completely eliminates pass1c from fsck
and moves its checks to the appropriate places in pass1.c.
I'm testing it now against my latest collection of metadata.
Comment 6 Robert Peterson 2015-09-09 12:21:22 EDT
Created attachment 1071846 [details]
Early prototype #1

This prototype eliminates pass1c in favor of simple checks
added to pass1.
Comment 7 Robert Peterson 2015-09-17 10:48:40 EDT
I'm making good progress, but this turns out to be a bigger project
than I anticipated. I'm finding some nasty bugs in fsck.gfs2 related
to extended attributes and how we deal with them. I'm testing a new
prototype now. I'll attach it when it gets further along in testing.
Comment 8 Robert Peterson 2015-09-28 11:08 EDT
Created attachment 1077949 [details]
Set of 20 RHEL7 patches - posted and tested

This tarball contains a set of 20 patches that I posted today:

0000-cover-letter.patch
0001-libgfs2-Check-block-range-when-inserting-into-rgrp-t.patch
0002-libgfs2-Check-rgd-bits-before-referencing-it.patch
0003-fsck.gfs2-Add-check-for-gfs1-invalid-inode-refs-in-d.patch
0004-fsck.gfs2-Make-debug-messages-more-succinct-wrt-exte.patch
0005-fsck.gfs2-Break-up-funtion-handle_dup_blk.patch
0006-fsck.gfs2-Only-preserve-the-_first_-acceptable-inode.patch
0007-fsck.gfs2-Don-t-just-assume-the-remaining-EA-referen.patch
0008-fsck.gfs2-Don-t-delete-inode-for-duplicate-reference.patch
0009-fsck.gfs2-Don-t-traverse-EAs-that-belong-to-another-.patch
0010-fsck.gfs2-Refactor-function-check_indirect_eattr.patch
0011-fsck.gfs2-Once-an-indirect-ea-error-is-found-flag-al.patch
0012-fsck.gfs2-Always-restore-saved-value-for-di_eattr.patch
0013-fsck.gfs2-Remove-redundancy-in-add_duplicate_ref.patch
0014-fsck.gfs2-Don-t-remove-duplicate-eattr-blocks.patch
0015-fsck.gfs2-Refactor-check_eattr_entries-and-add-error.patch
0016-fsck.gfs2-remove-bad-EAs-at-the-end-not-as-you-go.patch
0017-fsck.gfs2-Combine-remove_inode_eattr-with-its-only-c.patch
0018-fsck.gfs2-Print-debug-message-to-dilineate-metadata-.patch
0019-fsck.gfs2-Remove-pass1c-in-favor-of-processing-in-pa.patch
0020-fsck.gfs2-Clone-duplicate-data-block-pointers.patch
Comment 9 Robert Peterson 2015-10-06 11:27:23 EDT
Bumping to 7.3.
Comment 10 Robert Peterson 2015-10-14 09:22:34 EDT
Changing status to POST. These patches were tested against my entire
metadata collection. They were posted to cluster-devel, and then
pushed to the gfs2-utils master branch. Our current plan is to
rebase RHEL7.3 from master, as per a discussion I had with Andy Price
this morning. Therefore I'm setting this to POST and we'll make it
dependent on the rebase bz.
Comment 12 Mike McCune 2016-03-28 19:30:47 EDT
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune@redhat.com with any questions.
Comment 15 Nate Straz 2016-08-10 11:19:51 EDT
pass1c was removed, and testing on gfs2-utils-3.1.9-3.el7.x86_64 shows that pass1 scales nearly linearly from 256GB to 8TB.

256G fsck80: elapsed =  0:29:24.42, pass1 =   1760, pass1b = 0, pass2 =   2, pass3 = 0, pass4 = 0, maxresident = 0.02GB
512G fsck80: elapsed =  0:56:01.83, pass1 =   3353, pass1b = 0, pass2 =   4, pass3 = 0, pass4 = 0, maxresident = 0.04GB
  1T fsck80: elapsed =  2:31:13.00, pass1 =   9055, pass1b = 0, pass2 =  11, pass3 = 0, pass4 = 0, maxresident = 0.09GB
  2T fsck80: elapsed =  6:25:06.00, pass1 =  23068, pass1b = 0, pass2 =  22, pass3 = 0, pass4 = 1, maxresident = 0.18GB
  4T fsck80: elapsed = 14:52:42.00, pass1 =  53481, pass1b = 0, pass2 =  53, pass3 = 0, pass4 = 2, maxresident = 0.35GB
  8T fsck80: elapsed = 29:20:49.00, pass1 = 105445, pass1b = 0, pass2 = 169, pass3 = 0, pass4 = 5, maxresident = 0.68GB
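
One way to check the scaling claim is to look at the ratio of pass1 time across each doubling, where 2.0 would be perfectly linear. This is a sketch over the table above; it assumes the pass1 values are seconds (they line up with the elapsed column on that reading), and the function names are made up for illustration.

```python
# (size in GB, pass1 time) copied from the table above; times assumed to be seconds
PASS1 = [
    (256, 1760),
    (512, 3353),
    (1024, 9055),
    (2048, 23068),
    (4096, 53481),
    (8192, 105445),
]


def doubling_ratios(rows):
    """Ratio of pass1 time between each size and the next larger (doubled) size."""
    return [
        (b_size, b_secs / a_secs)
        for (_, a_secs), (b_size, b_secs) in zip(rows, rows[1:])
    ]


def seconds_per_gb(rows):
    """pass1 seconds spent per GB of file system size."""
    return [(size, secs / size) for size, secs in rows]


if __name__ == "__main__":
    for size, ratio in doubling_ratios(PASS1):
        print(f"{size:>5} GB: {ratio:.2f}x the previous size")
    for size, cost in seconds_per_gb(PASS1):
        print(f"{size:>5} GB: {cost:.2f} s/GB")
```

On these figures every doubling falls between roughly 1.9x and 2.7x.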
Comment 17 errata-xmlrpc 2016-11-04 02:30:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2438.html
