Description of problem:
I was doing some testing for bug 223893 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=223893) and even I was surprised at how long it took to run gfs2_fsck on an 8G partition. Granted, because I was unit testing, I had gdb debugging enabled, which slowed it down from its normal time. Just for the fun of it, I profiled gfs2_fsck with gprof to see where it was spending its time, and was surprised to find that most of its CPU time was spent doing bitmap manipulation. Digging into the code, I discovered that it was using some extremely inefficient math in the bitmap manipulation routines. Just for the heck of it, I rewrote a couple of functions to be longer and more complex, but more CPU efficient: for example, rather than doing 64-bit divides and 64-bit modulus operations, I used masking and shifting. In doing so, I greatly improved the efficiency of these functions.

Then, while doing various testing, I discovered that the Makefile for libgfs2 accidentally still had debugging turned on for our RHEL5 gfs2 tools. That should get turned off.

Version-Release number of selected component (if applicable): RHEL5

How reproducible: Always

Steps to Reproduce:
1. gfs2_fsck

Actual results: Very slow

Expected results: A tiny bit less slow.

Additional info: I have three ideas for improving gfs2_fsck at the moment, but I'm sure there will be more:
1. Change the libgfs2 Makefile so it uses -O2 rather than -ggdb
2. Change the bitmap.c functions to do more efficient math
3. Maybe change the libgfs2 buffering scheme so it finds block buffers more efficiently, perhaps using a btree rather than a linked list. (Just something to experiment with.)
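A minimal sketch of the math reduction idea 2 refers to (the function and constant names below are illustrative, not the actual bitmap.c code): GFS2 keeps a 2-bit state per block, so each byte holds 4 entries, and because 4 is a power of two, the 64-bit divide and modulus that gprof flagged (__udivdi3/__umoddi3) can be replaced with a shift and a mask.

```c
#include <stdint.h>

#define BIT_MASK 0x3   /* mask for one 2-bit bitmap entry */

/* Divide/modulus version -- on 32-bit builds the 64-bit '/' and '%'
 * become calls to __udivdi3/__umoddi3, which the profile shows eating
 * roughly 40% of the CPU time: */
static uint8_t bitmap_get_slow(const uint8_t *bits, uint64_t block)
{
    uint64_t byte_off = block / 4;           /* 64-bit divide  */
    unsigned int bit  = (block % 4) * 2;     /* 64-bit modulus */
    return (bits[byte_off] >> bit) & BIT_MASK;
}

/* Shift/mask version -- same result: dividing by a power of two is a
 * right shift, and the remainder is just the low bits: */
static uint8_t bitmap_get_fast(const uint8_t *bits, uint64_t block)
{
    uint64_t byte_off = block >> 2;                      /* block / 4     */
    unsigned int bit  = (unsigned int)(block & 0x3) * 2; /* (block % 4)*2 */
    return (bits[byte_off] >> bit) & BIT_MASK;
}
```

Modern compilers perform this strength reduction automatically at -O2, which is part of why idea 1 (dropping -ggdb in favor of -O2) matters as much as rewriting the math by hand.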
Created attachment 151303 [details] First go at a fix This patch probably needs a little cleanup: I have long horrid comments in there explaining the math reduction (mostly for my own sanity).
Created attachment 151304 [details] First go at a fix This patch probably needs a little cleanup: I have long horrid comments in there explaining the math reduction (mostly for my own sanity).
Here's some gprof profiling data for gfs2_fsck.

Flat profile excerpts (each sample counts as 0.01 seconds):

Without the patch:
--------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
31.57      1.19      1.19                             gfs2_bitmap_get
27.06      2.21      1.02                             __udivdi3
13.53      2.72      0.51                             __umoddi3
 2.92      2.83      0.11                             bget_generic
 2.65      2.93      0.10   139301     0.00     0.00  inode_hash_search
 2.65      3.03      0.10                             gfs2_bitmap_set
 2.39      3.12      0.09                             gfs2_block_check
 2.12      3.20      0.08                             bfind
 1.99      3.28      0.08    26594     0.00     0.00  build_and_check_metalist
 1.86      3.35      0.07                             gfs2_bitmap_clear
 1.06      3.39      0.04      160     0.25     0.34  check_block_status
 0.80      3.42      0.03  4247564     0.00     0.00  warm_fuzzy_stuff
 0.80      3.45      0.03                             gfs2_block_mark
 0.80      3.48      0.03                             gfs2_disk_hash
 0.80      3.51      0.03                             gfs2_find_next_block_type
 0.80      3.54      0.03                             __divdi3

With the patch:
--------------------------------------------------------
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
30.48      0.57      0.57                             gfs2_bitmap_get
11.23      0.78      0.21                             bget_generic
 7.49      0.92      0.14    26594     0.01     0.01  build_and_check_metalist
 5.88      1.03      0.11   139301     0.00     0.00  inode_hash_search
 5.35      1.13      0.10                             gfs2_block_check
 4.81      1.22      0.09                             gfs2_disk_hash
 4.28      1.30      0.08                             gfs2_bitmap_set
 3.74      1.37      0.07                             gfs2_bitmap_clear
 3.21      1.43      0.06   217620     0.00     0.00  hash_more_internal
 2.14      1.47      0.04                             print_fsck_log
 1.60      1.50      0.03                             add_buffer
 1.60      1.53      0.03                             bfind
 1.60      1.56      0.03                             gfs2_bitfit
 1.60      1.59      0.03                             gfs2_block_set
 1.60      1.62      0.03                             gfs2_find_next_block_type
 1.34      1.65      0.03  1880692     0.00     0.00  check_data
 1.07      1.67      0.02    83750     0.00     0.00  check_dentry
 1.07      1.69      0.02    55551     0.00     0.00  handle_di
 1.07      1.71      0.02      160     0.13     0.22  check_block_status
 1.07      1.73      0.02                             gfs2_check_meta

Call graph excerpts (granularity: each sample hit covers 2 byte(s) for 0.27% of 3.77 seconds). Note: this is only the top 4 in the list.

Without fix:
-------------------------------------------------------
index % time    self  children    called     name
[1]     31.6    1.19      0.00               gfs2_bitmap_get [1]
[2]     27.1    1.02      0.00               __udivdi3 [2]
[3]     13.5    0.51      0.00               __umoddi3 [3]
[4]      9.3    0.00      0.35               main [4]

With the fix:
-------------------------------------------------------
index % time    self  children    called     name
[1]     30.5    0.57      0.00               gfs2_bitmap_get [1]
[2]     25.1    0.00      0.47               main [2]
                0.00      0.29       1/1         pass1 [3]
                0.01      0.12       1/1         pass2 [10]
                0.00      0.03       1/1         pass5 [27]
                0.00      0.01       1/1         pass4 [44]
                0.00      0.01       1/1         destroy [41]
                0.00      0.00       1/1         pass3 [56]
                0.00      0.00       1/1         read_cmdline [74]
                0.00      0.00       1/1         initialize [71]
                0.00      0.00       1/1         pass1b [72]
                0.00      0.00       1/1         pass1c [73]
-------------------------------------------------------
                0.00      0.29       1/1         main [2]
[3]     15.3    0.00      0.29       1       pass1 [3]
                0.01      0.28   55551/55551     scan_meta [4]
                0.00      0.00   55551/4247564   warm_fuzzy_stuff [38]
-------------------------------------------------------
                0.01      0.28   55551/55551     pass1 [3]
[4]     15.3    0.01      0.28   55551       scan_meta [4]
                0.02      0.26   55551/55551     handle_di [5]

As you can see from the data, this change saves a lot of CPU time. However, with gfs2_fsck the bottleneck is probably more in IO than in CPU time, so I don't know if this will make a difference in real run time. Also, the above profile was taken before I fixed some things in the patch that made it run slower.
Okay, here are some real numbers.

Without the fix:
real    2m20.487s
user    0m6.144s
sys     0m1.464s

With the fix:
real    2m18.141s
user    0m3.356s
sys     0m1.476s

So the user CPU time was almost cut in half, but it only saved about 2.3 seconds of "real time". That is roughly a 1.7 percent improvement in run time for 8G. If this were an 8-hour gfs2_fsck, it would equate to saving about 8 minutes.
devel & 5.3 ACKs please.
Deferring to 5.4; there are some other things that would make gfs2_fsck more efficient. For example, getting rid of the buf.c interface that mimics kernel buffer_heads would likely improve performance by a lot.
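As a hedged sketch of what idea 3 from the original report (finding block buffers more efficiently than a linear list walk) might look like: a chained hash table keyed on block number turns an O(n) list scan like bfind() into an O(1) average lookup. All of the names below are made up for illustration; this is not the actual buf.c API.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUF_HASH_SIZE 4096   /* power of two, so we can mask instead of mod */

struct blk_buf {
    uint64_t blkno;          /* disk block number (the lookup key)  */
    struct blk_buf *next;    /* hash-chain link                     */
    /* ... data pointer, refcount, etc. would live here ...         */
};

static struct blk_buf *buf_hash[BUF_HASH_SIZE];

static unsigned int buf_hash_fn(uint64_t blkno)
{
    return (unsigned int)(blkno & (BUF_HASH_SIZE - 1));
}

/* Walk only the (short) chain for this bucket, not every buffer: */
static struct blk_buf *buf_find(uint64_t blkno)
{
    struct blk_buf *b;
    for (b = buf_hash[buf_hash_fn(blkno)]; b; b = b->next)
        if (b->blkno == blkno)
            return b;
    return NULL;
}

/* Insert at the head of the bucket's chain: */
static struct blk_buf *buf_add(uint64_t blkno)
{
    struct blk_buf *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->blkno = blkno;
    unsigned int h = buf_hash_fn(blkno);
    b->next = buf_hash[h];
    buf_hash[h] = b;
    return b;
}
```

A btree, as the original report suggests, would work too and would additionally give ordered traversal; a hash table is simply the smaller change if order doesn't matter.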
There are two major things we can do to improve the speed of gfs2_fsck:

(1) Reduce the memory requirements. A huge part of the slowdown on many customer systems is the fact that it uses far too much memory. That pushes the system into using swap space, which brings the system to its knees. I'm working on the memory issue.

(2) Make gfs2_fsck multi-threaded. For a lot of its operation, gfs2_fsck is using 1% of the CPU, and mostly what it's doing is waiting for disk IO. So if I make it create several threads, all working on separate Resource Groups simultaneously, maybe it can use the CPU(s) more effectively.
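A minimal sketch of the multi-threading idea in (2), assuming workers pull resource-group indices from a shared counter so the per-rgrp IO overlaps. Everything here is hypothetical scaffolding, not actual gfs2_fsck code; the real version would also need to worry about shared state like the block map.

```c
#include <pthread.h>

#define NUM_RGRPS   8   /* stand-in for the file system's rgrp count */
#define NUM_THREADS 4

static pthread_mutex_t next_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_rgrp;            /* next unclaimed resource group     */
static int scanned[NUM_RGRPS];   /* set once a worker has finished it */

/* Stand-in for per-rgrp pass 1 work (reading bitmaps, dinodes, etc.) */
static void scan_rgrp(int rg)
{
    scanned[rg] = 1;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Claim the next unscanned resource group, if any remain: */
        pthread_mutex_lock(&next_lock);
        int rg = (next_rgrp < NUM_RGRPS) ? next_rgrp++ : -1;
        pthread_mutex_unlock(&next_lock);
        if (rg < 0)
            return NULL;
        scan_rgrp(rg);   /* this rgrp's IO overlaps with other threads */
    }
}

/* Spawn the workers, wait for them, and report how many rgrps got done: */
static int scan_all_rgrps(void)
{
    pthread_t tid[NUM_THREADS];
    int i, done = 0;

    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < NUM_RGRPS; i++)
        done += scanned[i];
    return done;
}
```

The work-stealing counter keeps all threads busy even when resource groups take uneven time, which matters because a badly fragmented rgrp can cost far more IO than a clean one.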
I've decided to close this bug. The userland bitmap speedup code (attachments 1 and 2) has been incorporated into the fix for bug #404611. I did this to try to make my fsck testing faster for that bug. At one point I had a 2TB file system that had been running benchp14 for days, and I tried to run gfs2_fsck on it. After days of no progress, problems running out of memory (swapping), and the CPU gridlocked at 100%, I killed it. So I reduced the memory requirements, found a machine that had more memory, and, to keep it from being CPU-bound, I implemented the fix attached below. It did help, but not enough for that particular file system: it managed to get 7% through pass 1 after running overnight, but by then it was page-bound rather than CPU-bound. On a smaller file system, this fix took my gfs2_fsck time from 19 seconds down to 6 seconds. The fix has never been shipped because there are larger issues with speeding up gfs2_fsck. With bug #404611, I decided it was better to get something out in the field while I worked on the other issues. The other issues will be addressed in bug #455300.