Bug 683155 - gfs2: creating large files suddenly slow to a crawl
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: All Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Robert Peterson
QA Contact: Cluster QE
Keywords: ZStream
Duplicates: 704046
Depends On:
Blocks: 690237 690239
Reported: 2011-03-08 12:23 EST by Robert Peterson
Modified: 2013-10-03 20:27 EDT

Doc Type: Bug Fix
Doc Text:
GFS2 (Global File System 2) keeps track of the list of resource groups to allow better performance when allocating blocks. Previously, when the user created a large file in GFS2, GFS2 could have run out of allocation space because it was confined to the recently-used resource groups. With this update, GFS2 uses the MRU (Most Recently Used) list instead of the list of the recently-used resource groups. The MRU list allows GFS2 to use all available resource groups and if a large span of blocks is in use, GFS2 uses allocation blocks of another resource group.
Last Closed: 2011-07-21 05:57:31 EDT


Attachments
Proposed patch (5.85 KB, patch)
2011-03-08 13:02 EST, Robert Peterson

Description Robert Peterson 2011-03-08 12:23:28 EST
Description of problem:
When you create a large file in gfs2, it can suddenly slow
down to a crawl.  Performance can drop dramatically and not
recover.

Version-Release number of selected component (if applicable):
RHEL5.x

How reproducible:
Always

Steps to Reproduce:
1. lvcreate -L 500G -n roth_lv roth_vg
2. mkfs.gfs2 -O -t bobs_roth:roth_lv -p lock_dlm -j 16 /dev/roth_vg/roth_lv
3. mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
4. dd if=/dev/zero of=/mnt/gfs2/zeroes bs=1M count=512000
  
Actual results:
When the file hits 32G, it will stop making progress.

Expected results:
The file should continue to grow.

Additional info:
This upstream patch fixes the problem:
http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-2.6-nmw.git;a=commitdiff;h=9cabcdbd4638cf884839ee4cd15780800c223b90
Comment 1 Robert Peterson 2011-03-08 12:33:11 EST
Requesting ack flags for 5.7.
Comment 2 RHEL Product and Program Management 2011-03-08 12:40:35 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 Robert Peterson 2011-03-08 13:02:56 EST
Created attachment 482968
Proposed patch

Here is the initial RHEL5 patch for this issue.
Comment 4 Robert Peterson 2011-03-08 16:19:23 EST
I built and (quickly) tested a kernel that contains this patch.
The kernel is: 2.6.18-247.el5.bz683155.x86_64.rpm
I uploaded it to my people page:
http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/*247*

It contains these patches
gfs2: filesystem sporadic slow performance problem with large files [683155]
gfs2: avoid hangs while reclaiming unlinked metadata                [656032]
gfs2: release rgrp glocks properly on unlink error                  [656032]
gfs2: auto-tune glock hold time                                     [650494]
make queue_delayed_work execute immediately if delay==0             [650494]
Comment 6 Steve Whitehouse 2011-03-09 06:57:58 EST
I have confirmed that this doesn't occur in RHEL6 or upstream, so it is specific to RHEL5.
Comment 7 Robert Peterson 2011-03-10 11:01:56 EST
I posted this patch to rhkernel-list for review.  It was tested
on my roth cluster and received positive customer feedback.
Changing status to POST.
Comment 10 Jarod Wilson 2011-03-16 14:02:45 EDT
Patch(es) available in kernel-2.6.18-248.el5
Detailed testing feedback is always welcomed.
Comment 22 Alan Brown 2011-05-13 17:47:41 EDT
How's progress for this going into the release kernels?
Comment 23 Steve Whitehouse 2011-05-16 05:12:47 EDT
It is scheduled for 5.7
Comment 24 Steve Whitehouse 2011-05-16 06:30:49 EDT
*** Bug 704046 has been marked as a duplicate of this bug. ***
Comment 25 Alan Brown 2011-05-17 14:39:07 EDT
Can we have a hotfix kernel for evaluation please?
Comment 26 Robert Peterson 2011-05-17 15:24:59 EDT
Hi Alan,

Feel free to try the kernel rpms located on my people page:

http://people.redhat.com/rpeterso/Experimental/RHEL5.x/gfs2/kernel*

This kernel has not undergone Red Hat's Quality testing process,
so there are no guarantees, but it contains all the latest GFS2
fixes.
Comment 27 Bryn M. Reeves 2011-05-18 07:03:58 EDT
Please note that, as Bob mentions, this is a test kernel rather than a hotfix. Hotfixes can only be issued once we have verified that a change addresses the reported problem. We can issue a test kernel via a support case, and any supported hotfix also needs to be requested and provided via a support ticket.

I don't see a support case linked to this bug for you - please could you file one (or update the existing one if there is one already and for some reason it has not been linked to this bug - just a very brief note referencing bug 683155)?

Thanks,
Comment 28 Alan Brown 2011-05-18 07:13:18 EDT
Bob, 

Does this follow on from the -256 kernel in
http://people.redhat.com/jwilson/el5/256.el5/ or is it a parallel development
stream?

(The -256 kernel has a kdump fix we need, as well as possibly incorporating
fixes for bz #666080 (checking on that at the moment))


Bryn,

I've noted these BZs in tx 00333468 as I suspect they have a bearing on the hangs we're seeing (but not necessarily the slow read performance - which appears to be directory structure related)

Thanks
AB
Comment 29 Alan Brown 2011-05-18 14:05:56 EDT
Bryn: see also Tx 00335837 - identical symptoms to the start of this BZ
Comment 30 Nate Straz 2011-05-26 08:57:03 EDT
Checked the file size and calculated I/O rate while streaming a large file:

2.6.18-238.el5
filename                bytes/second    file size
/mnt/dash0/filler       68036061.866667 31331344384
/mnt/dash0/filler       68036061.866667 33372426240
/mnt/dash0/filler       3697322.666667  33483345920
/mnt/dash0/filler       695773.866667   33504219136
/mnt/dash0/filler       832443.733333   33529192448
/mnt/dash0/filler       834901.333333   33554239488


2.6.18-262.el5
filename                bytes/second    file size
/mnt/dash0/filler       74798421        30412554240
/mnt/dash0/filler       71687373        32563175424
/mnt/dash0/filler       68123853        34606891008
/mnt/dash0/filler       73819750        36821483520
/mnt/dash0/filler       70883738        38947995648
/mnt/dash0/filler       70185916        41053573120
/mnt/dash0/filler       73120700        43247194112
...
/mnt/dash0/filler       67897071        520676028416
/mnt/dash0/filler       68036062        522717110272
500000+0 records in
500000+0 records out
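
For anyone who wants to repeat this kind of measurement, here is a minimal Python sketch of one way to do it. It is not the script used above; the function names and the 30-second default are assumptions. The default was chosen because the rates in the 2.6.18-238.el5 table are consistent with sizes sampled 30 seconds apart, for example (33372426240 - 31331344384) / 30 = 68036061.866667.

```python
import os
import time


def rate(prev_size, cur_size, interval_s):
    """Average write rate in bytes/second between two file-size samples."""
    return (cur_size - prev_size) / interval_s


def watch(path, interval_s=30.0, samples=10):
    """Print filename, bytes/second and file size once per interval.

    Hypothetical helper: it polls the size of `path` with os.stat()
    while another process (for example dd) is writing to it.
    """
    prev = os.stat(path).st_size
    for _ in range(samples):
        time.sleep(interval_s)
        cur = os.stat(path).st_size
        print(f"{path}\t{rate(prev, cur, interval_s):.6f}\t{cur}")
        prev = cur
```

Running `watch("/mnt/dash0/filler")` in a second terminal while dd is streaming should give output in the same three-column shape as the tables above. A sustained drop in the middle column is the symptom described in this bug.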
Comment 31 Alan Brown 2011-05-26 09:09:32 EDT
I see similar results on -262
Comment 32 Martin Prpic 2011-06-02 09:32:31 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
GFS2 (Global File System 2) keeps track of the list of resource groups to allow better performance when allocating blocks. Previously, when the user created a large file in GFS2, GFS2 could have run out of allocation space because it was confined to the recently-used resource groups. With this update, GFS2 uses the MRU (Most Recently Used) list instead of the list of the recently-used resource groups. The MRU list allows GFS2 to use all available resource groups and if a large span of blocks is in use, GFS2 uses allocation blocks of another resource group.
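
To illustrate the idea in the note above, here is a toy Python model of allocation driven by a most-recently-used ordering over all resource groups. It is only a sketch: the class name, the free-block counters and the search policy are assumptions made for illustration, and it does not mirror the kernel's actual data structures or the patch attached to this bug.

```python
from collections import OrderedDict


class ToyAllocator:
    """Toy model: every resource group stays on one MRU-ordered list."""

    def __init__(self, free_blocks_per_group):
        # Maps group id -> free block count; iteration order is MRU first.
        self.groups = OrderedDict(enumerate(free_blocks_per_group))

    def allocate(self, nblocks):
        """Return the id of the group that satisfied the request, or None."""
        # Walk a snapshot so the ordering can be changed safely below.
        for gid, free in list(self.groups.items()):
            if free >= nblocks:
                self.groups[gid] = free - nblocks
                # Promote the group that was just used to the front.
                self.groups.move_to_end(gid, last=False)
                return gid
        # Every group was tried and none had enough free blocks.
        return None
```

In this model a request falls through to the next group as soon as the most recently used one runs short, so allocation only fails once no group at all can satisfy it. For example, with `ToyAllocator([2, 5, 5])`, two 2-block requests are served from groups 0 and 1, and a following 4-block request moves on to group 2.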
Comment 33 errata-xmlrpc 2011-07-21 05:57:31 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
Comment 34 Red Hat Bugzilla 2013-10-03 20:26:53 EDT
Removing external tracker bug with the id 'DOC-65879' as it is not valid for this tracker
Comment 35 Red Hat Bugzilla 2013-10-03 20:27:01 EDT
Removing external tracker bug with the id 'DOC-61679' as it is not valid for this tracker
