Red Hat Bugzilla – Bug 683155
gfs2: creating large files suddenly slow to a crawl
Last modified: 2013-10-03 20:27:01 EDT
Description of problem:
When you create a large file in gfs2, it can suddenly slow
down to a crawl. Performance drops dramatically and does not recover.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. lvcreate -L 500G -n roth_lv roth_vg
2. mkfs.gfs2 -O -t bobs_roth:roth_lv -p lock_dlm -j 16 /dev/roth_vg/roth_lv
3. mount -t gfs2 /dev/roth_vg/roth_lv /mnt/gfs2
4. dd if=/dev/zero of=/mnt/gfs2/zeroes bs=1M count=512000
When the file hits 32G, it will stop making progress.
The file should continue to grow.
This upstream patch fixes the problem:
Requesting ack flags for 5.7.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Created attachment 482968 [details]
Here is the initial RHEL5 patch for this issue.
I built and (quickly) tested a kernel that contains this patch.
The kernel is: 2.6.18-247.el5.bz683155.x86_64.rpm
I uploaded it to my people page:
It contains these patches:
gfs2: filesystem sporadic slow performance problem with large files 
gfs2: avoid hangs while reclaiming unlinked metadata 
gfs2: release rgrp glocks properly on unlink error 
gfs2: auto-tune glock hold time 
make queue_delayed_work execute immediately if delay==0 
I have confirmed that this doesn't occur in RHEL6 or upstream, so it is specific to RHEL5.
I posted this patch to rhkernel-list for review. It was tested
on my roth cluster and received positive customer feedback.
Changing status to POST.
Patch(es) available in kernel-2.6.18-248.el5
Detailed testing feedback is always welcomed.
How's progress for this going into the release kernels?
It is scheduled for 5.7
*** Bug 704046 has been marked as a duplicate of this bug. ***
Can we have a hotfix kernel for evaluation please?
Feel free to try the kernel rpms located on my people page:
This kernel has not undergone Red Hat's quality testing process,
so there are no guarantees, but it contains all the latest GFS2 patches.
Please note that, as Bob mentions, this is a test kernel rather than a hotfix. Hotfixes can only be issued once we have verified that a change addresses the reported problem; we can issue a test kernel via a support case, and any supported hotfix also needs to be requested and provided via the support tickets.
I don't see a support case linked to this bug for you. Please could you file one (or update the existing one, if there is one already and for some reason it has not been linked to this bug) with a very brief note referencing bug 683155?
Does this follow on from the -256 kernel in
http://people.redhat.com/jwilson/el5/256.el5/ or is it a parallel development?
(The -256 kernel has a kdump fix we need, as well as possibly incorporating
fixes for bz #666080 (checking on that at the moment))
I've noted these BZs in tx 00333468 as I suspect they have a bearing on the hangs we're seeing (but not necessarily the slow read performance, which appears to be directory-structure related).
Bryn: see also Tx 00335837 - identical symptoms to the start of this BZ
Checked the file size and calculated I/O rate while streaming a large file:
filename            bytes/second     file size (bytes)
/mnt/dash0/filler 68036061.866667 31331344384
/mnt/dash0/filler 68036061.866667 33372426240
/mnt/dash0/filler 3697322.666667 33483345920
/mnt/dash0/filler 695773.866667 33504219136
/mnt/dash0/filler 832443.733333 33529192448
/mnt/dash0/filler 834901.333333 33554239488
/mnt/dash0/filler 74798421 30412554240
/mnt/dash0/filler 71687373 32563175424
/mnt/dash0/filler 68123853 34606891008
/mnt/dash0/filler 73819750 36821483520
/mnt/dash0/filler 70883738 38947995648
/mnt/dash0/filler 70185916 41053573120
/mnt/dash0/filler 73120700 43247194112
/mnt/dash0/filler 67897071 520676028416
/mnt/dash0/filler 68036062 522717110272
500000+0 records in
500000+0 records out
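The per-sample rates above can be gathered with a small polling script. This is a sketch rather than the script actually used in the report; it samples the file size at a fixed interval with stat(1) and prints the apparent write rate in bytes/second (the FILE and INTERVAL defaults are illustrative):

```shell
#!/bin/sh
# Sketch: poll a growing file and report its apparent write rate.
# FILE and INTERVAL are illustrative defaults, not values from the report.
FILE=${1:-/mnt/dash0/filler}
INTERVAL=${2:-30}

prev=$(stat -c %s "$FILE")
while sleep "$INTERVAL"; do
    cur=$(stat -c %s "$FILE")
    # Bytes written since the last sample, divided by the sample interval.
    printf '%s %s %s\n' "$FILE" $(( (cur - prev) / INTERVAL )) "$cur"
    prev=$cur
done
```

Run it in a second terminal while the dd from the reproduction steps is writing; on an affected kernel the printed rate collapses once the file size passes roughly 33 GB, matching the table above.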
I see similar results on -262.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
GFS2 (Global File System 2) keeps track of a list of resource groups to allow better performance when allocating blocks. Previously, when a user created a large file on GFS2, the file system could run out of allocation space because its search was confined to the recently used resource groups. With this update, GFS2 traverses the MRU (Most Recently Used) list so that all available resource groups can be used: if a large span of blocks is already in use, GFS2 moves on and allocates blocks from another resource group.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
Removing external tracker bug with the id 'DOC-65879' as it is not valid for this tracker
Removing external tracker bug with the id 'DOC-61679' as it is not valid for this tracker