Red Hat Bugzilla – Bug 683155
gfs2: creating large files suddenly slow to a crawl
Last modified: 2013-10-03 20:27:01 EDT
Description of problem:
When you create a large file in gfs2, it can suddenly slow
down to a crawl. Performance drops dramatically and does not recover.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. lvcreate -L 500G -n roth_lv roth_vg
2. mkfs.gfs2 -O -t bobs_roth:roth_lv -p lock_dlm -j 16 /dev/roth_vg/roth_lv
3. mount -t gfs2 /dev/roth_vg/roth_lv /mnt/gfs2
4. dd if=/dev/zero of=/mnt/gfs2/zeroes bs=1M count=512000
When the file hits 32G, it will stop making progress.
The file should continue to grow.
This upstream patch fixes the problem:
Requesting ack flags for 5.7.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Created attachment 482968 [details]
Here is the initial RHEL5 patch for this issue.
I built and (quickly) tested a kernel that contains this patch.
The kernel is: 2.6.18-247.el5.bz683155.x86_64.rpm
I uploaded it to my people page:
It contains these patches:
gfs2: filesystem sporadic slow performance problem with large files 
gfs2: avoid hangs while reclaiming unlinked metadata 
gfs2: release rgrp glocks properly on unlink error 
gfs2: auto-tune glock hold time 
make queue_delayed_work execute immediately if delay==0 
I have confirmed that this doesn't occur in RHEL6 or upstream, so it is specific to RHEL5.
I posted this patch to rhkernel-list for review. It was tested
on my roth cluster and received positive customer feedback.
Changing status to POST.
Patch(es) available in kernel-2.6.18-248.el5
Detailed testing feedback is always welcomed.
How's progress for this going into the release kernels?
It is scheduled for 5.7
*** Bug 704046 has been marked as a duplicate of this bug. ***
Can we have a hotfix kernel for evaluation please?
Feel free to try the kernel rpms located on my people page:
This kernel has not undergone Red Hat's quality testing process,
so there are no guarantees, but it contains all the latest GFS2 patches.
Please note that, as Bob mentions, this is a test kernel rather than a hotfix. Hotfixes can only be issued once we have verified that a change addresses the reported problem; we can issue a test kernel via a support case, and any supported hotfix also needs to be requested and provided via the support tickets.
I don't see a support case linked to this bug for you. Please could you file one (or update the existing one, if there is one already and for some reason it has not been linked to this bug) with a very brief note referencing bug 683155?
Does this follow on from the -256 kernel in
http://people.redhat.com/jwilson/el5/256.el5/ or is it a parallel development?
(The -256 kernel has a kdump fix we need, as well as possibly incorporating
fixes for bz #666080 (checking on that at the moment))
I've noted these BZs in tx 00333468 as I suspect they have a bearing on the hangs we're seeing (but not necessarily the slow read performance, which appears to be directory-structure related).
Bryn: see also Tx 00335837 - identical symptoms to the start of this BZ
Checked the file size and calculated I/O rate while streaming a large file:
filename            bytes/second     file size (bytes)
/mnt/dash0/filler 68036061.866667 31331344384
/mnt/dash0/filler 68036061.866667 33372426240
/mnt/dash0/filler 3697322.666667 33483345920
/mnt/dash0/filler 695773.866667 33504219136
/mnt/dash0/filler 832443.733333 33529192448
/mnt/dash0/filler 834901.333333 33554239488
/mnt/dash0/filler 74798421 30412554240
/mnt/dash0/filler 71687373 32563175424
/mnt/dash0/filler 68123853 34606891008
/mnt/dash0/filler 73819750 36821483520
/mnt/dash0/filler 70883738 38947995648
/mnt/dash0/filler 70185916 41053573120
/mnt/dash0/filler 73120700 43247194112
/mnt/dash0/filler 67897071 520676028416
/mnt/dash0/filler 68036062 522717110272
500000+0 records in
500000+0 records out
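The per-sample rates above can be gathered with a small polling script. This is a sketch rather than the script actually used in the report; it samples the file size at a fixed interval with stat(1) and prints the apparent write rate in bytes/second (the FILE and INTERVAL defaults are illustrative):

```shell
#!/bin/sh
# Sketch: poll a growing file and report its apparent write rate.
# FILE and INTERVAL are illustrative defaults, not values from the report.
FILE=${1:-/mnt/dash0/filler}
INTERVAL=${2:-30}

prev=$(stat -c %s "$FILE")
while sleep "$INTERVAL"; do
    cur=$(stat -c %s "$FILE")
    # Bytes written since the last sample, divided by the sample interval.
    printf '%s %s %s\n' "$FILE" $(( (cur - prev) / INTERVAL )) "$cur"
    prev=$cur
done
```

Run it in a second terminal while the dd from the reproduction steps is writing; on an affected kernel the printed rate collapses once the file size passes roughly 33 GB, matching the table above.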
I see similar results on -262.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
GFS2 (Global File System 2) keeps track of a list of resource groups to allow better performance when allocating blocks. Previously, when a user created a large file on GFS2, the file system could run out of allocation space because its search was confined to the recently used resource groups. With this update, GFS2 traverses the MRU (Most Recently Used) list so that all available resource groups can be used: if a large span of blocks is already in use, GFS2 moves on and allocates blocks from another resource group.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
Removing external tracker bug with the id 'DOC-65879' as it is not valid for this tracker
Removing external tracker bug with the id 'DOC-61679' as it is not valid for this tracker