Bug 674286 - mmapping a read only file on a gfs2 filesystem incorrectly acquires an exclusive glock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Whitehouse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 672724
Blocks:
Reported: 2011-02-01 09:53 UTC by Steve Whitehouse
Modified: 2011-05-19 12:01 UTC (History)

Fixed In Version: kernel-2.6.32-117.el6
Doc Type: Bug Fix
Doc Text:
This bug is a performance issue relating to mmap of the same file from multiple nodes at once. The issue affects only the initial call to mmap and not subsequent page faults, so this will only be noticeable in cases where mmap is called frequently from multiple nodes on the same file. This occurs when running BLAST, for example. There is a workaround, which is to alter the application to always use O_NOATIME when opening the files to be mapped. This is only possible if the opening process is the file owner or is root. After applying this patch, the workaround is no longer required. The patch changes the tests applied at mmap time such that for noatime mounts, a glock will not be taken at all. For atime mounts, only a shared glock will be taken at mmap time, although if an atime update is required, an exclusive glock will still be required at a later time to write back the new atime.
Clone Of: 672724
Environment:
Last Closed: 2011-05-19 12:01:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Upstream patch (866 bytes, patch), 2011-02-01 10:34 UTC, Steve Whitehouse
RHEL6 post of the patch (866 bytes, patch), 2011-02-01 10:41 UTC, Steve Whitehouse


Links
System ID: Red Hat Product Errata RHSA-2011:0542
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 11:58:07 UTC

Description Steve Whitehouse 2011-02-01 09:53:43 UTC
+++ This bug was initially created as a clone of Bug #672724 +++

Created attachment 475319 [details]
Program to demonstrate the problem.

Description of problem: When an application uses mmap to map a file on a gfs2 filesystem in read-only mode, it acquires an exclusive glock, even with noatime set on the filesystem. This has a significant impact on the performance of subsequent invocations of the application if the same file is accessed on multiple nodes.


Version-Release number of selected component (if applicable): 2.6.18-238.el5


How reproducible: Always


Steps to Reproduce:
1. Start with a file on a gfs2 filesystem that has no cached glocks
2. Run the attached application to map that file read-only
  
Actual results: An exclusive glock will be created


Expected results: Only shared locks should be created
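
The attached program is not reproduced in this report. As a rough stand-in, a minimal sketch of step 2 might look like the following. It assumes Python's standard `os` and `mmap` modules, and the helper name `map_read_only` is hypothetical, not taken from the attachment:

```python
import mmap
import os


def map_read_only(path: str, length: int = 16) -> bytes:
    """Open `path` read-only, map it with PROT_READ, and return its first bytes.

    Hypothetical helper for illustration; not the program attached to the bug.
    """
    fd = os.open(path, os.O_RDONLY)  # plain read-only open, no O_NOATIME
    try:
        # Length 0 maps the whole file, so the file must not be empty.
        with mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) as m:
            return m[:length]
    finally:
        os.close(fd)
```

Calling `map_read_only()` on a file in the gfs2 mount from one node, and then checking which glock that node holds for the file, corresponds to steps 1 and 2 above.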

--- Additional comment from swhiteho on 2011-01-31 12:45:19 EST ---

I've tracked down what is going on here....

It is all down to the test used in the ->mmap() function, which is supposed to skip the EX lock if there are no atime updates to be performed. The EX lock is being taken because there are a number of different ways in which the noatime state can be set: via the mount flags, via the O_NOATIME file flag, and via the S_NOATIME flag (set on a per-file basis via setattr).

The code checks only for O_NOATIME (which, if set, does prevent grabbing the EX lock), but the check is repeated later on in the VFS atime code, so the actual atime updates are done correctly. It's only the locking that isn't quite correct.
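
The mismatch between the two checks can be modelled in a few lines. This is a simplified userspace sketch, not the kernel code: the class and function names are hypothetical, and the real VFS atime logic considers more conditions than the three noatime sources named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtimeState:
    """The three noatime sources named in the comment (hypothetical model)."""
    mount_noatime: bool = False    # filesystem mounted with noatime
    file_o_noatime: bool = False   # file opened with O_NOATIME
    inode_s_noatime: bool = False  # S_NOATIME set on the inode


def mmap_check_takes_ex_lock(state: AtimeState) -> bool:
    """Model of the narrow ->mmap() test: only O_NOATIME skips the EX lock."""
    return not state.file_o_noatime


def vfs_would_update_atime(state: AtimeState) -> bool:
    """Model of the later VFS test: any noatime source suppresses the update."""
    return not (state.mount_noatime
                or state.file_o_noatime
                or state.inode_s_noatime)


def ex_lock_is_wasted(state: AtimeState) -> bool:
    """True when the EX lock is taken although no atime update will follow."""
    return mmap_check_takes_ex_lock(state) and not vfs_would_update_atime(state)
```

In this model, `ex_lock_is_wasted(AtimeState(mount_noatime=True))` is true: the lock is taken for an atime update that the later check then declines to perform.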

So if you have access to the source code, there is a temporary workaround: open the files to be mmapped with O_NOATIME. Note that this only happens on mmap() and not on page faults, so if the files are mmap()ed just once and then used many times, only the initial mmap call will require an EX lock. After that point all the locks will be PR (for read-only access, even if the file is mapped read/write).

That should allow you to get on with your BLAST runs. I'll try and get a patch sorted out for this as soon as I can.

--- Additional comment from scooter.edu on 2011-01-31 13:48:28 EST ---

Steve,
   Excellent news!!  We'll change BLAST right away and let you know the impact.  Since the loader uses mmap() quite heavily, we are still interested in a patched kernel.  This explains some symptoms that we had early on that we weren't able to explain (so we worked around them).

--- Additional comment from scooter.edu on 2011-01-31 17:05:17 EST ---

Steve.  It turns out that O_NOATIME can only be used if you are the file owner or root, which is not a good solution for shared databases :-(  We'll go ahead and get the timings to make sure that this works as expected, though.
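
For applications that want to try the workaround anyway, one possible shape is to request O_NOATIME and fall back to a plain open when the caller is neither the file owner nor root. This is only a sketch under those assumptions, not how BLAST does it; the helper names `open_for_mapping` and `map_once` are hypothetical:

```python
import mmap
import os


def open_for_mapping(path: str) -> int:
    """Open read-only, preferring O_NOATIME; fall back if not permitted."""
    noatime = getattr(os, "O_NOATIME", 0)  # 0 on platforms without the flag
    try:
        return os.open(path, os.O_RDONLY | noatime)
    except PermissionError:
        # Not the file owner and not privileged: retry without O_NOATIME.
        return os.open(path, os.O_RDONLY)


def map_once(path: str) -> mmap.mmap:
    """Map the whole (non-empty) file read-only and return the mapping."""
    fd = open_for_mapping(path)
    try:
        return mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
```

Keeping the returned mapping alive and reusing it, rather than calling `map_once()` for every access, also follows the earlier advice that only the initial mmap call is affected. In the fallback case the workaround does not apply and the original locking behaviour remains.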

Comment 1 RHEL Program Management 2011-02-01 10:08:37 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 2 Steve Whitehouse 2011-02-01 10:34:25 UTC
Created attachment 476352 [details]
Upstream patch

Comment 3 Steve Whitehouse 2011-02-01 10:41:38 UTC
Created attachment 476354 [details]
RHEL6 post of the patch

Comment 4 RHEL Program Management 2011-02-01 19:13:28 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 5 RHEL Program Management 2011-02-01 19:30:45 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Steve Whitehouse 2011-02-03 11:40:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This bug is a performance issue relating to mmap of the same file from multiple nodes at once. The issue affects only the initial call to mmap and not subsequent page faults, so this will only be noticeable in cases where mmap is called frequently from multiple nodes on the same file. This occurs when running BLAST, for example.

There is a workaround, which is to alter the application to always use O_NOATIME when opening the files to be mapped. This is only possible if the opening process is the file owner or is root. After applying this patch, the workaround is no longer required.

The patch changes the tests applied at mmap time such that for noatime mounts, a glock will not be taken at all. For atime mounts, only a shared glock will be taken at mmap time, although if an atime update is required, an exclusive glock will still be required at a later time to write back the new atime.
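
The before and after behaviour described in this note can be summarised as a small decision function. This is an illustrative model only, with hypothetical names. It assumes that O_NOATIME continues to skip the glock after the patch, which the note implies but does not state outright.

```python
from enum import Enum


class Glock(Enum):
    NONE = "none"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


def glock_at_mmap_before(o_noatime: bool) -> Glock:
    """Pre-patch model: only O_NOATIME on the open file avoids the EX glock."""
    return Glock.NONE if o_noatime else Glock.EXCLUSIVE


def glock_at_mmap_after(noatime_mount: bool, o_noatime: bool = False) -> Glock:
    """Post-patch model: no glock when noatime applies, otherwise shared.

    Treating O_NOATIME as equivalent to a noatime mount here is an assumption.
    """
    if noatime_mount or o_noatime:
        return Glock.NONE
    return Glock.SHARED
```

An exclusive glock may still be needed later on an atime mount to write back a new atime; that later step is outside this model.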

Comment 7 Aristeu Rozanski 2011-02-18 22:16:30 UTC
Patch(es) available on kernel-2.6.32-117.el6

Comment 11 errata-xmlrpc 2011-05-19 12:01:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

