Bug 674286

Summary: mmapping a read only file on a gfs2 filesystem incorrectly acquires an exclusive glock
Product: Red Hat Enterprise Linux 6
Reporter: Steve Whitehouse <swhiteho>
Component: kernel
Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.1
CC: adas, bmarzins, rpeterso, rwheeler, scooter, swhiteho
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.32-117.el6
Doc Type: Bug Fix
Doc Text:
This bug is a performance issue relating to mmap of the same file from multiple nodes at once. The issue affects only the initial call to mmap and not subsequent page faults, so this will only be noticeable in cases where mmap is called frequently from multiple nodes on the same file. This occurs when running BLAST, for example. There is a workaround, which is to alter the application to always use O_NOATIME when opening the files to be mapped. This is only possible if the opening process is the file owner or is root. After applying this patch, the workaround is no longer required. The patch changes the tests applied at mmap time such that for noatime mounts, a glock will not be taken at all. For atime mounts, only a shared glock will be taken at mmap time, although if an atime update is required, an exclusive glock will still be required at a later time to write back the new atime.
Story Points: ---
Clone Of: 672724
Environment:
Last Closed: 2011-05-19 12:01:25 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 672724
Bug Blocks:
Attachments:
Description                 Flags
Upstream patch              none
RHEL6 post of the patch     none

Description Steve Whitehouse 2011-02-01 09:53:43 UTC
+++ This bug was initially created as a clone of Bug #672724 +++

Created attachment 475319 [details]
Program to demonstrate the problem.

Description of problem: When an application uses mmap to map a file on a gfs2 filesystem read-only, it acquires an exclusive glock, even when the filesystem is mounted with noatime. This has a significant impact on the performance of subsequent invocations of the application if the same file is accessed from multiple nodes.
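Why an exclusive glock hurts only when several nodes use the same file follows from glock compatibility. The sketch below is a simplified Python model, not GFS2 code: the names are hypothetical, and GFS2's deferred (DF) glock state is left out. Shared holds on different nodes can coexist; an exclusive hold on one node is incompatible with any hold on another node.

```python
from enum import Enum


class GlockMode(Enum):
    # Hypothetical, simplified model: GFS2 glocks also have a deferred (DF)
    # state, which is omitted here.
    UNLOCKED = "UN"
    SHARED = "SH"
    EXCLUSIVE = "EX"


def compatible(a: GlockMode, b: GlockMode) -> bool:
    """True if two different nodes may hold modes a and b on one glock at once."""
    if GlockMode.UNLOCKED in (a, b):
        return True
    # Only shared/shared can be held concurrently; any exclusive hold conflicts.
    return a is GlockMode.SHARED and b is GlockMode.SHARED
```

So when several nodes mmap() the same file, a shared request on each node can be granted side by side, whereas an exclusive request on any one node forces the other nodes to give up their cached glock first.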


Version-Release number of selected component (if applicable): 2.6.18-238.el5


How reproducible: Always


Steps to Reproduce:
1. Start with a file on a gfs2 filesystem that has no cached glocks.
2. Run the attached application to map that file read-only.

Actual results: An exclusive glock will be created


Expected results: Only shared locks should be created
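The attached program (attachment 475319) is not reproduced here. As a hypothetical stand-in, this Python sketch maps a file read-only and reads one byte per page. On an affected kernel the exclusive glock is requested by the mmap() call itself; the page faults that follow only need shared locks. The function name is illustrative only.

```python
import mmap


def map_read_only_and_touch(path: str) -> int:
    """Map *path* read-only, fault in every page, and return the page count."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ gives a read-only mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pages = 0
            for off in range(0, len(m), mmap.PAGESIZE):
                _ = m[off]  # read one byte per page to trigger a page fault
                pages += 1
            return pages
```

For example, map_read_only_and_touch("/mnt/gfs2/data.bin") returns the number of pages faulted in; the path is a placeholder.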

--- Additional comment from swhiteho on 2011-01-31 12:45:19 EST ---

I've tracked down what is going on here...

It is all down to the test used in the ->mmap() function, which is supposed to skip the EX lock if there are no atime updates to be performed. The EX lock is being taken because there are a number of different ways in which the noatime state can be set: via the mount flags, via the O_NOATIME file flag and via the S_NOATIME flag (set on a per-file basis via setattr).

The code checks only for O_NOATIME (which, if set, does prevent grabbing the EX lock), but the check is repeated later on in the VFS atime code, so the actual atime updates are done correctly. It's only the locking that isn't quite correct.
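The mismatch described above can be modelled in a few lines. This is a hypothetical Python sketch of the decision, not the kernel's ->mmap() code: it records the three noatime sources separately and contrasts the pre-fix test, which consults only O_NOATIME, with a test that honours all three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoatimeSources:
    """Hypothetical record of where a 'noatime' setting can come from."""
    mount_noatime: bool    # filesystem mounted with noatime
    file_o_noatime: bool   # file opened with O_NOATIME
    inode_s_noatime: bool  # per-file S_NOATIME flag (set via setattr)


def pre_fix_takes_ex_lock(src: NoatimeSources) -> bool:
    # Pre-fix behaviour as described: only O_NOATIME is consulted.
    return not src.file_o_noatime


def atime_update_suppressed(src: NoatimeSources) -> bool:
    # Any one of the three sources is enough to suppress the atime update.
    return src.mount_noatime or src.file_o_noatime or src.inode_s_noatime
```

With a noatime mount but no O_NOATIME, the pre-fix test still asks for the EX lock even though no atime update will follow, which is the case reported here.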

So if you have access to the source code, there is a temporary workaround: open the files to be mmapped with O_NOATIME. Note that the EX lock is only taken on mmap() and not on page faults, so if the files are mmap()ed just once and then used many times, only the initial mmap call will require an EX lock. After that point all the locks will be PR (for read-only access, even if the file is mapped read/write).
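A minimal sketch of the workaround, assuming Python on Linux (os.O_NOATIME is Linux-specific): open the file read-only with O_NOATIME, then map it read-only. The function name is hypothetical. Per open(2), O_NOATIME is only permitted when the caller owns the file or is privileged.

```python
import mmap
import os


def mmap_with_noatime(path: str) -> mmap.mmap:
    """Open *path* read-only with O_NOATIME and map it read-only."""
    # O_NOATIME: reads through this descriptor do not update the file's atime.
    fd = os.open(path, os.O_RDONLY | os.O_NOATIME)
    try:
        # Length 0 maps the whole file. The mapping stays valid after the
        # descriptor is closed, so it can be released immediately.
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
```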

That should allow you to get on with your BLAST runs. I'll try and get a patch sorted out for this as soon as I can.

--- Additional comment from scooter.edu on 2011-01-31 13:48:28 EST ---

Steve,
   Excellent news!!  We'll change BLAST right away and let you know the impact.  Since the loader uses mmap() quite heavily, we are still interested in a patched kernel.  This explains some symptoms that we had early on that we weren't able to explain (so we worked around them).

--- Additional comment from scooter.edu on 2011-01-31 17:05:17 EST ---

Steve, it turns out that O_NOATIME can only be used if you are the file owner or root, which is not a good solution for shared databases :-(  We'll go ahead and get the timings to make sure that this works as expected, though.
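Given that restriction, an application shared by many users could try O_NOATIME and fall back to a plain read-only open when the kernel refuses it. This is a hypothetical Python helper, assuming Linux; on a kernel without the fix, the fallback path would still take the exclusive glock at mmap() time.

```python
import errno
import os


def open_maybe_noatime(path: str) -> tuple[int, bool]:
    """Open *path* read-only, preferring O_NOATIME; return (fd, used_noatime).

    Per open(2), O_NOATIME fails with EPERM unless the caller's effective UID
    owns the file or the caller is privileged (CAP_FOWNER, e.g. root).
    """
    try:
        return os.open(path, os.O_RDONLY | os.O_NOATIME), True
    except PermissionError as exc:
        # PermissionError covers both EPERM and EACCES; only EPERM means
        # "O_NOATIME not allowed", EACCES is a genuine permission problem.
        if exc.errno != errno.EPERM:
            raise
        return os.open(path, os.O_RDONLY), False
```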

Comment 1 RHEL Program Management 2011-02-01 10:08:37 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 2 Steve Whitehouse 2011-02-01 10:34:25 UTC
Created attachment 476352 [details]
Upstream patch

Comment 3 Steve Whitehouse 2011-02-01 10:41:38 UTC
Created attachment 476354 [details]
RHEL6 post of the patch

Comment 4 RHEL Program Management 2011-02-01 19:13:28 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 5 RHEL Program Management 2011-02-01 19:30:45 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Steve Whitehouse 2011-02-03 11:40:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This bug is a performance issue relating to mmap of the same file from multiple nodes at once. The issue affects only the initial call to mmap and not subsequent page faults, so this will only be noticeable in cases where mmap is called frequently from multiple nodes on the same file. This occurs when running BLAST, for example.

There is a workaround, which is to alter the application to always use O_NOATIME when opening the files to be mapped. This is only possible if the opening process is the file owner or is root. After applying this patch, the workaround is no longer required.

The patch changes the tests applied at mmap time such that for noatime mounts, a glock will not be taken at all. For atime mounts, only a shared glock will be taken at mmap time, although if an atime update is required, an exclusive glock will still be required at a later time to write back the new atime.
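The lock selection described in this note can be summarised as a small decision table. The sketch below is a hypothetical Python model, not the patch itself; it covers only the mmap()-time request and the later atime write-back, with noatime_mount standing for a noatime mount as in the note.

```python
from enum import Enum


class GlockMode(Enum):
    # Hypothetical model of the glock requested, not GFS2's LM_ST_* constants.
    NONE = "none"            # no glock requested
    SHARED = "shared"        # can be held on many nodes at once
    EXCLUSIVE = "exclusive"  # one node at a time


def glock_at_mmap(noatime_mount: bool) -> GlockMode:
    """Glock requested by ->mmap() after the fix, per the technical note."""
    return GlockMode.NONE if noatime_mount else GlockMode.SHARED


def glock_for_atime_writeback(atime_update_needed: bool) -> GlockMode:
    """Writing back a new atime later still needs an exclusive glock."""
    return GlockMode.EXCLUSIVE if atime_update_needed else GlockMode.NONE
```

Compared with the pre-fix behaviour, the exclusive request moves out of every mmap() call and is made only when an atime actually has to be written.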

Comment 7 Aristeu Rozanski 2011-02-18 22:16:30 UTC
Patch(es) available on kernel-2.6.32-117.el6

Comment 11 errata-xmlrpc 2011-05-19 12:01:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html