Bug 458880 - GFS: O_DIRECT writes fail when mixed with mmap reads
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: Global_File_System_Guide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Steven J. Levine
: Documentation
Depends On:
Reported: 2008-08-12 16:42 EDT by Nate Straz
Modified: 2011-07-25 09:18 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2011-07-25 09:18:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Nate Straz 2008-08-12 16:42:07 EDT
Description of problem:

coherency is a new test we've been using on GFS2 to verify cluster coherency between different kinds of system calls with different types of I/O.  When I ran the same tests on GFS, I found that the following cases fail on a file system with a 1k block size.


Each one fails on the write system call with "Input/output error."

The I/O generation starts with empty files and writes up to 128k at a time.

d_iogen -I 23617043 -i 120s -f direct -s write -v mmread -p none -T 128k -F 10g:direct-write-mmread

Version-Release number of selected component (if applicable):
kernel-2.6.18-92.el5 (5.2) and kernel-2.6.18-103.el5 (5.3)

How reproducible:
Every time

Steps to Reproduce:
1. mkfs -t gfs -O -b 1024 -j 4 -p lock_dlm -t tank-cluster:brawl0 /dev/brawl/brawl0
2. mount -t gfs -o debug /dev/brawl/brawl0 /mnt/brawl
3. coherency -m /mnt/brawl -S REG

Actual results:
The direct-*-mm* coherency tags fail almost immediately, and the run may hang while d_doio tries to connect to d_iogen, which has already exited.

No SCSI errors were detected and the file system is still usable.

Expected results:
Writes should not fail with input/output errors.  

Additional info:
Comment 1 RHEL Product and Program Management 2008-08-12 17:03:11 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 2 Nate Straz 2008-08-21 14:35:49 EDT
I've verified that I can reproduce this with the -92.1.10.el5 kernel.  It still requires the 1k block size.  Here is the command I'm using to get all of the failures from all of the coherency log files.

[nstraz@try 4.coherency]$ grep -rh "^Can" .
Can not pwrite() 47104 bytes to 336896 on direct-pwrite-mmindirect: Input/output error
Can not write() 36864 bytes to 580608 on direct-write-mmindirect: Input/output error
Can not writev() 121856 bytes to 1336832 on direct-writev-mmread: Input/output error
Can not writev() 43520 bytes to 1308160 on direct-writev-mmindirect: Input/output error
Can not write() 27136 bytes to 320512 on direct-write-mmread: Input/output error
Can not pwrite() 40448 bytes to 305152 on direct-pwrite-mmread: Input/output error

The file name corresponds to how the file was opened, which syscall was used to write, and which syscall was used to verify the write (i.e. read).  mmindirect is an mmap read which doesn't use a userspace buffer.
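
For anyone sorting through these logs, here is a rough sketch of splitting a failure line into its parts using the file naming scheme described above. It is not part of the coherency test suite; the names `Failure` and `parse_failure` are hypothetical, and the regular expression only assumes the line layout shown in the grep output.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
#   Can not pwrite() 47104 bytes to 336896 on direct-pwrite-mmindirect: Input/output error
LINE_RE = re.compile(
    r"^Can not (?P<syscall>\w+)\(\) (?P<nbytes>\d+) bytes to (?P<offset>\d+) "
    r"on (?P<tag>[\w-]+): (?P<error>.+)$"
)


class Failure(NamedTuple):
    """One failed write, as reported in a coherency log (hypothetical type)."""
    syscall: str      # syscall named in the message, e.g. "pwrite"
    nbytes: int       # size of the failed write
    offset: int       # file offset of the failed write
    open_mode: str    # how the file was opened, e.g. "direct"
    write_call: str   # syscall used to write, taken from the file name
    verify_call: str  # syscall used to verify, e.g. "mmread"
    error: str        # error text, e.g. "Input/output error"


def parse_failure(line: str) -> Optional[Failure]:
    """Return a Failure for a matching log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    parts = m["tag"].split("-")
    if len(parts) != 3:
        # File name does not follow the open-write-verify scheme.
        return None
    open_mode, write_call, verify_call = parts
    return Failure(m["syscall"], int(m["nbytes"]), int(m["offset"]),
                   open_mode, write_call, verify_call, m["error"])
```

Feeding each line of the grep output through `parse_failure` makes it easy to group the failures by open mode or by verification method.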
Comment 3 Kiersten (Kerri) Anderson 2008-09-15 11:29:01 EDT
Since this isn't a regression and the developers have been consumed by other problems, I am deferring this to RHEL 5.4 consideration.
Comment 5 Robert Peterson 2009-12-22 10:01:39 EST
We don't have a fix yet; retargeting to 5.6.
Comment 9 RHEL Product and Program Management 2010-08-09 15:27:08 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 10 Steve Whitehouse 2011-05-12 06:37:25 EDT
I'm very tempted to suggest that we shouldn't fix this for GFS. There would be a fair amount of work involved and I can't see any use case which is ever likely to want to run both mmap and direct I/O to the same file at the same time. It doesn't make any sense.

So I'm going to suggest that we document that it will not work and then move on. Please let me know if there are any objections to this.
Comment 13 Steve Whitehouse 2011-05-17 05:36:54 EDT
Steven, we'd like to add something along the following lines to the docs for GFS (note, does not apply to GFS2):

Performing I/O through a memory mapping and also via direct I/O to the same file at the same time may result in the direct I/O failing with an I/O error. This occurs because the page invalidation required for the direct I/O can race with a page fault generated through the mapping. This is only a problem when the memory mapped I/O and the direct I/O are both performed on the same node, and to the same file at the same point in time. A workaround is to use file locking to ensure that memory mapped I/O (i.e. page faults) and direct I/O do not occur simultaneously on the same file.

The Oracle database, which is one of the main applications using direct I/O, does not memory map the files on which it uses direct I/O and thus is unaffected. In addition, writing to a file that is memory mapped will succeed, as expected, unless there are page faults in flight at that point in time. The mmap system call on its own is safe when direct I/O is in use.
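
As an illustration of the file locking workaround, here is a minimal sketch, assuming cooperating processes that all take an advisory flock() lock before touching the file. The choice of flock() (rather than fcntl() record locks) is an assumption, as are the function names `locked_direct_write` and `locked_mmap_read` and the `use_direct` switch; none of them come from the GFS documentation. The direct writer holds an exclusive lock, and the mmap reader holds a shared lock for as long as it touches mapped pages, so page faults and direct I/O do not overlap.

```python
import fcntl
import mmap
import os


def locked_direct_write(path, offset, data, use_direct=True):
    """Write data at offset while holding an exclusive advisory lock.

    With use_direct=True the file is opened O_DIRECT, so offset and
    len(data) must be multiples of the device/file system block size.
    """
    flags = os.O_WRONLY | os.O_CREAT
    if use_direct:
        flags |= os.O_DIRECT  # Linux only
    fd = os.open(path, flags, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # keep mmap readers out
        buf = mmap.mmap(-1, len(data))   # anonymous map: page-aligned buffer
        try:
            buf.write(data)
            return os.pwrite(fd, buf, offset)
        finally:
            buf.close()
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def locked_mmap_read(path, offset, length):
    """Read through a memory mapping while holding a shared advisory lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)   # blocks while a writer holds LOCK_EX
        try:
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
                # Page faults happen here, under the lock; copy the bytes
                # out so nothing touches the mapping after we unlock.
                return bytes(m[offset:offset + length])
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because the locks are advisory, this only helps if every process that maps or direct-writes the file goes through the same locking discipline. The `use_direct=False` path exists only so the sketch can be tried on file systems that reject O_DIRECT.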
Comment 14 Steven J. Levine 2011-05-17 15:03:23 EDT
I have added the information in Comment 13 as a note to the current RHEL 5.7 draft of the GFS manual, in the section on direct IO. It can be seen here:

Comment 15 Steven J. Levine 2011-06-02 17:03:50 EDT
Since we're addressing this through documentation (and since I've already updated the draft documentation), I'm changing the component to reflect that.
Comment 16 Steven J. Levine 2011-06-08 12:16:37 EDT
The new note is visible on the link provided in Comment 14 so I am moving this to ON_QA.
