Red Hat Bugzilla – Bug 456453
GFS2: d_rwdirectempty fails with short read
Last modified: 2009-01-20 15:08:23 EST
Description of problem:
While running distributed I/O test cases on GFS2, the d_rwdirectempty test case
fails. This test case starts with a new zero-length file and performs
sequential writes using O_DIRECT to the file. The writes are then verified on a
different node. In the failing case, the read returns 0 bytes.
$ cat r3/8.d_rwdirectempty/*/cmd.log
d_iogen starting up with the following:
Start Time: Wed Jul 23 14:15:11 2008
Session id: 19737
Internal Region Lock Type: clm
Overlap Flag: off
Verify Syscalls: read
IO type: direct
d_doio ior status != expected status
======== msg ========
type: 2 (verify)
status: 0 (nack)
expected status: 1 (ack)
----- xior ----
type: 4 (read)
oflags: 16386 (O_RDWR|O_DIRECT)
Cleanup took 1 seconds.
d_doio(11063) 0 requests finished, exiting...
d_doio(11064) 2 requests finished, exiting...
d_doio(11065) 1 requests finished, exiting...
d_doio(11062) 0 requests finished, exiting...
d_doio(11066) 1 requests finished, exiting...
d_doio(9927) 2 requests finished, exiting...
d_doio(9928) 1 requests finished, exiting...
d_doio(9930) 1 requests finished, exiting...
d_doio(9929) 1 requests finished, exiting...
d_doio(9931) 0 requests finished, exiting...
Short read(), read 0 of 783872 bytes at 36082688 on rwdirectempty
(parent) pid 9932 exited non-zero
d_doio(9973) 1 requests finished, exiting...
d_doio(9972) 1 requests finished, exiting...
d_doio(9976) 1 requests finished, exiting...
d_doio(9974) 1 requests finished, exiting...
d_doio(9975) 1 requests finished, exiting...
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run dd_io on a GFS2 file system.
The test case passes on GFS.
The problem is that the VFS code in __generic_file_aio_read() checks whether
the position of a direct I/O read is past the end of the file. If it is,
generic_file_direct_IO(), which is what hooks into the gfs2 direct I/O read
code, is never called. This check is useful for local filesystems, but gfs2
does not hold a lock on the file at that point, so there is no guarantee that
the file size it sees is correct. This isn't a problem for gfs, since it grabs
its locks at an earlier stage of the system call.
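To make the ordering problem concrete, here is a minimal userspace sketch in C. It is not kernel code: struct toy_inode, toy_fs_direct_read() and toy_generic_read() are hypothetical names, and taking the cluster lock is modeled simply as refreshing a cached size. It assumes the reading node still has the old size cached when the generic check runs.

```c
#include <assert.h>

/* Toy inode: cached_size is what this node last saw; cluster_size is the
 * true size after writes from another node. Both names are hypothetical. */
struct toy_inode {
    long long cached_size;
    long long cluster_size;
};

/* Stand-in for the filesystem's direct-read hook. Taking the cluster lock
 * is modeled as refreshing cached_size from cluster_size. */
static long long toy_fs_direct_read(struct toy_inode *ino, long long pos,
                                    long long count)
{
    assert(pos >= 0 && count >= 0);
    ino->cached_size = ino->cluster_size;   /* "lock" revalidates the size */
    if (pos >= ino->cached_size)
        return 0;                           /* genuinely at or past EOF */
    long long avail = ino->cached_size - pos;
    return count < avail ? count : avail;
}

/* Stand-in for the generic read path: the size check runs first, using
 * whatever size this node currently has cached. */
static long long toy_generic_read(struct toy_inode *ino, long long pos,
                                  long long count)
{
    if (pos >= ino->cached_size)
        return 0;                           /* hook is never reached */
    return toy_fs_direct_read(ino, pos, count);
}
```

With cached_size set to 36082688 and cluster_size set to 36082688 + 783872 (the offset and length from the failing log line), toy_generic_read() returns 0 even though the data exists, while calling toy_fs_direct_read() directly returns the full 783872 bytes.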
Created attachment 312687
patch that fixes the short reads
This patch applies on top of the 2.6.18-98.el5 RHEL5 kernel. It adds another
inode flag, S_NOSIZECHK, that skips the test of whether the read position is
past the end of the file. GFS2 sets this on all of its inodes, so that this
check is skipped.
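The idea behind the flag can be sketched in userspace C as follows. This is only an illustration of the approach described above, not the attached patch: TOY_S_NOSIZECHK and its bit value, struct toy_inode, and both functions are hypothetical stand-ins, and the cluster lock is again modeled as a size refresh inside the filesystem hook.

```c
#include <assert.h>

/* Hypothetical bit; the value used by the real patch is not shown here. */
#define TOY_S_NOSIZECHK 0x1u

struct toy_inode {
    unsigned int flags;
    long long cached_size;    /* size this node last saw */
    long long cluster_size;   /* true size across the cluster */
};

/* Stand-in for the filesystem hook: refresh the size, then read up to EOF. */
static long long toy_fs_direct_read(struct toy_inode *ino, long long pos,
                                    long long count)
{
    assert(pos >= 0 && count >= 0);
    ino->cached_size = ino->cluster_size;
    if (pos >= ino->cached_size)
        return 0;
    long long avail = ino->cached_size - pos;
    return count < avail ? count : avail;
}

/* Generic path: the early size check only applies to inodes without the
 * flag, so flagged inodes always reach the filesystem hook. */
static long long toy_generic_read(struct toy_inode *ino, long long pos,
                                  long long count)
{
    if (!(ino->flags & TOY_S_NOSIZECHK) && pos >= ino->cached_size)
        return 0;
    return toy_fs_direct_read(ino, pos, count);
}
```

In this model an unflagged inode with a stale size still gets the early 0-byte return, while a flagged inode falls through to the hook, which decides for itself whether the position is really past the end of the file.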
I have been totally unable to recreate this issue on the upstream 2.6.26 kernel. However, I can't see any reason why it should behave any differently there.
Created attachment 314123
Port of upstream fix.
My last patch only dealt with the direct I/O case, but this happens for cached reads too. However, the problem is already fixed by an existing patch in the upstream kernel, so this is a port of that patch.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Verified against kernel-2.6.18-122.el5.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.