Bug 456453

Summary: GFS2: d_rwdirectempty fails with short read
Product: Red Hat Enterprise Linux 5 Reporter: Nate Straz <nstraz>
Component: kernel Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.3 CC: bstevens, edamato, lwang, swhiteho
Target Milestone: beta   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-20 20:08:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                          Flags
patch that fixes the short reads     none
Port of upstream fix.                none

Description Nate Straz 2008-07-23 19:30:56 UTC
Description of problem:

While running distributed I/O test cases on GFS2, the d_rwdirectempty test case
fails.  This test case starts with a new zero length file and performs
sequential writes using O_DIRECT to the file.  The writes are then verified on a
different node.  In the failing case, the read returns 0 bytes.

$ cat r3/8.d_rwdirectempty/*/cmd.log
d_iogen starting up with the following:
Start Time:                 Wed Jul 23 14:15:11 2008
Session id:                 19737
Resource file:              /local/nstraz/svn/sts-rhel5/sts-root/var/share/resource_files/tank-cluster.xml
Internal Region Lock Type:  clm
Iterations:                 30s
Seed:                       11883
Offset-mode:                sequential
Overlap Flag:               off
Mintrans:                   512000
Maxtrans:                   8388608
Requests:                   read,write
Syscalls:                   read,readv,write,writev
Verify Syscalls:            read
IO type:                    direct

Test Files:

Path                                                      Size
                                                        (bytes)
---------------------------------------------------------------
rwdirectempty                                        758990139392
d_doio ior status != expected status

======== msg ========
type: 2 (verify)
status: 0 (nack)
expected status: 1 (ack)
srchost: tank-03
srcpid: 9932
desthost: try
destpid: 0
ior: 
----- xior ----
magic: 0xfeed10
type: 4 (read)
path: rwdirectempty
syscall: read
oflags: 16386 (O_RDWR|O_DIRECT)
offset: 36082688
count: 783872
pattern: N:9974:tank-04:writev*
chksum: 0xa8da8144

=====================
Cleanup took 1 seconds.
d_doio(11063) 0 requests finished, exiting...
d_doio(11064) 2 requests finished, exiting...
d_doio(11065) 1 requests finished, exiting...
d_doio(11062) 0 requests finished, exiting...
d_doio(11066) 1 requests finished, exiting...
d_doio(9927) 2 requests finished, exiting...
d_doio(9928) 1 requests finished, exiting...
d_doio(9930) 1 requests finished, exiting...
d_doio(9929) 1 requests finished, exiting...
d_doio(9931) 0 requests finished, exiting...
Short read(), read 0 of 783872 bytes at 36082688 on rwdirectempty
(parent) pid 9932 exited non-zero
d_doio(9973) 1 requests finished, exiting...
d_doio(9972) 1 requests finished, exiting...
d_doio(9976) 1 requests finished, exiting...
d_doio(9974) 1 requests finished, exiting...
d_doio(9975) 1 requests finished, exiting...



Version-Release number of selected component (if applicable):
kernel-2.6.18-98.el5
kmod-gfs2-1.98-1.1.el5.abhi.4

How reproducible:
Every time.

Steps to Reproduce:
1. Run dd_io on a GFS2 file system.
  
Actual results:


Expected results:


Additional info:

The test case passes on GFS.

Comment 2 Ben Marzinski 2008-07-25 15:50:40 UTC
The problem is that the VFS code in __generic_file_aio_read() checks whether the
position of a direct I/O read is past the end of the file. If it is, the VFS
never calls generic_file_direct_IO(), which is what hooks into the gfs2 direct
I/O read code.  This check is useful for local filesystems, but since gfs2
doesn't hold a lock on the file at that point, there's no guarantee that the
file size it sees is correct.  This isn't a problem for gfs, since it grabs its
locks at an earlier stage of the system call.
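
To illustrate the ordering problem described in this comment, here is a minimal
user-space sketch. It is not kernel code: struct sim_inode and both sim_*
functions are hypothetical stand-ins, and the stale cached size of 0 used in the
note below is an assumption chosen to match the empty starting file.

```c
/* Hypothetical stand-in for an inode as seen by one cluster node. */
struct sim_inode {
    long long cached_size; /* size this node last saw; may be stale */
    long long real_size;   /* size on disk after another node's writes */
};

/* Stand-in for the filesystem's direct I/O hook: taking the cluster lock
 * refreshes the cached size, then the read is clamped to the real EOF. */
static long long sim_fs_direct_read(struct sim_inode *ino, long long pos,
                                    long long count)
{
    ino->cached_size = ino->real_size; /* lock acquisition revalidates */
    if (pos >= ino->cached_size)
        return 0;                      /* genuinely at or past EOF */
    if (pos + count > ino->cached_size)
        count = ino->cached_size - pos;
    return count;
}

/* Stand-in for the generic read path: it compares pos with the cached
 * size before the filesystem has had a chance to take its lock. */
long long sim_generic_direct_read(struct sim_inode *ino, long long pos,
                                  long long count)
{
    if (pos >= ino->cached_size)
        return 0;                      /* filesystem hook is never called */
    return sim_fs_direct_read(ino, pos, count);
}
```

With cached_size still 0 and real_size at 36866560, a call for 783872 bytes at
offset 36082688 returns 0 from the generic path, which would look like the
"read 0 of 783872 bytes" failure in the report, while calling the filesystem
hook directly returns the full count.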

Comment 3 Ben Marzinski 2008-07-25 22:02:10 UTC
Created attachment 312687 [details]
patch that fixes the short reads

This patch applies on top of the 2.6.18-98.el5 RHEL5 kernel. It adds another
inode flag, S_NOSIZECHK, that skips the test of whether the read position is
past the end of the file.  GFS2 sets this on all of its inodes, so that this
check is skipped.
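
The effect of such a flag can be sketched in user space as follows. This is only
a model of the idea, not the attached patch: the nsc_* names, the struct layout,
and the bit value given to NSC_S_NOSIZECHK are all hypothetical.

```c
/* Hypothetical bit value; the bit actually used by the patch is not
 * stated in this report. */
#define NSC_S_NOSIZECHK 0x1u

struct nsc_inode {
    unsigned int flags;
    long long cached_size; /* possibly stale on a cluster filesystem */
    long long real_size;   /* size after the filesystem takes its lock */
};

/* Stand-in for the filesystem's direct I/O hook, which revalidates the
 * size under its own lock and clamps the read to the real EOF. */
static long long nsc_fs_direct_read(struct nsc_inode *ino, long long pos,
                                    long long count)
{
    ino->cached_size = ino->real_size;
    if (pos >= ino->cached_size)
        return 0;
    if (pos + count > ino->cached_size)
        count = ino->cached_size - pos;
    return count;
}

/* Generic path: the early size check only applies to inodes that do not
 * carry the flag, so flagged inodes always reach the filesystem hook. */
long long nsc_generic_direct_read(struct nsc_inode *ino, long long pos,
                                  long long count)
{
    if (!(ino->flags & NSC_S_NOSIZECHK) && pos >= ino->cached_size)
        return 0;
    return nsc_fs_direct_read(ino, pos, count);
}
```

In this model an unflagged inode with a stale size still gets the short read,
while a flagged inode defers the end-of-file decision to the filesystem, which
still returns 0 for a read that really starts at EOF.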

Comment 4 Ben Marzinski 2008-08-04 18:01:57 UTC
I have been totally unable to recreate this issue on the upstream 2.6.26 kernel. However, I can't see any reason why it should behave any differently.

Comment 5 Ben Marzinski 2008-08-12 18:20:35 UTC
Created attachment 314123 [details]
Port of upstream fix.

My last patch only dealt with the direct I/O case. This happens for cached reads too. However, this problem is fixed by an already existing patch in the upstream kernel, so this is a port of that patch.

Comment 6 Don Zickus 2008-09-03 03:40:52 UTC
in kernel-2.6.18-107.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 9 Nate Straz 2008-11-13 22:19:24 UTC
Verified against kernel-2.6.18-122.el5.

Comment 11 errata-xmlrpc 2009-01-20 20:08:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html