Bug 456453
| Summary: | GFS2: d_rwdirectempty fails with short read | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Nate Straz <nstraz> |
| Component: | kernel | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | medium | Priority: | medium |
| Version: | 5.3 | CC: | bstevens, edamato, lwang, swhiteho |
| Target Milestone: | beta | Doc Type: | Bug Fix |
| Hardware: | All | OS: | Linux |
| Last Closed: | 2009-01-20 20:08:23 UTC | | |

The problem is that the VFS code checks, in __generic_file_aio_read(), whether the position of a direct I/O read is past the end of the file. If it is, the VFS never calls generic_file_direct_IO(), which is what hooks into the gfs2 direct I/O read code. This check is useful for local filesystems, but since gfs2 doesn't hold a lock on the file at that point, there's no guarantee that the file size is correct for gfs2. This isn't a problem for gfs, since it grabs the locks at an earlier stage of the system call.

Created attachment 312687 [details]
patch that fixes the short reads
This patch applies on top of the 2.6.18-98.el5 RHEL5 kernel. It adds another
inode flag, S_NOSIZECHK, that skips the test of whether the read position is
past the end of the file. GFS2 sets this on all of its inodes, so that this
check is skipped.
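
To make the effect of the flag concrete, here is a small userspace model of the decision described above. It is only a sketch and not the kernel code or the attached patch: `model_inode`, `MODEL_S_NOSIZECHK`, and `reaches_direct_io()` are hypothetical names, the flag value is arbitrary, and treating a read that starts exactly at the cached end of file as "past the end" is an assumption.

```c
/* Hypothetical stand-ins for illustration; not kernel definitions. */
#define MODEL_S_NOSIZECHK 0x1u

struct model_inode {
    long long cached_size;  /* file size as this node last saw it; may be stale */
    unsigned int flags;     /* MODEL_S_NOSIZECHK set means "skip the size check" */
};

/*
 * Returns 1 if a direct I/O read at position pos would reach the
 * filesystem's direct I/O hook, 0 if the generic size check would
 * short-circuit it and the caller would see a 0-byte read.
 */
static int reaches_direct_io(const struct model_inode *inode, long long pos)
{
    if (inode->flags & MODEL_S_NOSIZECHK)
        return 1;                     /* check skipped; the filesystem decides */
    return pos < inode->cached_size;  /* assumed form of the size check */
}
```

With a stale cached size of 0 and no flag, a read at offset 36082688 never reaches the filesystem in this model, which matches the 0-byte read in the report. With the flag set, the same read is handed to the filesystem, which can take its lock and see the real size.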
I have been totally unable to recreate this issue on the upstream 2.6.26 kernel. However I can't see any reason why it should be any different.

Created attachment 314123 [details]
Port of upstream fix.
My last patch only dealt with the direct I/O case. This happens for cached reads too. However, the problem is fixed by an already existing patch in the upstream kernel, so this is a port of that patch.

in kernel-2.6.18-107.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Verified against kernel-2.6.18-122.el5.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html

Description of problem:
While running distributed I/O test cases on GFS2, the d_rwdirectempty test case fails. This test case starts with a new zero length file and performs sequential writes using O_DIRECT to the file. The writes are then verified on a different node. In the failing case, the read returns 0 bytes.

    $ cat r3/8.d_rwdirectempty/*/cmd.log
    d_iogen starting up with the following:
    Start Time: Wed Jul 23 14:15:11 2008
    Session id: 19737
    Resource file: /local/nstraz/svn/sts-rhel5/sts-root/var/share/resource_files/tank-cluster.xml
    Internal Region Lock Type: clm
    Iterations: 30s
    Seed: 11883
    Offset-mode: sequential
    Overlap Flag: off
    Mintrans: 512000
    Maxtrans: 8388608
    Requests: read,write
    Syscalls: read,readv,write,writev
    Verify Syscalls: read
    IO type: direct
    Test Files:
    Path                                            Size (bytes)
    ---------------------------------------------------------------
    rwdirectempty                                   758990139392
    d_doio ior status != expected status
    ======== msg ========
    type: 2 (verify)
    status: 0 (nack)
    expected status: 1 (ack)
    srchost: tank-03
    srcpid: 9932
    desthost: try
    destpid: 0
    ior:
    ----- xior ----
    magic: 0xfeed10
    type: 4 (read)
    path: rwdirectempty
    syscall: read
    oflags: 16386 (O_RDWR|O_DIRECT)
    offset: 36082688
    count: 783872
    pattern: N:9974:tank-04:writev*
    chksum: 0xa8da8144
    =====================
    Cleanup took 1 seconds.
    d_doio(11063) 0 requests finished, exiting...
    d_doio(11064) 2 requests finished, exiting...
    d_doio(11065) 1 requests finished, exiting...
    d_doio(11062) 0 requests finished, exiting...
    d_doio(11066) 1 requests finished, exiting...
    d_doio(9927) 2 requests finished, exiting...
    d_doio(9928) 1 requests finished, exiting...
    d_doio(9930) 1 requests finished, exiting...
    d_doio(9929) 1 requests finished, exiting...
    d_doio(9931) 0 requests finished, exiting...
    Short read(), read 0 of 783872 bytes at 36082688 on rwdirectempty
    (parent) pid 9932 exited non-zero
    d_doio(9973) 1 requests finished, exiting...
    d_doio(9972) 1 requests finished, exiting...
    d_doio(9976) 1 requests finished, exiting...
    d_doio(9974) 1 requests finished, exiting...
    d_doio(9975) 1 requests finished, exiting...

Version-Release number of selected component (if applicable):
kernel-2.6.18-98.el5
kmod-gfs2-1.98-1.1.el5.abhi.4

How reproducible:
Every time.

Steps to Reproduce:
1. Run dd_io on a GFS2 file system.

Additional info:
The test case passes on GFS.
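
For readers without the test suite, the access pattern in the description (write a region, then read the same region back and treat anything shorter than the request as a failure) can be sketched roughly as follows. This is not d_doio: `write_then_verify()` is a hypothetical helper, the 4096-byte buffer alignment is an assumed value, and it runs writer and reader on one host, whereas the failure reported here needs the read to happen on a different cluster node.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the d_doio test suite.
 * Writes len bytes of a fixed pattern at offset off, then reads them back
 * through a second descriptor. Returns the number of bytes read back (a
 * value below len is a short read), or -1 on error or data mismatch.
 * extra_flags is OR-ed into both open() calls, e.g. O_DIRECT.
 */
static ssize_t write_then_verify(const char *path, int extra_flags,
                                 size_t len, off_t off)
{
    void *wbuf = NULL, *rbuf = NULL;
    ssize_t got = -1;
    int wfd = -1, rfd = -1;

    /* Direct I/O generally needs aligned buffers; 4096 is an assumption. */
    if (posix_memalign(&wbuf, 4096, len) != 0) { wbuf = NULL; goto out; }
    if (posix_memalign(&rbuf, 4096, len) != 0) { rbuf = NULL; goto out; }
    memset(wbuf, 0xA5, len);
    memset(rbuf, 0, len);

    wfd = open(path, O_RDWR | O_CREAT | extra_flags, 0600);
    if (wfd < 0)
        goto out;
    if (pwrite(wfd, wbuf, len, off) != (ssize_t)len)
        goto out;

    rfd = open(path, O_RDONLY | extra_flags);
    if (rfd < 0)
        goto out;
    got = pread(rfd, rbuf, len, off);
    if (got > 0 && memcmp(wbuf, rbuf, (size_t)got) != 0)
        got = -1;                /* data came back but does not match */
out:
    if (rfd >= 0)
        close(rfd);
    if (wfd >= 0)
        close(wfd);
    free(wbuf);
    free(rbuf);
    return got;
}
```

A call shaped like the failing request would be `write_then_verify(path, O_DIRECT, 783872, 36082688)`, with `path` pointing at a file on the file system under test. A return value of 0 corresponds to the "read 0 of 783872 bytes" message in the log.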