Bug 164338 - fix aio hang when reading beyond EOF
Summary: fix aio hang when reading beyond EOF
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Jeff Moyer
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 156322
 
Reported: 2005-07-27 01:47 UTC by Jason Baron
Modified: 2013-03-06 05:58 UTC (History)

Fixed In Version: RHSA-2005-514
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-05 13:45:30 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2005:514
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 2
Last Updated: 2005-10-05 04:00:00 UTC

Description Jason Baron 2005-07-27 01:47:52 UTC
Additional info:

| Hello all,
|
| I came across the following problem while running ltp-aiodio testcases
| from ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the
| tests with EXT3 as well as JFS filesystems.
|
| One or two fsx-linux testcases were hung after some time. These
| testcases were hanging at wait_for_all_aios().
|
| From initial debugging I found that there were some iocbs which were
| not getting completed even though the last retry for those returned
| -EIOCBQUEUED. Also, all such pending iocbs represented READ operations.
|
| Further debugging revealed that all such iocbs hit EOF in the DIO layer.
| To be more precise, the "pos" from which they were trying to read was
| greater than the "size" of the file. So the generic_file_direct_IO
| returned 0.
|
| This happens rarely as there is already a check in
| __generic_file_aio_read(), for whether "pos" < "size" before calling
| direct IO routine.
|
| > size = i_size_read(inode);
| > if (pos < size) {
| >       retval = generic_file_direct_IO(READ, iocb,
| >                                iov, pos, nr_segs);
|
|
| But for READ, we take inode->i_sem only in the DIO layer. So it is
| possible for some other process to change the size of the file before
| we take the i_sem. In such a case (when "pos" > "size"),
| __generic_file_aio_read() would return -EIOCBQUEUED even though no
| I/O requests were submitted by the DIO layer. This causes the AIO
| layer to expect aio_complete() for the iocb, which never happens.
| Thus the test hangs forever, waiting for an I/O completion when no
| requests were submitted at all.
|
| The following patch makes __generic_file_aio_read() return 0 (instead
| of -EIOCBQUEUED) on getting 0 from generic_file_direct_IO(), so that
| the AIO layer does the aio_complete().
|
| Testing:
|
| I have tested the patch on an SMP machine (with 2 Pentium 4 (HT)) running
| linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the
| fsx-linux tests hung. The aio-stress tests also ran without any problem.
|
| --
| thanks,
|
| Suzuki K P
| Linux Technology Centre
| IBM Software Labs
`----
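
The race and the fix described in the message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch itself: the names model_direct_io() and model_aio_read() are hypothetical, only the asynchronous case is modelled, and MODEL_EIOCBQUEUED is a local stand-in for the kernel-internal error code (any distinct sentinel would do here). The parameter size_at_check is the file size seen by the "pos < size" test, and size_under_lock is the size seen once the DIO layer holds i_sem.

```c
/* Local stand-in for the kernel-internal EIOCBQUEUED value (assumption:
 * the exact number does not matter for this model). */
#define MODEL_EIOCBQUEUED 529

/* Hypothetical model of the direct-I/O layer. It looks at the file size
 * as it is once the lock is held and submits nothing when pos is at or
 * beyond EOF, returning 0 in that case. Otherwise it returns the number
 * of bytes for which requests were submitted. */
static long model_direct_io(long pos, long len, long size_under_lock)
{
    long avail;

    if (pos >= size_under_lock)
        return 0;               /* read starts beyond EOF: nothing queued */
    avail = size_under_lock - pos;
    return len < avail ? len : avail;
}

/* Hypothetical model of the async read path.
 * fixed == 0: a return of 0 from the DIO layer is still reported as
 *             "queued", so the caller waits for a completion that
 *             never arrives.
 * fixed != 0: only a positive return is reported as "queued"; 0 is
 *             passed back so the caller can complete the iocb itself. */
static long model_aio_read(long pos, long len, long size_at_check,
                           long size_under_lock, int fixed)
{
    long retval = 0;

    if (pos < size_at_check) {
        retval = model_direct_io(pos, len, size_under_lock);
        if (fixed ? (retval > 0) : (retval >= 0))
            retval = -MODEL_EIOCBQUEUED;
    }
    return retval;
}
```

For example, with a file truncated from 16384 to 4096 bytes between the size check and the locked read, model_aio_read(8192, 4096, 16384, 4096, 0) reports the request as queued although nothing was submitted, while the same call with fixed set to 1 returns 0.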

Comment 5 Red Hat Bugzilla 2005-10-05 13:45:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-514.html


