Bug 162094 - read() with count > 0xffffffff panics kernel at fs/direct-io.c:886
Summary: read() with count > 0xffffffff panics kernel at fs/direct-io.c:886
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Staubach
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 168429
Reported: 2005-06-29 20:03 UTC by David Milburn
Modified: 2018-10-19 19:02 UTC (History)

Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 19:13:41 UTC
Target Upstream Version:
Embargoed:


Attachments
Program to reproduce the bug (820 bytes, text/plain)
2005-06-29 20:04 UTC, David Milburn
Patch to fix (5.77 KB, patch)
2005-06-29 20:05 UTC, David Milburn
Proposed patch (2.19 KB, patch)
2005-08-26 14:14 UTC, Peter Staubach
Proposed patch (2.19 KB, patch)
2005-10-10 15:23 UTC, Peter Staubach


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2005:808 | 0 | normal | SHIPPED_LIVE | Important: kernel security update | 2005-10-27 04:00:00 UTC
Red Hat Product Errata | RHSA-2006:0132 | 0 | qe-ready | SHIPPED_LIVE | Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3 | 2006-03-09 16:31:00 UTC

Description David Milburn 2005-06-29 20:03:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
Using the read() system call with a large count (> 0xffffffff) against a raw
device (or a block device file that is opened with O_DIRECT) causes a
kernel panic on RHEL4 with the following message:

   kernel BUG at fs/direct-io.c:886!


Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Edit reproduce.c with appropriate FILE_NAME and recompile
2. Execute the reproduce program

Actual Results:  kernel panics with the following message:

      kernel BUG at fs/direct-io.c:886!


Expected Results:  kernel should not panic


Additional info:

The customer developed a fix based upon the following three patches from linux-2.6.11-rc3:

http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@41f6cf91c1R7rbuggBVQLxBuD7m6Aw
http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@41f71cbbbAqnp67z79i7SSVQGtmQzg
http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@42026b11ti7KiDM_DMvBv5ZQH_3yLw

Comment 1 David Milburn 2005-06-29 20:04:47 UTC
Created attachment 116144
Program to reproduce the bug

Comment 2 David Milburn 2005-06-29 20:05:37 UTC
Created attachment 116145
Patch to fix

Comment 4 Peter Staubach 2005-07-22 18:38:05 UTC
This situation occurs because an unsigned int is used to store the maximum
number of contiguous blocks which can be transferred at once.  When
doing a direct-io read on a block device, the size of the transfer is set
to the smaller of the size of the block device and the requested number of
bytes.

In the test case, the program tries to read 4GB, 0x100000000 bytes.  I used a
10G partition, so the requested count is the smaller of the two values.
Therefore, the code tried to store 0x100000000 in an unsigned int.  This value
needs 33 bits, so it does not fit and the int ends up zeroed out.
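
The truncation described above can be reproduced in user space. The following is only a minimal sketch, not the kernel code: store_in_u32() is a hypothetical helper, and uint32_t stands in for the 32-bit unsigned int mentioned in this comment.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from fs/direct-io.c): mimics storing a 64-bit
 * count in a 32-bit unsigned field.  Conversion to an unsigned type is
 * defined as reduction modulo 2^32, so only the low 32 bits survive.
 */
uint32_t store_in_u32(uint64_t count)
{
    return (uint32_t)count;
}
```

With this helper, store_in_u32(0x100000000ULL) yields 0, while 0xffffffff, the largest count that does not trigger the problem, is stored unchanged.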

This situation can be addressed either by limiting the read count size,
as the proposed patch does, or by handling the request as several smaller
requests inside of the kernel.  The advantages of the latter approach are
that the system call semantics are maintained, the application does not
need to be aware that it is dealing with a "file" with different
characteristics, and the file struct does not have to be modified.
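
For comparison, the first approach (limiting the read count) might look roughly like the sketch below. This is an assumption for illustration and is not taken from the attached patch: the name clamp_read_count() and the limit MAX_DIRECT_IO_BYTES are hypothetical.

```c
#include <stdint.h>

/* Hypothetical upper bound for one request; chosen only so that the
 * result always fits in 32 bits.  Not a value from the attached patch. */
#define MAX_DIRECT_IO_BYTES ((uint64_t)0x7ffff000)

/*
 * Hypothetical helper: returns the number of bytes that would actually be
 * transferred when an oversized request is cut down to the limit.
 */
uint64_t clamp_read_count(uint64_t requested)
{
    return requested > MAX_DIRECT_IO_BYTES ? MAX_DIRECT_IO_BYTES : requested;
}
```

With a limit like this, a 0x100000000 byte request returns fewer bytes than asked for, so the application has to loop on short reads. That is the user-visible difference from splitting the request inside the kernel.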

Comment 5 Peter Staubach 2005-08-26 14:14:45 UTC
Created attachment 118154
Proposed patch

Comment 6 Peter Staubach 2005-08-26 14:27:11 UTC
The proposed patch breaks up the original, single iovec into multiple smaller
iovecs, each with a length that can be expressed using a 32-bit integer.  This
avoids the overflow that the current system suffers from.
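
The splitting idea can be sketched in user space as follows. This is not the attached patch: struct seg, split_request() and the 2 GiB segment size MAX_SEG_BYTES are all hypothetical names and values chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment limit (2 GiB); any value below 2^32 would do. */
#define MAX_SEG_BYTES ((uint64_t)0x80000000)

/* Hypothetical stand-in for an iovec: the length field is only 32 bits. */
struct seg {
    uint64_t offset;  /* byte offset of this segment within the request */
    uint32_t len;     /* segment length, always <= MAX_SEG_BYTES */
};

/*
 * Split a request of 'total' bytes into segments of at most MAX_SEG_BYTES.
 * Fills at most 'max_segs' entries of 'segs' and returns the number of
 * segments the whole request needs.
 */
size_t split_request(uint64_t total, struct seg *segs, size_t max_segs)
{
    size_t n = 0;
    uint64_t off = 0;

    while (off < total) {
        uint64_t remaining = total - off;
        uint64_t len = remaining < MAX_SEG_BYTES ? remaining : MAX_SEG_BYTES;

        if (n < max_segs) {
            segs[n].offset = off;
            segs[n].len = (uint32_t)len;  /* safe: len <= MAX_SEG_BYTES */
        }
        n++;
        off += len;
    }
    return n;
}
```

For the 0x100000000 byte request from the test case, this sketch produces two segments of 0x80000000 bytes each, so no length ever needs more than 32 bits.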

Comment 13 Peter Staubach 2005-10-10 15:23:54 UTC
Created attachment 119775
Proposed patch

Comment 16 Peter Staubach 2005-10-11 12:32:30 UTC
I don't understand the question.  If it is about which symbol should be used
at the user level, then I don't actually know and will have to defer to
someone else with more experience in the kernel to user level symbol translation.

Comment 24 Red Hat Bugzilla 2006-03-07 19:13:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html


