Bug 162094

Summary: read() with count > 0xffffffff panics kernel at fs/direct-io.c:886
Product: Red Hat Enterprise Linux 4 Reporter: David Milburn <dmilburn>
Component: kernel Assignee: Peter Staubach <staubach>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron, sct, tao
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2006-0132 Doc Type: Bug Fix
Last Closed: 2006-03-07 19:13:41 UTC Type: ---
Bug Blocks: 168429    
Attachments:
Description                     Flags
Program to reproduce the bug    none
Patch to fix                    none
Proposed patch                  none
Proposed patch                  none

Description David Milburn 2005-06-29 20:03:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
Using the read() system call with a large count (> 0xffffffff) against a raw
device (or a block device file that is opened with O_DIRECT) causes a
kernel panic on RHEL4 with the following message:

   kernel BUG at fs/direct-io.c:886!


Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Edit reproduce.c with appropriate FILE_NAME and recompile
2. Execute the reproduce program

Actual Results:  kernel panics with the following message:

   kernel BUG at fs/direct-io.c:886!


Expected Results:  kernel should not panic


Additional info:

The customer developed a fix based upon the following three patches from linux-2.6.11-rc3:

http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@41f6cf91c1R7rbuggBVQLxBuD7m6Aw
http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@41f71cbbbAqnp67z79i7SSVQGtmQzg
http://lia64.bkbits.net:8080/linux-ia64-release-2.6.12/cset@42026b11ti7KiDM_DMvBv5ZQH_3yLw

Comment 1 David Milburn 2005-06-29 20:04:47 UTC
Created attachment 116144 [details]
Program to reproduce the bug

Comment 2 David Milburn 2005-06-29 20:05:37 UTC
Created attachment 116145 [details]
Patch to fix

Comment 4 Peter Staubach 2005-07-22 18:38:05 UTC
This situation occurs because an unsigned int is used to store the maximum
number of contiguous blocks which can be transferred at once.  When
doing a direct-io read on a block device, the size of the transfer is set
to the minimum of the size of the block device and the requested number of
bytes.

In the test case, the program tries to read 4GB (0x100000000 bytes).  I used
a 10G partition, so the code tried to store 0x100000000 in an unsigned
int.  The value does not fit in 32 bits and is truncated to zero.
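
The arithmetic can be checked in user space.  The sketch below is not the
kernel code; transfer_size_32() is a hypothetical helper that takes the
minimum of the device size and the requested count and then narrows it to
32 bits, assuming both sizes are expressed in bytes.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from fs/direct-io.c: choose the smaller of
 * the device size and the requested byte count, then store the result in a
 * 32-bit unsigned value.
 */
uint32_t transfer_size_32(uint64_t device_bytes, uint64_t requested_bytes)
{
    uint64_t wanted = device_bytes < requested_bytes ? device_bytes
                                                     : requested_bytes;

    /* Conversion to a 32-bit unsigned type keeps only the low 32 bits. */
    return (uint32_t)wanted;
}
```

With a 10G device and a request of 0x100000000 bytes, the minimum is
0x100000000, whose low 32 bits are all zero, so the helper returns 0.  A
request of 0xffffffff bytes still fits and is returned unchanged.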

This situation can be addressed either by limiting the read count size,
as the proposed patch does, or by handling the request as several smaller
requests inside of the kernel.  The latter approach has three advantages:
the system call semantics are maintained, the application does not need to
be aware that it is dealing with a "file" with different characteristics,
and the file struct does not have to be modified.
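
To illustrate what "being aware" could mean for an application, here is a
minimal sketch that assumes a limited count would surface as a short read.
That is an assumption for illustration only; the behavior of the attached
patch is not shown in this report.  read_full() is a hypothetical helper
name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until count bytes have arrived,
 * end of file is reached, or an error other than EINTR occurs.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

An application that already loops like this would tolerate either
approach; one that expects a single read() to return the full count would
only keep working if the kernel handles the splitting internally.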

Comment 5 Peter Staubach 2005-08-26 14:14:45 UTC
Created attachment 118154 [details]
Proposed patch

Comment 6 Peter Staubach 2005-08-26 14:27:11 UTC
The proposed patch breaks up the original, single iovec into multiple smaller
iovecs, each with a length that can be expressed using a 32-bit integer.  This
avoids the overflow that the current system suffers from.
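
The splitting idea can be sketched in user space as follows.  This is not
the attached patch; split_iovec() and its max_seg parameter are hypothetical
names, and the segment limit is passed in so the function can be exercised
with small buffers.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical illustration: split one iovec into consecutive segments of
 * at most max_seg bytes each.  Writes up to out_cap segments into out and
 * returns how many were written.  Returns 0 if max_seg is 0.
 */
size_t split_iovec(const struct iovec *src, size_t max_seg,
                   struct iovec *out, size_t out_cap)
{
    char *base = src->iov_base;
    size_t remaining = src->iov_len;
    size_t n = 0;

    if (max_seg == 0)
        return 0;

    while (remaining > 0 && n < out_cap) {
        size_t len = remaining < max_seg ? remaining : max_seg;

        out[n].iov_base = base;
        out[n].iov_len = len;
        base += len;            /* next segment starts where this one ends */
        remaining -= len;
        n++;
    }
    return n;
}
```

Choosing a max_seg that fits in 32 bits means every resulting iov_len can be
stored in an unsigned int without truncation, while the segments together
still cover the whole original buffer.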

Comment 13 Peter Staubach 2005-10-10 15:23:54 UTC
Created attachment 119775 [details]
Proposed patch

Comment 16 Peter Staubach 2005-10-11 12:32:30 UTC
I don't understand the question.  If it is about which symbol should be used
at the user level, then I don't actually know and will have to defer to
someone else with more experience in the kernel to user level symbol translation.

Comment 24 Red Hat Bugzilla 2006-03-07 19:13:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html