Bug 620508 - system crashes due to corrupt net_device_wrapper structure
Summary: system crashes due to corrupt net_device_wrapper structure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
urgent
high
Target Milestone: rc
: ---
Assignee: Neil Horman
QA Contact: Gris Ge
URL:
Whiteboard:
Depends On:
Blocks: 637204 637206
TreeView+ depends on / blocked
 
Reported: 2010-08-02 18:02 UTC by Dwight (Bud) Brown
Modified: 2011-01-13 21:47 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, running the 'dd' command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the 'dd' command on such device no longer crashes the system.
Clone Of: 620485
Environment:
Requires: Qlogic iscsi hba: ISP4032-based Ethernet IPv6 NIC
Last Closed: 2011-01-13 21:47:44 UTC


Attachments (Terms of Use)
patch to shrink inline size of ql3_adapter struct (2.22 KB, patch)
2010-08-05 19:19 UTC, Neil Horman
no flags Details | Diff
fixed patch (2.23 KB, patch)
2010-08-06 00:05 UTC, Neil Horman
no flags Details | Diff
Check sizeof_priv (1.03 KB, patch)
2010-08-06 09:37 UTC, Jerome Marchand
no flags Details | Diff
mered patch (3.46 KB, patch)
2010-08-10 15:03 UTC, Neil Horman
no flags Details | Diff


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Dwight (Bud) Brown 2010-08-02 18:02:44 UTC
Description of problem:
system will crash anytime a large dd command (1-5GB) is done to to
iscsi device with qla3xxx driver.  The net_device_wrapper becomes corrupt within the kernel and system dies in net_rx_action() when attempting to deref .npinfo struct pointer within the net_device_wrapper.

root cause: private data structure of qla3xxx is 0x15c00 in length, this length
is stored in a 32 bit short int field within net_device resulting in a recorded
length of 5c00.  See additional information below.



Version-Release number of selected component (if applicable):
5.5

How reproducible:
100%


Steps to Reproduce:
1. dd if=/dev/zero of=/iscsimnt/bigfile bs=1024K count=10000
2.
3.

Actual results:
system crashes

Expected results:
system doesn't crash


Additional info: (from 4.8 case cloned from)
Created test kernel using char reserved field just before priv_len for customer
to try/test.  Verified that storing the whole 24 bit length stops the crashes.  

RCA:
qla3xxx driver allocates private state >0xFFFF in length (0x15c00) which is
stored in short int priv_len field within net_device structure (0x5c00).

The net_device, private struct, and net_device_wrapper are allocated as a
single unit/alloc via alloc_netdev.  The net_device_wrapper pointer is created
off the truncated priv_len pointer and initialized.  The qla3xxx driver
eventually uses the part of its private state that overlays this area and
writes over the net_device_wrapper.  Next net_rx_action() will crash the system
upon bad pointer deref.

+-------------------------------------------------------------
|  net_device
|
:              .priv -----------------+
.              .priv_len              |
+-------------------------------      |
: <alignment padding, if needed>      |
+-------------------------------  <---+
|
:  <private data area assoc. w/net_device
.
+-------------------------------
: <alignment padding, if needed>
+-------------------------------
|
|  net_device_wrapper structure
|
.     .netpoll_setup = (function)
.     .netpoll_start_xmit = (function)
.     .npinfo = (structure ptr)
+--------------------------------------------------------------

crash> px sizeof(struct ql3_adapter)
$3 = 0x15c00   << ok, bigger than short int.

struct net_device:
    short unsigned int priv_len;

stored value from crash:
  priv_len = 0x5c00,

Comment 1 Neil Horman 2010-08-05 19:16:11 UTC
We can easily change the size of priv_len, as the dev_extended function is inline, and if any 3rd party drivers make use of it, the pre-encoded offset of priv_len there will become wrong, causing errors.  The other solution here is attached.  Basically we just put qla3xxx on a priv diet, taking the big array that it allocates making it dynamic.

Note qla3xxx is _very_ old.  The customer should likely look at using either qla4xxx or qlge.

Comment 2 Neil Horman 2010-08-05 19:19:38 UTC
Created attachment 436946 [details]
patch to shrink inline size of ql3_adapter struct

sorry, last comment should have said we _can't_ resize the priv_len field easily.  Heres a patch to put ql3_adapter on a bit of an inline diet

Comment 3 Neil Horman 2010-08-05 19:31:59 UTC
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2659163

Test build with the above patch.  Please test and confirm that it solves the problem.

Comment 5 Neil Horman 2010-08-06 00:02:39 UTC
Not currently, since this bz is for 5.5.  But the patch should apply just as easily to 4.8.  Go ahead and build a kernel for testing, or if you want it saved in cvs, let me know and I'll take care of it

Comment 6 Neil Horman 2010-08-06 00:03:49 UTC
by the way, the patch needs some love.  The kmalloc needs a GFP_KERNEL attached to the end of it.

Comment 7 Neil Horman 2010-08-06 00:05:31 UTC
Created attachment 437002 [details]
fixed patch

heres a fixed up version of the patch

Comment 8 Jerome Marchand 2010-08-06 09:35:54 UTC
(In reply to comment #2)
> Created an attachment (id=436946) [details]
> patch to shrink inline size of ql3_adapter struct
> 
> sorry, last comment should have said we _can't_ resize the priv_len field
> easily.  Heres a patch to put ql3_adapter on a bit of an inline diet    

Good point! I just made that mistake (I though that BZ was assigned to me like the 4.8 one).

Anyway, I would also check that sizeof_priv is not too big in alloc_netdev().

Comment 9 Jerome Marchand 2010-08-06 09:37:31 UTC
Created attachment 437084 [details]
Check sizeof_priv

Comment 10 Neil Horman 2010-08-09 14:38:04 UTC
any update on the test patch.  Would like to get this posted asap

Comment 12 Neil Horman 2010-08-09 17:42:29 UTC
Jerome owns the 4.8 bug.  Please ask about not having a 4.8 package there.

Comment 13 Neil Horman 2010-08-10 15:03:34 UTC
Created attachment 437915 [details]
mered patch

Comment 14 RHEL Product and Program Management 2010-08-27 18:30:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Jarod Wilson 2010-09-03 19:06:49 UTC
in kernel-2.6.18-215.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 23 Jaromir Hradilek 2010-10-12 22:50:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, running the dd command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the dd command on such device no longer crashes the system.

Comment 27 Gris Ge 2010-12-21 06:22:10 UTC
Barry,
Any update for the IPv6 iscsi via Qlogic?

Comment 29 Gris Ge 2010-12-22 06:30:37 UTC
path linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch applied into kernel-2.6.18-236.el5.

Sanity Only.

Comment 30 Martin Prpič 2010-12-22 16:42:08 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, running the dd command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the dd command on such device no longer crashes the system.+Previously, running the 'dd' command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the 'dd' command on such device no longer crashes the system.

Comment 32 errata-xmlrpc 2011-01-13 21:47:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html


Note You need to log in before you can comment on or make changes to this bug.