Description of problem: system will crash anytime a large dd command (1-5GB) is done to to iscsi device with qla3xxx driver. The net_device_wrapper becomes corrupt within the kernel and system dies in net_rx_action() when attempting to deref .npinfo struct pointer within the net_device_wrapper. root cause: private data structure of qla3xxx is 0x15c00 in length, this length is stored in a 32 bit short int field within net_device resulting in a recorded length of 5c00. See additional information below. Version-Release number of selected component (if applicable): 5.5 How reproducible: 100% Steps to Reproduce: 1. dd if=/dev/zero of=/iscsimnt/bigfile bs=1024K count=10000 2. 3. Actual results: system crashes Expected results: system doesn't crash Additional info: (from 4.8 case cloned from) Created test kernel using char reserved field just before priv_len for customer to try/test. Verified that storing the whole 24 bit length stops the crashes. RCA: qla3xxx driver allocates private state >0xFFFF in length (0x15c00) which is stored in short int priv_len field within net_device structure (0x5c00). The net_device, private struct, and net_device_wrapper are allocated as a single unit/alloc via alloc_netdev. The net_device_wrapper pointer is created off the truncated priv_len pointer and initialized. The qla3xxx driver eventually uses the part of its private state that overlays this area and writes over the net_device_wrapper. Next net_rx_action() will crash the system upon bad pointer deref. +------------------------------------------------------------- | net_device | : .priv -----------------+ . .priv_len | +------------------------------- | : <alignment padding, if needed> | +------------------------------- <---+ | : <private data area assoc. w/net_device . +------------------------------- : <alignment padding, if needed> +------------------------------- | | net_device_wrapper structure | . .netpoll_setup = (function) . .netpoll_start_xmit = (function) . .npinfo = (structure ptr) +-------------------------------------------------------------- crash> px sizeof(struct ql3_adapter) $3 = 0x15c00 << ok, bigger than short int. struct net_device: short unsigned int priv_len; stored value from crash: priv_len = 0x5c00,
We can easily change the size of priv_len, as the dev_extended function is inline, and if any 3rd party drivers make use of it, the pre-encoded offset of priv_len there will become wrong, causing errors. The other solution here is attached. Basically we just put qla3xxx on a priv diet, taking the big array that it allocates making it dynamic. Note qla3xxx is _very_ old. The customer should likely look at using either qla4xxx or qlge.
Created attachment 436946 [details] patch to shrink inline size of ql3_adapter struct sorry, last comment should have said we _can't_ resize the priv_len field easily. Heres a patch to put ql3_adapter on a bit of an inline diet
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2659163 Test build with the above patch. Please test and confirm that it solves the problem.
Not currently, since this bz is for 5.5. But the patch should apply just as easily to 4.8. Go ahead and build a kernel for testing, or if you want it saved in cvs, let me know and I'll take care of it
by the way, the patch needs some love. The kmalloc needs a GFP_KERNEL attached to the end of it.
Created attachment 437002 [details] fixed patch heres a fixed up version of the patch
(In reply to comment #2) > Created an attachment (id=436946) [details] > patch to shrink inline size of ql3_adapter struct > > sorry, last comment should have said we _can't_ resize the priv_len field > easily. Heres a patch to put ql3_adapter on a bit of an inline diet Good point! I just made that mistake (I though that BZ was assigned to me like the 4.8 one). Anyway, I would also check that sizeof_priv is not too big in alloc_netdev().
Created attachment 437084 [details] Check sizeof_priv
any update on the test patch. Would like to get this posted asap
Jerome owns the 4.8 bug. Please ask about not having a 4.8 package there.
Created attachment 437915 [details] mered patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-215.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Previously, running the dd command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the dd command on such device no longer crashes the system.
Barry, Any update for the IPv6 iscsi via Qlogic?
path linux-2.6-net-qla3xxx-fix-oops-on-too-long-netdev-priv-structure.patch applied into kernel-2.6.18-236.el5. Sanity Only.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1 @@ -Previously, running the dd command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the dd command on such device no longer crashes the system.+Previously, running the 'dd' command on an iSCSI device with the qla3xxx driver may have caused the system to crash. This error has been fixed, and running the 'dd' command on such device no longer crashes the system.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html