From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
Sometimes memory from the kernel's dynamic heap is freed, re-allocated, and overwritten between the time an sg interface I/O is queued and the time the queuing function completes. When that happens, the queuing function gets a null pointer. This causes the system to panic.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
When working with our multi-pathing product, allowing all I/O types to use passive paths automatically increased the number of SG interface I/Os initiated. This increase caused path failures (reads/writes to NOT_READY devices) and ultimately panics.

Expected Results:
A proposed patch to fix the problem is included below. This was taken from the vanilla kernel.org v2.4.17 kernel.

Additional info:

--- sg.c	Fri May  3 16:06:49 2002
+++ sg.c.FIXED	Mon Oct  7 16:17:08 2002
@@ -645,6 +645,7 @@
     Scsi_Request * SRpnt;
     Sg_device * sdp = sfp->parentdp;
     sg_io_hdr_t * hp = &srp->header;
+    request_queue_t * q;

     srp->data.cmd_opcode = cmnd[0]; /* hold opcode of command */
     hp->status = 0;
@@ -680,6 +681,7 @@
     }

     srp->my_cmdp = SRpnt;
+    q = &SRpnt->sr_device->request_queue;
     SRpnt->sr_request.rq_dev = sdp->i_rdev;
     SRpnt->sr_request.rq_status = RQ_ACTIVE;
     SRpnt->sr_sense_buffer[0] = 0;
@@ -715,7 +717,8 @@
                 (void *)SRpnt->sr_buffer, hp->dxfer_len,
                 sg_cmd_done_bh, timeout, SG_DEFAULT_RETRIES);
     /* dxfer_len overwrites SRpnt->sr_bufflen, hence need for b_malloc_len */
-    generic_unplug_device(&SRpnt->sr_device->request_queue);
+//  generic_unplug_device(&SRpnt->sr_device->request_queue);
+    generic_unplug_device(q);
     return 0;
 }
Is there any update as to whether this fix will be included in an upcoming errata? Thanks.
Will this fix be included in the e.11 errata? Thanks.
Is there still time to get this included into the e.11 errata?
This is not fixed in 2.4.9-e.18, thus will not make Q2 quarterly update.
This bug has gone from being a rarely tripped curiosity to being a showstopper for us in -e.16. We now cannot get our test harness to run without tripping it. I think it has become more prominent because some of the threading/scheduling changes make it much more likely that the SRpnt will have been re-used before scsi_do_req returns. The fix listed in this MR is obviously correct and was supplied by Doug Gilbert to fix this very problem. Could you please just apply it?
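For readers following the thread, the race described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the sg driver code: struct queue, struct device, struct request, submit_and_recycle(), unplug(), queue_io_fixed() and stale_after_submit() are hypothetical stand-ins for request_queue_t, the SCSI device, Scsi_Request, scsi_do_req(), generic_unplug_device() and the two call orderings. The stand-in for scsi_do_req() models the worst case, where the request memory has already been recycled and overwritten by the time the call returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for request_queue_t, the SCSI device and Scsi_Request. */
struct queue   { int unplug_count; };
struct device  { struct queue request_queue; };
struct request { struct device *sr_device; };

/*
 * Stand-in for scsi_do_req(). It models the worst case from this report:
 * the command completes and the request memory is re-used and overwritten
 * before the call returns. The memory is only zeroed here, not freed, so
 * reading it afterwards is well defined in this sketch.
 */
static void submit_and_recycle(struct request *req)
{
    memset(req, 0, sizeof(*req)); /* req->sr_device is now NULL */
}

/* Stand-in for generic_unplug_device(): just counts calls. */
static void unplug(struct queue *q)
{
    assert(q != NULL);
    q->unplug_count++;
}

/*
 * Ordering used by the proposed patch: take the queue pointer while the
 * request is still owned by the caller, then never touch the request again
 * once it has been submitted.
 */
static int queue_io_fixed(struct request *req)
{
    struct queue *q = &req->sr_device->request_queue;

    submit_and_recycle(req);
    unplug(q); /* uses the saved pointer, not req */
    return 0;
}

/*
 * Shows what the unpatched ordering would see: after submission the
 * request's device pointer can no longer be trusted. Returns 1 if it
 * has been wiped.
 */
static int stale_after_submit(struct request *req)
{
    submit_and_recycle(req);
    return req->sr_device == NULL;
}
```

Calling queue_io_fixed() on a request that points at a live device increments that device's unplug_count even though the request itself was wiped during submission, while stale_after_submit() reports that the pointer the unpatched code would dereference is gone. The sketch relies on the device (and therefore its queue) outliving the request, which is the same assumption the patch makes.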
The fix is checked in. It is planned to ship in our next errata.
*** Bug 103685 has been marked as a duplicate of this bug. ***
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2003-408.html