
Bug 75669

Summary: SG queue function getting null pointer
Product: Red Hat Enterprise Linux 2.1
Component: kernel
Version: 2.1
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Heather Conway <conway_heather>
Assignee: Tom Coughlan <coughlan>
QA Contact: Brian Brock <bbrock>
CC: gary_lerhaupt, james.bottomley, jneedle, tao
Doc Type: Bug Fix
Last Closed: 2003-12-19 19:25:56 UTC
Bug Blocks: 87937

Description Heather Conway 2002-10-10 21:44:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
Sometimes memory from the kernel's dynamic heap is freed, re-allocated, and
overwritten between the time an sg interface I/O is queued and the time the
queuing function completes. When that happens, the queuing function gets a
null pointer, which causes the system to panic.
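
The sequence described above can be modelled in user-space C. This is only a
minimal sketch, not the sg driver code: `struct req`, `pool_alloc`, and
`pool_free` are hypothetical names, and a one-slot static pool stands in for
the kernel heap so that the reuse is deterministic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a request that carries a device pointer. */
struct req {
    void *device;   /* plays the role of SRpnt->sr_device in the report */
};

/* One-slot pool standing in for the kernel's dynamic heap (assumption). */
static struct req slot;
static int slot_in_use;

static struct req *pool_alloc(void)
{
    if (slot_in_use)
        return NULL;
    slot_in_use = 1;
    memset(&slot, 0, sizeof slot);   /* the new owner overwrites old contents */
    return &slot;
}

static void pool_free(struct req *r)
{
    (void)r;
    slot_in_use = 0;
}

/*
 * Returns 1 if a pointer kept across free + re-allocation sees a NULL
 * device field, which is the condition the report says leads to the panic.
 */
int stale_pointer_sees_null(void)
{
    static int dev;                      /* dummy device object */
    struct req *r = pool_alloc();
    if (r == NULL)
        return 0;
    r->device = &dev;                    /* request queued with a valid device */

    pool_free(r);                        /* completion path releases the request */
    struct req *other = pool_alloc();    /* another user gets the same memory */

    int seen_null = (r->device == NULL); /* r still points at the recycled slot */
    pool_free(other);
    return seen_null;
}
```

In the model the stale read is harmless because the slot is static storage. In
the kernel, the equivalent read goes through memory that now belongs to someone
else.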

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
When working with our multipathing product, allowing all I/O types to use 
passive paths automatically increased the number of SG interface I/Os 
initiated.  This increase caused path failures (reads/writes to NOT_READY 
devices) and ultimately panics.  

Expected Results:  A proposed patch to fix the problem is included below.  This 
was taken from the vanilla kernel.org v2.4.17 kernel.

Additional info:

--- sg.c	Fri May  3 16:06:49 2002
+++ sg.c.FIXED	Mon Oct  7 16:17:08 2002
@@ -645,6 +645,7 @@
     Scsi_Request        * SRpnt;
     Sg_device           * sdp = sfp->parentdp;
     sg_io_hdr_t         * hp = &srp->header;
+    request_queue_t	* q;
 
     srp->data.cmd_opcode = cmnd[0];  /* hold opcode of command */
     hp->status = 0;
@@ -680,6 +681,7 @@
     }
 
     srp->my_cmdp = SRpnt;
+    q = &SRpnt->sr_device->request_queue;
     SRpnt->sr_request.rq_dev = sdp->i_rdev;
     SRpnt->sr_request.rq_status = RQ_ACTIVE;
     SRpnt->sr_sense_buffer[0] = 0;
@@ -715,7 +717,8 @@
 		(void *)SRpnt->sr_buffer, hp->dxfer_len,
 		sg_cmd_done_bh, timeout, SG_DEFAULT_RETRIES);
     /* dxfer_len overwrites SRpnt->sr_bufflen, hence need for b_malloc_len */
-    generic_unplug_device(&SRpnt->sr_device->request_queue);
+//    generic_unplug_device(&SRpnt->sr_device->request_queue);
+    generic_unplug_device(q);
     return 0;
 }
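
The ordering that the patch introduces can be sketched outside the kernel as
follows. This is an illustration under stated assumptions, not the driver
code: `struct queue`, `struct device`, `struct request`, `submit`, `unplug`,
and `start_request` are hypothetical stand-ins, and `submit` models the worst
case in which the request has already completed and been recycled by the time
the call returns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for request_queue_t, the SCSI device and Scsi_Request. */
struct queue   { int unplug_count; };
struct device  { struct queue request_queue; };
struct request { struct device *dev; };

/*
 * Models a submit call whose completion handler has already run, and
 * recycled the request, before the call returns (assumption).
 */
static void submit(struct request *rq)
{
    memset(rq, 0, sizeof *rq);
}

static void unplug(struct queue *q)
{
    q->unplug_count++;
}

/* Mirrors the patched ordering: take the queue pointer before submitting. */
int start_request(struct request *rq)
{
    struct queue *q = &rq->dev->request_queue; /* rq is still valid here */

    submit(rq);   /* rq must not be dereferenced after this point */
    unplug(q);    /* uses the saved pointer, never rq->dev */
    return 0;
}
```

The sketch relies on the device, and therefore its queue, outliving any single
request. That is why the saved pointer stays usable after the request itself
has been reused.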

Comment 1 Heather Conway 2002-11-27 16:09:45 UTC
Is there any update as to whether this fix will be included in an upcoming 
errata?
Thanks.


Comment 2 Heather Conway 2003-01-14 18:58:40 UTC
Will this fix be included in the e.11 errata?
Thanks.

Comment 3 Heather Conway 2003-02-08 18:56:24 UTC
Is there still time to get this included into the e.11 errata?  

Comment 4 Matt Domsch 2003-04-28 15:53:37 UTC
This is not fixed in 2.4.9-e.18, so it will not make the Q2 quarterly update.

Comment 6 James Bottomley 2003-06-08 13:58:44 UTC
This bug has gone from being a rarely tripped curiosity to being a showstopper
for us in -e.16.  We cannot now get our test harness to run without tripping it.
I think the prominence has risen because of some of the threading/scheduling
changes making it much more likely that the SRpnt will have been re-used before
scsi_do_req returns.

The fix listed in this MR is obviously correct and was supplied by Doug Gilbert
to fix this very problem.  Could you please just apply it?

Comment 7 Tom Coughlan 2003-06-09 20:42:00 UTC
The fix is checked in.  It is planned to ship in our next errata.  

Comment 8 Jeff Needle 2003-10-17 19:01:13 UTC
*** Bug 103685 has been marked as a duplicate of this bug. ***

Comment 9 John Flanagan 2003-12-19 19:25:56 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2003-408.html