Bug 76240 - cat /proc/scsi/gdth/0 causes kernel oops
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2002-10-18 11:59 EDT by Jure Pecar
Modified: 2007-11-30 17:06 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-06-24 07:39:33 EDT

Attachments
decoded oops with 2.4.9-e.8 enterprise kernel (8.64 KB, text/plain)
2002-10-18 12:00 EDT, Jure Pecar
a diff of linux/drivers/scsi between 2.4.18-17.7.x and 2.4.18-17.2 (2.83 KB, patch)
2002-12-12 08:02 EST, Jure Pecar

Description Jure Pecar 2002-10-18 11:59:48 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827

Description of problem:
Trying to read /proc/scsi/gdth/0 causes a kernel oops.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
The simplest way is to cat /proc/scsi/gdth/0.
Or just halt the machine; you'll see the oops at the end of shutdown.
	

Actual Results:  Decoded oops attached.

Expected Results:  I believe the kernel should print some information about the
card and the status of the array ...

Additional info:

Configuration:

Intel SHG2 board
Dual Xeon 2.4 GHz
6 GB memory
7-disk RAID 5 array + 1 hot spare, configured in the RAID controller BIOS

RAID controller:

02:08.0 RAID bus controller: Intel Corporation RAID Controller
	Subsystem: Intel Corporation: Unknown device 01ae
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B+
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 64, cache line size 08
	Interrupt: pin A routed to IRQ 24
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Expansion ROM at <unassigned> [disabled] [size=32K]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Comment 1 Jure Pecar 2002-10-18 12:00:49 EDT
Created attachment 80947 [details]
decoded oops with 2.4.9-e.8 enterprise kernel
Comment 2 Jure Pecar 2002-12-04 14:10:16 EST
I tried to bisect the patches applied on top of the standard 2.4.9-ac10 (which
works properly), but found this to be nearly impossible ... the patches are one
big mess; only aio is organized approximately the way I'd expect. All I found
out is that things still work up to patch #1000; after #1000 the kernel became
a PITA to compile.

Can you reorganize the patches so that each patch (or group of patches) is a
self-sufficient unit that still allows the kernel to compile?
Comment 3 Jure Pecar 2002-12-12 08:00:45 EST
I did some more work searching Bugzilla and diffing various kernel packages
... The closest thing I found is the diff between 2.4.18-17.7.x and 2.4.18-17.2,
about which arjanv said in bug #77398 that it fixes the problem (and indeed it
does). I'm attaching a diff of drivers/scsi between these two kernels ... there
are just a couple of changes, and none of them changes the behaviour of the
2.4.9-e.10 kernel in any way. gdth still oopses in the same way.
Is there some other place in the source to look at?
Comment 4 Jure Pecar 2002-12-12 08:02:06 EST
Created attachment 88564 [details]
a diff of linux/drivers/scsi between 2.4.18-17.7.x and 2.4.18-17.2
Comment 5 Jure Pecar 2002-12-16 03:35:34 EST
Finally ... I needed some printks to figure out what exactly is going on ... If
I modify the patch applied to 2.4.18-17.2 to look like this:

--- kernel-2.4.18-14/linux/drivers/scsi/scsi.c	Tue Dec 10 14:04:55 2002
+++ kernel-2.4.18-17.2/linux/drivers/scsi/scsi.c	Fri Dec  6 10:47:02 2002
@@ -1470,8 +1470,9 @@
 	int j;
 	Scsi_Cmnd *SCpnt;
 	request_queue_t *q = &SDpnt->request_queue;
-
-	spin_lock_irqsave(q->queue_lock, flags);
+	
+	if (q->queue_lock != NULL)
+		spin_lock_irqsave(q->queue_lock, flags);
 
 	if (SDpnt->queue_depth == 0)
 	{
@@ -1520,7 +1521,8 @@
 	} else {
 		SDpnt->has_cmdblocks = 1;
 	}
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q->queue_lock != NULL)
+		spin_unlock_irqrestore(q->queue_lock, flags);
 }


then it actually works.

But it still looks like an ugly workaround that hides the real cause of the
problem ...

Does anyone care to comment?
Comment 6 Jure Pecar 2002-12-16 08:25:40 EST
Add this chunk too:

@@ -2705,13 +2707,12 @@
                 panic("Attempt to delete wrong device\n");
         }
 
-        blk_cleanup_queue(&SDpnt->request_queue);
-
         /*
          * We only have a single SCpnt attached to this device.  Free
          * it now.
          */
 	scsi_release_commandblocks(SDpnt);
+        blk_cleanup_queue(&SDpnt->request_queue);
         kfree(SDpnt);
 }


then it really works :)

Comment 7 Larry Woodman 2003-06-23 13:52:51 EDT
This was fixed quite a while ago.  Did you try this with the latest
AS 2.1 kernel errata (e.24)?

Larry Woodman
Comment 8 Jure Pecar 2003-06-24 07:39:33 EDT
It was fixed in e.10 or e.12, yes. Might as well close this bug.
