66867 – Kernel OOPS using gdth module on RH 7.3 2.4.18-3smp kernel

Summary: Kernel OOPS using gdth module on RH 7.3 2.4.18-3smp kernel

Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Version:	7.3
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	medium
Assignee:	Doug Ledford
QA Contact:	Brian Brock

Reported:	2002-06-17 20:33 UTC by Rob Landry
Modified:	2007-03-27 03:54 UTC
CC List:	1 user
Doc Type:	Bug Fix
Last Closed:	2003-06-07 18:48:17 UTC

Description Rob Landry 2002-06-17 20:33:49 UTC

Kernel OOPS during the function call scsi_get_host_dev in 2.4.18-3smp (RH 7.3)

On a standard RH 7.3 SMP install on a dual-processor machine with an Intel
integrated RAID controller, running kernel 2.4.18-3smp with the "gdth" module
loaded, I get a segmentation fault (kernel oops) during:
1. Shutdown/halt or "rmmod gdth" (both of which end up calling gdth_flush())
2. Reading the "gdth" information from the /proc filesystem, e.g. with
"cat /proc/scsi/gdth/3".

Note: This problem occurs ONLY with the Red Hat kernel source for 2.4.18-3. The
stock 2.4.18 kernel from kernel.org, built with the
kernel-2.4.18-i686-smp.config, does not exhibit this behaviour, which leads me
to believe that one of Red Hat's custom kernel patches is responsible. The
problem also occurs only in SMP kernels.

Preliminary debugging suggests that the function scsi_get_host_dev() (scsi.c)
is causing the segmentation fault. This function is called twice in
gdth_proc.c (from gdth_get_info() and gdth_set_info()) and once in gdth.c
(from gdth_flush()).

Following is the OOPS for "cat /proc/scsi/gdth/3":

Jun 13 12:12:19 localhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jun 13 12:12:19 localhost kernel:  printing eip:
Jun 13 12:12:19 localhost kernel: d080e2f1
Jun 13 12:12:19 localhost kernel: *pde = 00000000
Jun 13 12:12:19 localhost kernel: Oops: 0002
Jun 13 12:12:19 localhost kernel: gdth binfmt_misc autofs eepro100 usb-ohci usbcore ext3 jbd aic7xxx sd_mod scsi
Jun 13 12:12:19 localhost kernel: CPU:    1
Jun 13 12:12:19 localhost kernel: EIP:    0010:[<d080e2f1>]    Not tainted
Jun 13 12:12:19 localhost kernel: EFLAGS: 00010086
Jun 13 12:12:19 localhost kernel:
Jun 13 12:12:19 localhost kernel: EIP is at scsi_build_commandblocks [scsi_mod] 0x21 (2.4.18-3smp)
Jun 13 12:12:19 localhost kernel: eax: 00000000   ebx: cd3a9a00   ecx: 00000000   edx: cd3a9a18
Jun 13 12:12:20 localhost kernel: esi: c0350000   edi: cd3a9b10   ebp: 00000000   esp: cde03a90
Jun 13 12:12:20 localhost kernel: ds: 0018   es: 0018   ss: 0018
Jun 13 12:12:20 localhost kernel: Process cat (pid: 1308, stackpage=cde03000)
Jun 13 12:12:20 localhost kernel: Stack: cd3a9a18 c0350000 00000286 cd3a9a00 c0350000 cd3a9b10 00000000 d080fcdb
Jun 13 12:12:20 localhost kernel:        cd3a9a00 00000000 d08f7ce0 cde03ea3 d08f0574 c0350000 00000000 00000000
Jun 13 12:12:20 localhost kernel:        cde03ea3 00000000 00000000 00000000 00000004 c013117c 00000292 00001000
Jun 13 12:12:20 localhost kernel: Call Trace: [<d080fcdb>] scsi_get_host_dev_Rsmp_f651c1f2 [scsi_mod] 0x4b
Jun 13 12:12:20 localhost kernel: [<d08f7ce0>] .rodata.str1.32 [gdth] 0x5c0
Jun 13 12:12:20 localhost kernel: [<d08f0574>] gdth_get_info [gdth] 0xa8
Jun 13 12:12:20 localhost kernel: [<c013117c>] filemap_nopage [kernel] 0xbc
Jun 13 12:12:20 localhost kernel: [<c0139d92>] __alloc_pages [kernel] 0x72
Jun 13 12:12:20 localhost kernel: [<c013ee4b>] page_add_rmap [kernel] 0x3b
Jun 13 12:12:20 localhost kernel: [<c012cdfd>] do_no_page [kernel] 0x23d
Jun 13 12:12:20 localhost kernel: [<d086a735>] ext3_mark_iloc_dirty [ext3] 0x25
Jun 13 12:12:20 localhost kernel: [<c012d3a6>] vm_enough_memory [kernel] 0x36
Jun 13 12:12:20 localhost kernel: [<c022ef60>] rb_insert_color [kernel] 0x70
Jun 13 12:12:20 localhost kernel: [<c0117a2d>] do_page_fault [kernel] 0x12d
Jun 13 12:12:20 localhost kernel: [<c0117900>] do_page_fault [kernel] 0x0
Jun 13 12:12:20 localhost kernel: [<c0108d5c>] error_code [kernel] 0x34
Jun 13 12:12:20 localhost kernel: [<c0117900>] do_page_fault [kernel] 0x0
Jun 13 12:12:20 localhost kernel: [<c0108d5c>] error_code [kernel] 0x34
Jun 13 12:12:20 localhost kernel: [<c022daeb>] clear_user [kernel] 0x2b
Jun 13 12:12:20 localhost kernel: [<c0160b2c>] padzero [kernel] 0x1c
Jun 13 12:12:20 localhost kernel: [<c0161c32>] load_elf_binary [kernel] 0x9e2
Jun 13 12:12:20 localhost kernel: [<c0139d92>] __alloc_pages [kernel] 0x72
Jun 13 12:12:20 localhost kernel: [<c013ee4b>] page_add_rmap [kernel] 0x3b
Jun 13 12:12:20 localhost kernel: [<c012cba7>] do_anonymous_page [kernel] 0x117
Jun 13 12:12:20 localhost kernel: [<d08ef195>] gdth_proc_info [gdth] 0x111
Jun 13 12:12:20 localhost kernel: [<d0811a2f>] proc_scsi_read [scsi_mod] 0x3f
Jun 13 12:12:20 localhost kernel: [<c0164b10>] proc_file_read [kernel] 0xe0
Jun 13 12:12:20 localhost kernel: [<c0141ad6>] sys_read [kernel] 0x96
Jun 13 12:12:20 localhost kernel: [<c012d5b5>] sys_brk [kernel] 0xc5
Jun 13 12:12:20 localhost kernel: [<c0108c6b>] system_call [kernel] 0x33
Jun 13 12:12:20 localhost kernel:
Jun 13 12:12:20 localhost kernel:
Jun 13 12:12:20 localhost kernel: Code: f0 fe 08 0f 88 ad 1d 00 00 80 bb 05 01 00 00 00 75 19 8b 54
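A short reading of the oops, offered as an aid rather than as part of the original report: "Oops: 0002" is the i386 page-fault error code printed in hex. Its low three bits mean protection violation (bit 0), write access (bit 1) and user mode (bit 2), so 0002 is a kernel-mode write to a page that is not present, which fits "*pde = 00000000" and the fault address 00000000. The first bytes of the Code: line, f0 fe 08, appear to decode as lock decb (%eax), the i386 spin_lock fast path in 2.4 kernels, and eax is 00000000; that is consistent with a NULL lock pointer, as diagnosed in Comment 6. The C sketch below decodes only the flag bits. PF_PROT, PF_WRITE, PF_USER, struct pf_info, decode_pf_error() and check_reported_oops() are hypothetical helpers for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Low three bits of the i386 page-fault error code (hypothetical names). */
#define PF_PROT  0x1UL  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2UL  /* 0: read access,      1: write access */
#define PF_USER  0x4UL  /* 0: kernel mode,      1: user mode */

struct pf_info {
    bool protection_violation;
    bool write;
    bool user_mode;
};

/* Split the hex value printed after "Oops:" into its flag bits. */
static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info = {
        .protection_violation = (code & PF_PROT) != 0,
        .write = (code & PF_WRITE) != 0,
        .user_mode = (code & PF_USER) != 0,
    };
    return info;
}

/* Apply the decoder to the value in this report, "Oops: 0002". */
static void check_reported_oops(void)
{
    struct pf_info info = decode_pf_error(0x0002);
    assert(!info.protection_violation); /* page not present (*pde = 00000000) */
    assert(info.write);                 /* write access, e.g. lock decb */
    assert(!info.user_mode);            /* fault raised in kernel mode */
}
```

For example, decode_pf_error(0x0002) yields protection_violation == false, write == true and user_mode == false, i.e. the kernel tried to write through a NULL pointer.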
 
----------
Action by: bojikannanthanam
Issue Registered
----------
Action by: rlandry
Is the same behavior present with the 2.4.18-4 errata kernel for 7.3?

Category set to: Kernel
Status set to: Waiting on Client
Issue assigned to: rlandry

----------
Action by: bojikannanthanam
The same behaviour is present with the 2.4.18-4 errata kernel for 7.3. I used
kernel-smp-2.4.18-4.i686.rpm.

Comment 1 Boji Tony Kannanthanam 2002-08-21 19:08:36 UTC

The same bug exists in the Red Hat beta "null" (kernel 2.4.18-11).
This bug was resolved in 2.4.18-5 for RH 7.3, so it seems the patch did not
make it into the Red Hat beta?

Comment 2 Michael K. Johnson 2002-08-26 16:51:09 UTC

In initial testing with the new board you sent, we have not reproduced
this.

Perhaps you need to tell us more about the configuration in which you
are seeing the problem.  Could you do that, please?

Comment 3 Michael K. Johnson 2002-08-26 19:10:05 UTC

From: "Kannanthanam, Boji T" <boji.t.kannanthanam>

We saw the problem on the standard SMP kernel on Redhat 7.3 and RH Beta
(Null).

To reproduce the problem, just run "cat /proc/scsi/gdth/#", where "#" is the
SCSI host number of the controller (the nth SCSI host adapter on the system).
You can also see the kernel OOPS when you unload the "gdth" module.

I usually have Red Hat installed on a RAID drive on the controller.

Comment 4 Michael K. Johnson 2002-08-26 19:16:46 UTC

The cat /proc/scsi/gdth/# is exactly what I did to try to reproduce
the problem.

Comment 5 Arjan van de Ven 2003-06-07 18:48:17 UTC

this got fixed several errata ago

Comment 6 Achim Leubner 2003-09-24 13:21:07 UTC

The problem occurs again in Red Hat Enterprise Linux release 2.9.5AS, kernel 
2.4.21-1.1931.2.399.entsnmp!

I debugged it and found what is going wrong. The problem occurs in
scsi_build_commandblocks() in scsi.c, called from scsi_get_host_dev().
scsi_get_host_dev() allocates a new Scsi_Device structure, zeroes all of its
fields and passes the structure pointer to scsi_build_commandblocks(). That
function calls spin_lock_irqsave() with request_queue.queue_lock as the first
parameter, but this field is never initialized and is therefore a NULL pointer!

The problem doesn't occur with the "original" 2.4.21 kernel. In this kernel 
scsi_build_commandblocks() calls spin_lock_irqsave() with a pointer to a 
defined spinlock_t structure "device_request_lock" as the first parameter.
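The diagnosis above can be modeled in userspace C. This is a minimal sketch under stated assumptions: the model_* structs and functions are hypothetical stand-ins, not the kernel's definitions; a pthread mutex plays the role of the kernel spinlock; and calloc() stands in for the allocate-and-zero step in scsi_get_host_dev(). It shows why a freshly zeroed device leaves the per-queue lock pointer NULL, and why a statically defined lock such as device_request_lock does not have that problem. It does not reproduce the actual Red Hat patch or the fix referenced in Bug 104520.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the 2.4 SCSI structures. */
struct model_request_queue {
    pthread_mutex_t *queue_lock;      /* models request_queue.queue_lock */
};

struct model_scsi_device {
    struct model_request_queue request_queue;
    int command_blocks;               /* placeholder for the real work */
};

/* Models the stock 2.4.21 file-scope lock "device_request_lock". */
static pthread_mutex_t device_request_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models scsi_get_host_dev(): allocate a device and zero every field.
 * Nothing here sets queue_lock, so it stays all-bits-zero, which is
 * NULL on Linux/x86. */
static struct model_scsi_device *model_get_host_dev(void)
{
    return calloc(1, sizeof(struct model_scsi_device));
}

/* Patched variant: returns the lock the patched scsi_build_commandblocks()
 * would hand to spin_lock_irqsave(). It is only inspected here; locking
 * through this NULL pointer is the crash reported in this bug. */
static pthread_mutex_t *model_patched_lock(const struct model_scsi_device *sdev)
{
    return sdev->request_queue.queue_lock;
}

/* Stock variant: always takes a statically initialized lock, so a freshly
 * zeroed device is safe to pass in. */
static void model_build_commandblocks_stock(struct model_scsi_device *sdev)
{
    assert(sdev != NULL);
    pthread_mutex_lock(&device_request_lock);
    sdev->command_blocks = 1;
    pthread_mutex_unlock(&device_request_lock);
}
```

Calling model_build_commandblocks_stock() on a fresh model_get_host_dev() result succeeds, whereas passing model_patched_lock()'s result to pthread_mutex_lock() would dereference NULL, the userspace analogue of the oops.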

Comment 7 Jason Sauve 2003-10-30 20:38:09 UTC

See Bugzilla Bug # 104520 for resolution with RHEL 3.0
