116853 – (SCSI MEGARAID)oops on shutdown

Summary: (SCSI MEGARAID)oops on shutdown

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	FC2Target FC3Target FC4Target

Reported:	2004-02-25 19:16 UTC by Dan Christian
Modified:	2015-01-04 22:04 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-04-16 04:53:02 UTC
Type:	---
Embargoed:
Dependent Products:

Description Dan Christian 2004-02-25 19:16:29 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922

Description of problem:
System: dual Xeon 2.4 GHz, 1 GB RAM, MegaRAID disk controller (Dell PERC4)
running RAID-1 (2 disks).

I get an oops on shutdown about half the time.  The exact position
varies, but it always seems to be in the final unmount/sync.


Version-Release number of selected component (if applicable):
2.6.3-1.91smp

How reproducible:
Sometimes

Steps to Reproduce:
1. Boot system
2. Exercise disks for 10 minutes
3. Reboot
    

Additional info:


Here are the console messages and oops for 3 events.

Sending all processes the KILL signal...
Syncing hardware clock to system time
Turning off swap:
Unmounting file systems:
Please stand by while rebooting the system...
md: stopping all md devices.
md: md0 switched to read-only mode.
Unable to handle kernel paging request at virtual address f885ff20
 printing eip:
c012d8cc
*pde = 00000000
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c012d8cc>]    Not tainted
EFLAGS: 00010002
EIP is at internal_add_timer+0x84/0x8c
eax: f885ff20   ebx: c3974660   ecx: c39750f0   edx: f8a939a8
esi: 0003a161   edi: f8a939a8   ebp: 00000246   esp: c0377ef8
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0376000 task=c02f05c0)
Stack: 00000000 c3974660 c012daf1 00000000 00041690 00000000 c0145684 00000001
       0000000f c03c40b0 f7fff8d8 00000003 00000000 f8a93920 00041690 f8a939a0
       c025d512 c0377f00 f8a93aa8 c3974660 f8a93920 c025d3bd c0377f5c c012e39b
Call Trace:
 [<c012daf1>] __mod_timer+0x21d/0x2a3
 [<c0145684>] slab_destroy+0x121/0x154
 [<c025d512>] neigh_periodic_timer+0x155/0x167
 [<c025d3bd>] neigh_periodic_timer+0x0/0x167
 [<c012e39b>] run_timer_softirq+0x16a/0x1dd
 [<c0120127>] recalc_task_prio+0x141/0x14c
 [<c012a12d>] do_softirq+0x5d/0xb5
 [<c011b82f>] smp_apic_timer_interrupt+0x124/0x129
 [<c0105000>] _stext+0x0/0x65
 [<c010c0ca>] apic_timer_interrupt+0x1a/0x20
 [<c0109018>] default_idle+0x0/0x2c
 [<c0105000>] _stext+0x0/0x65
 [<c0109041>] default_idle+0x29/0x2c
 [<c010909d>] cpu_idle+0x26/0x3b
 [<c037874b>] start_kernel+0x1b2/0x1b7
 
Code: 89 10 5b 89 42 04 5e c3 55 57 89 c7 56 53 83 ec 24 89 54 24

====
Sending all processes the KILL signal...
Syncing hardware clock to system time
Turning off swap:
Unmounting file systems:
Unable to handle kernel paging request at virtual address f885ff20
 printing eip:
c012d8cc
*pde = 00003631
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c012d8cc>]    Not tainted
EFLAGS: 00010002
EIP is at internal_add_timer+0x84/0x8c
eax: f885ff20   ebx: c3974660   ecx: c39750f0   edx: f205c38c
esi: 00038fe5   edi: f205c38c   ebp: 00000246   esp: f554bd38
ds: 007b   es: 007b   ss: 0068
Process mount (pid: 8241, threadinfo=f554a000 task=f52932f0)
Stack: 00000000 c3974660 c012daf1 00000000 00040515 000000a0 ffffffff 00000000
       3720fe00 00000000 3720fc00 00000000 f205c324 f205c324 f6ed8000 f7bc3290
       f8837745 00000000 0217e000 00000000 f184fa98 c021568b f205c324 f6e616c0
Call Trace:
 [<c012daf1>] __mod_timer+0x21d/0x2a3
 [<f8837745>] scsi_dispatch_cmd+0xcb/0x280 [scsi_mod]
 [<c021568b>] as_remove_request+0xa0/0xab
 [<f883c833>] scsi_request_fn+0x29b/0x3dc [scsi_mod]
 [<f883c558>] scsi_prep_fn+0x123/0x163 [scsi_mod]
 [<c020fdda>] generic_unplug_device+0x6b/0x9b
 [<c020ff84>] blk_run_queues+0xbf/0x12e
 [<c0162123>] block_sync_page+0x5/0x8
 [<c013ed60>] __lock_page+0x84/0xa7
 [<c0124037>] autoremove_wake_function+0x0/0x28
 [<c0124037>] autoremove_wake_function+0x0/0x28
 [<c013edc1>] find_get_page+0x3e/0x83
 [<c013f4fc>] do_generic_mapping_read+0x1e2/0x485
 [<c01afad0>] avc_has_perm_noaudit+0x157/0x279
 [<c013f79f>] file_read_actor+0x0/0xc9
 [<c013fa14>] __generic_file_aio_read+0x1ac/0x1cc
 [<c013f79f>] file_read_actor+0x0/0xc9
 [<c013fae1>] generic_file_read+0x66/0x7d
 [<c01b2ccf>] selinux_file_permission+0x127/0x131
 [<c0164df4>] block_llseek+0x23/0xbd
 [<c015d74c>] vfs_read+0xb8/0xe4
 [<c015d925>] sys_read+0x2c/0x42
 [<c010b663>] syscall_call+0x7/0xb
 
Code: 89 10 5b 89 42 04 5e c3 55 57 89 c7 56 53 83 ec 24 89 54 24

====
Sending all processes the KILL signal...
Syncing hardware clock to system time
Turning off swap:
Unmounting file systems:
Please stand by while rebooting the system...
md: stopping all md devices.
md: md0 switched to read-only mode.
Unable to handle kernel paging request at virtual address f885ff20
 printing eip:
c012d8cc
*pde = 00303237
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c012d8cc>]    Not tainted
EFLAGS: 00010002
EIP is at internal_add_timer+0x84/0x8c
eax: f885ff20   ebx: c3974660   ecx: c39750f0   edx: f8a939a8
esi: 0003a1a1   edi: f8a939a8   ebp: 00000246   esp: f5513d84
ds: 007b   es: 007b   ss: 0068
Process reboot (pid: 8073, threadinfo=f5512000 task=f5f26000)
Stack: 00000000 c3974660 c012daf1 00000000 000416d0 00000000 c0145684 00000000
       0000000f c03c40b0 f7fff8d8 00000003 00000000 f8a93920 000416d0 f8a939a0
       c025d512 f5513d00 f8a93aa8 c3974660 f8a93920 c025d3bd f5513de8 c012e39b
Call Trace:
 [<c012daf1>] __mod_timer+0x21d/0x2a3
 [<c0145684>] slab_destroy+0x121/0x154
 [<c025d512>] neigh_periodic_timer+0x155/0x167
 [<c025d3bd>] neigh_periodic_timer+0x0/0x167
 [<c012e39b>] run_timer_softirq+0x16a/0x1dd
 [<c012a12d>] do_softirq+0x5d/0xb5
 [<c011b82f>] smp_apic_timer_interrupt+0x124/0x129
 [<c010c0ca>] apic_timer_interrupt+0x1a/0x20
 [<c0115a7a>] delay_tsc+0xb/0x13
 [<c01bf9b9>] __delay+0x9/0xa
 [<f8830f2a>] __megaraid_shutdown+0x7f/0x92 [megaraid]
 [<c020d704>] device_shutdown+0x4f/0x7a
 [<c0133707>] sys_reboot+0x125/0x368
 [<c014f83e>] handle_mm_fault+0xdf/0x1cd
 [<c01afea1>] inode_free_security+0xa5/0xac
 [<c01760fb>] destroy_inode+0x36/0x45
 [<c01760fb>] destroy_inode+0x36/0x45
 [<c0177632>] generic_forget_inode+0x16c/0x171
 [<c0173887>] dput+0x1b/0x287
 [<c015e559>] __fput+0xc4/0xe3
 [<c015ced8>] filp_close+0x59/0x5f
 [<c015cf7e>] sys_close+0xa0/0xd3
 [<c010b663>] syscall_call+0x7/0xb
 
Code: 89 10 5b 89 42 04 5e c3 55 57 89 c7 56 53 83 ec 24 89 54 24
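
To compare the three events, the header fields of each oops can be pulled out and checked for overlap. The sketch below is only an illustration: the strings are trimmed copies of the header lines from the three reports above, and the names OOPSES, PATTERNS, summarize and common_fields are hypothetical helpers, not part of any kernel tool.

```python
import re

# Header lines copied from the three oops reports above,
# trimmed to the fields being compared.
OOPSES = [
    """Unable to handle kernel paging request at virtual address f885ff20
Oops: 0002 [#1]
EIP is at internal_add_timer+0x84/0x8c
Process swapper (pid: 0, threadinfo=c0376000 task=c02f05c0)""",
    """Unable to handle kernel paging request at virtual address f885ff20
Oops: 0002 [#1]
EIP is at internal_add_timer+0x84/0x8c
Process mount (pid: 8241, threadinfo=f554a000 task=f52932f0)""",
    """Unable to handle kernel paging request at virtual address f885ff20
Oops: 0002 [#1]
EIP is at internal_add_timer+0x84/0x8c
Process reboot (pid: 8073, threadinfo=f5512000 task=f5f26000)""",
]

# One regular expression per field; group 1 is the value kept.
PATTERNS = {
    "fault_addr": re.compile(r"virtual address ([0-9a-f]+)"),
    "error_code": re.compile(r"Oops: ([0-9a-f]{4})"),
    "eip_symbol": re.compile(r"EIP is at (\S+)"),
    "process": re.compile(r"Process (\S+) \(pid: \d+"),
}


def summarize(oops_text):
    """Return the header fields of one oops as a dict (None if a field is missing)."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(oops_text)
        summary[name] = match.group(1) if match else None
    return summary


def common_fields(summaries):
    """Return the fields whose value is identical in every summary."""
    first = summaries[0]
    return {
        key: value
        for key, value in first.items()
        if all(s[key] == value for s in summaries)
    }


summaries = [summarize(text) for text in OOPSES]
for s in summaries:
    print(s)
print("identical in all events:", common_fields(summaries))
```

In all three events the faulting address, the error code and the EIP symbol match; only the interrupted process (swapper, mount, reboot) differs.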

Comment 1 Sahil Verma 2004-03-20 02:39:13 UTC

Timer list corruption. Does this happen without the megaraid module
loaded? With a 2.6.1-xx kernel?
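
For reading the "Oops: 0002" line in the traces above: on i386 this value is the page-fault error code, and its low three bits give the cause, the access type and the CPU mode. A minimal sketch of that decoding follows; decode_page_fault_error is a hypothetical helper written for this note, not a kernel function.

```python
def decode_page_fault_error(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# The value printed after "Oops:" is hexadecimal.
decoded = decode_page_fault_error(int("0002", 16))
print(decoded)
```

For 0002 this gives a kernel-mode write to a page that is not present. Together with eax holding the faulting address f885ff20 at internal_add_timer+0x84, that is at least consistent with a store through a bad timer list pointer, though it does not by itself show which code left the pointer behind.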

Comment 2 Dan Christian 2004-03-22 16:51:19 UTC

I need the megaraid module to boot, so I can't try it without it.

I never tried 2.6.1*.  Are there still RPMs for it someplace?

Comment 3 Sahil Verma 2004-03-22 18:41:33 UTC

Oh crap, I thought so.

FC 2 test 1 has a 2.6.1-1.65 smp kernel.

Comment 4 Dan Christian 2004-04-02 21:36:49 UTC

I was running watchdog-5.2 (the software watchdog).

If I disable the software watchdog, then I don't see the Oops anymore
(after 10 tries).

If I try to do process monitoring (pidfile = /var/run/crond.pid), then
the system goes unstable within seconds.

Should I file a separate bug against the softdog module?

Comment 5 Miloslav Trmač 2004-11-16 14:24:09 UTC

*** Bug 130089 has been marked as a duplicate of this bug. ***

Comment 6 Dave Jones 2004-12-08 05:38:45 UTC

Is this fixed in the latest update?

Comment 7 Dave Jones 2005-04-16 04:53:02 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora Legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.
