Bug 609975

Summary: kernel BUG at kernel/timer.c:951! EIP: SendIocReset... [mptbase]
Product: Red Hat Enterprise Linux 6 Reporter: Jan Stodola <jstodola>
Component: kernel Assignee: Tomas Henzl <thenzl>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: andriusb, coughlan, eric.moore, kashyap.desai, kzhang, peterm, sathya.prakash
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2010-07-27 10:46:45 UTC Type: ---
Bug Blocks: 582286    

Description Jan Stodola 2010-07-01 11:19:02 UTC
Description of problem:
This is what I found in console.log when running installation test:

...
running /sbin/loader 
 %Gdetecting hardware... 
waiting for hardware to initialize... 
[-- MARK -- Thu Jul  1 06:55:00 2010] 
BUG: unable to handle kernel  
tg3.c:v3.108 (February 17, 2010) 
  alloc irq_desc for 31 on node -1 
  alloc kstat_irqs on node -1 
tg3 0000:0e:03.0: PCI INT A -> GSI 31 (level, low) -> IRQ 31 
NULL pointer dereference at (null) 
IP: [<f8068a16>] SendIocReset+0x46/0x110 [mptbase] 
*pdpt = 00000000359e2001 *pde = 0000000000000000  
Oops: 0000 [#1] SMP  
last sysfs file: /sys/module/mptspi/initstate 
Modules linked in: tg3(+)(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) sr_mod(U) cdrom(U) ata_generic(U) pata_acpi(U) pata_amd(U) ipv6(U) iscsi_ibft(U) pcspkr(U) edd(U) floppy(U) iscsi_tcp(U) libiscsi_tcp(U) libiscsi(U) scsi_transport_iscsi(U) squashfs(U) cramfs(U) 
 
Pid: 334, comm: scsi_scan_2 Not tainted (2.6.32-37.el6.i686 #1) Quartet 
EIP: 0060:[<f8068a16>] EFLAGS: 00010246 CPU: 0 
EIP is at SendIocReset+0x46/0x110 [mptbase] 
EAX: 00000000 EBX: c1e2c000 ECX: c0b7a500 EDX: 00000000 
ESI: 20000000 EDI: 00000001 EBP: 00001389 ESP: c1e07b60 
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 
Process scsi_scan_2 (pid: 334, ti=c1e06000 task=f58f8560 task.ti=c1e06000) 
Stack: 
 f58f8560 fffbe1d2 00000286 00000000 c1e2c000 00000001 000003e7 00000001 
<0> f8069157 00000000 00000000 00000000 00000000 30f6ea3b 00000000 c1e2c000 
<0> 20000000 00000003 c1e2c000 00000000 00000001 f8069449 f80729e4 c1e2c008 
Call Trace: 
 [<f8069157>] ? KickStart+0x677/0x8f0 [mptbase] 
 [<f8069449>] ? MakeIocReady+0x79/0x380 [mptbase] 
 [<f806a469>] ? mpt_do_ioc_recovery+0x299/0x18d0 [mptbase] 
 [<c04421a3>] ? finish_task_switch+0x33/0xa0 
 [<c0816a7f>] ? schedule+0x42f/0xae0 
 [<c0474ebb>] ? up+0xb/0x40 
 [<c044ff26>] ? release_console_sem+0x1a6/0x1f0 
 [<c045f640>] ? process_timeout+0x0/0x10 
 [<f80a6c57>] ? mptspi_ioc_reset+0x17/0x50 [mptspi] 
 [<f806bb53>] ? mpt_HardResetHandler+0xb3/0x220 [mptbase] 
 [<f806c220>] ? mpt_config+0x380/0x550 [mptbase] 
 [<f80a7b50>] ? mptspi_write_spi_device_pg1+0x160/0x450 [mptspi] 
 [<c081afd0>] ? do_page_fault+0x1c0/0x480 
 [<c043671a>] ? kmap_atomic_prot+0x11a/0x150 
 [<c05e5404>] ? vsnprintf+0xd4/0x400 
 [<f80a7e9d>] ? mptspi_write_width+0x5d/0x70 [mptspi] 
 [<f80a8020>] ? mptspi_target_alloc+0x170/0x270 [mptspi] 
 [<c06a60d1>] ? attribute_container_add_device+0x51/0x180 
 [<c06ba988>] ? scsi_alloc_target+0x248/0x2b0 
 [<c06bb956>] ? __scsi_scan_target+0x66/0x6d0 
 [<c04081e7>] ? __switch_to+0xd7/0x1a0 
 [<c06bc037>] ? scsi_scan_channel+0x77/0x90 
 [<c06bc131>] ? scsi_scan_host_selected+0xe1/0x170 
 [<c06bc236>] ? do_scsi_scan_host+0x76/0x80 
 [<c06bc251>] ? do_scan_async+0x11/0x120 
 [<c06bc240>] ? do_scan_async+0x0/0x120 
 [<c0470094>] ? kthread+0x74/0x80 
 [<c0470020>] ? kthread+0x0/0x80 
 [<c040a547>] ? kernel_thread_helper+0x7/0x10 
Code: f2 8b 83 e8 00 00 00 c1 e6 18 89 30 89 fa 89 d8 e8 a0 e7 ff ff 31 ed 85 c0 0f 88 89 00 00 00 8d b6 00 00 00 00 8b 83 e8 00 00 00 <8b> 30 81 e6 00 00 00 f0 81 fe 00 00 00 10 89 b3 0c 01 00 00 74  
EIP: [<f8068a16>] SendIocReset+0x46/0x110 [mptbase] SS:ESP 0068:c1e07b60 
CR2: 0000000000000000 
------------[ cut here ]------------ 
kernel BUG at kernel/timer.c:951! 
invalid opcode: 0000 [#2] SMP  
last sysfs file: /sys/module/mptspi/initstate 
Modules linked in: tg3(+)(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) sr_mod(U) cdrom(U) ata_generic(U) pata_acpi(U) pata_amd(U) ipv6(U)


Version-Release number of selected component (if applicable):
RHEL6.0-20100622.1 / i386 / Server
kernel-2.6.32-37.el6.i686.rpm  

How reproducible:
tried only once

Steps to Reproduce:
1. try to install RHEL6.0-20100622.1 / i386 / Server in Beaker

Actual results:
kernel panic

Expected results:
successful installation and boot

Comment 3 Tom Coughlan 2010-07-19 22:30:56 UTC
LSI folks: this looks like it is in mpt fusion. Please take a look. 

(In reply to comment #0)

> How reproducible:
> tried only once

Jan, it looks like the system tries to do a write, then we land in mpt_HardResetHandler,
and somewhere in the error recovery path we get a NULL pointer dereference.

This could be caused by a hardware error. The system should not crash, but it would be helpful to know whether the problem is reproducible, and whether the hardware works with other versions of the OS. If you can give it a try, that would be appreciated.
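
To make the path described above easier to see, here is a minimal Python sketch that pulls the driver frames out of a call trace and drops the core-kernel ones. It is not part of the original report: the helper names (parse_frames, module_frames) are hypothetical, and SAMPLE is a shortened excerpt of the trace in comment #0. Note that frames printed with a "?" were found on the stack but not verified by the unwinder, so the resulting list is a hint at the call path rather than a guaranteed one.

```python
import re

# Matches frames such as: [<f8069157>] ? KickStart+0x677/0x8f0 [mptbase]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per call-trace frame found in text (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("sym"),
                "offset": int(m.group("off"), 16),
                "size": int(m.group("size"), 16),
                "module": m.group("mod"),  # None for core-kernel symbols
            })
    return frames


def module_frames(frames, modules):
    """Keep only the frames that belong to one of the named modules."""
    return [f for f in frames if f["module"] in modules]


# Shortened excerpt of the trace from comment #0, innermost frame first.
SAMPLE = """\
 [<f8069157>] ? KickStart+0x677/0x8f0 [mptbase]
 [<f8069449>] ? MakeIocReady+0x79/0x380 [mptbase]
 [<c04421a3>] ? finish_task_switch+0x33/0xa0
 [<f806bb53>] ? mpt_HardResetHandler+0xb3/0x220 [mptbase]
 [<f80a7b50>] ? mptspi_write_spi_device_pg1+0x160/0x450 [mptspi]
"""

if __name__ == "__main__":
    for f in module_frames(parse_frames(SAMPLE), {"mptbase", "mptspi"}):
        print(f"{f['module']:8} {f['symbol']}+{f['offset']:#x}")
```

Run against the full console log instead of SAMPLE, this should list the mptspi and mptbase frames in the order the trace prints them, which is the write, hard reset, and recovery sequence mentioned above.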

Comment 4 kashyap 2010-07-20 07:09:54 UTC
(In reply to comment #3)
> LSI folks: this looks like it is in mpt fusion. Please take a look. 
> 
> (In reply to comment #0)
> 
> > How reproducible:
> > tried only once
> 
> Jan, it looks like the system tries to do a write, then we land in 
> mpt_HardResetHandler,
> then somewhere in the error recovery path, we get a NULL pointer dereference. 
> 
> This could be caused by a hardware error. The system should not crash, but it
> might be helpful to know whether the problem is reproducible, and whether the
> hardware works with other versions of the o.s.. If you can give it a try that
> would be appreciated.    

I agree with Tom. Meanwhile, can you attach an object dump for mptbase?
"objdump -Sd mptbase.o > mptbase.dump"

Thanks, Kashyap
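
Once the object dump is available, the oops can be matched against it by hand. The following Python sketch shows the arithmetic; it is an illustration added here, not something from the thread. The helper names are hypothetical, and CODE is a shortened copy of the "Code:" line from comment #0. It reads the symbol offset from the EIP line and uses the <xx> marker in the code dump to work out which function offset the dumped bytes start at.

```python
import re

# Matches e.g. "SendIocReset+0x46/0x110" (offset into function / function size)
SYMBOL_RE = re.compile(
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_symbol_offset(text):
    """Return (symbol, offset, size) from an oops EIP line (hypothetical helper)."""
    m = SYMBOL_RE.search(text)
    if m is None:
        raise ValueError("no symbol+offset found")
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def split_code_bytes(code_line):
    """Split a 'Code:' line into bytes before the fault and bytes from the fault on.

    The kernel marks the first byte of the faulting instruction as <xx>.
    """
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            before = [int(t, 16) for t in tokens[:i]]
            at_and_after = [int(tok[1:-1], 16)]
            at_and_after += [int(t, 16) for t in tokens[i + 1:]]
            return before, at_and_after
    raise ValueError("no <xx> marker found")


EIP_LINE = "EIP is at SendIocReset+0x46/0x110 [mptbase]"
# Shortened copy of the Code: line from the report (assumption: tail of the dump).
CODE = "Code: 8d b6 00 00 00 00 8b 83 e8 00 00 00 <8b> 30 81 e6 00 00 00 f0"

if __name__ == "__main__":
    sym, off, size = parse_symbol_offset(EIP_LINE)
    before, at_and_after = split_code_bytes(CODE)
    print(f"fault at {sym}+{off:#x} (function size {size:#x})")
    print(f"dumped bytes start at {sym}+{off - len(before):#x}")
    print("faulting bytes:", " ".join(f"{b:02x}" for b in at_and_after[:2]))
```

In the objdump -Sd output, the instruction to look for is the one at the start address of SendIocReset plus 0x46. If I read the 32-bit x86 encoding correctly, the marked bytes 8b 30 are a load through EAX, and the preceding 8b 83 e8 00 00 00 loads EAX from offset 0xe8 of EBX, which would be consistent with EAX: 00000000 and CR2: 0000000000000000 in the register dump. The source lines interleaved by -S should confirm or refute that.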

Comment 6 Tomas Henzl 2010-07-27 10:46:45 UTC
The latest information indicates that this is most probably a hardware problem.
I'm closing this one now.