Bug 483171 - Panic at boot if SATA disk is present
Summary: Panic at boot if SATA disk is present
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 5.4
Assignee: David Milburn
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 491283
Depends On:
Blocks: 459515 483701 483784 485909 485920
Reported: 2009-01-29 23:13 UTC by Robert N. Evans
Modified: 2009-09-02 08:05 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 08:05:31 UTC
Target Upstream Version:
Embargoed:


Attachments
Fixup .qc_issue and .qc_fill_rtf in sas_sata_ops (677 bytes, patch)
2009-02-03 22:58 UTC, David Milburn


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1243 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update 2009-09-01 08:53:34 UTC

Description Robert N. Evans 2009-01-29 23:13:36 UTC
Description of problem:
Panic at boot when aic94xx is loading

Version-Release number of selected component (if applicable):
2.6.18-128.el5. I have also reproduced this problem on pre-releases of RHEL 5.3 back to kernel-2.6.18-120.el5.x86_64. This problem does not occur on the same hardware with RHEL 5.2.

How reproducible:
Seems to happen every time we try, unless there is no SATA disk present.
Problem does not occur if only SAS disks are present.

Steps to Reproduce:
1. Insert SATA disk.
2. Boot RHEL 5.3.
3. Watch it panic.
  
Actual results:
Panic at boot as shown below.

Expected results:
No panic.

Additional info:
Booted the RHEL 5.3 install disk in rescue mode to eliminate configuration variables. The serial console output from that boot follows; the exception stack is at the end of the log.

Linux version 2.6.18-128.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Dec 17 11:41:38 EST 2008
Command line: initrd=initrd.img BOOT_IMAGE=vmlinuz i8042.noaux rescue console=ttyS0,115200
BIOS-provided physical RAM map:                                                 
 BIOS-e820: 0000000000000000 - 000000000009b400 (usable)                        
 BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)                      
 BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved)                      
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)                      
 BIOS-e820: 0000000000100000 - 0000000000e00000 (usable)                        
 BIOS-e820: 0000000000e00000 - 0000000000e40000 (reserved)                      
 BIOS-e820: 0000000000e40000 - 000000007fef0000 (usable)                        
 BIOS-e820: 000000007fef0000 - 000000007fef9000 (ACPI data)                     
 BIOS-e820: 000000007fef9000 - 000000007ff00000 (ACPI NVS)                      
 BIOS-e820: 000000007ff00000 - 0000000080000000 (reserved)                      
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)                      
 BIOS-e820: 00000000fec80000 - 00000000fec90000 (reserved)                      
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)                      
 BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)                      
DMI present.                                                                    
No NUMA configuration found                                                     
Faking a node at 0000000000000000-000000007fef0000                              
Bootmem setup node 0 0000000000000000-000000007fef0000                          
Memory for crash kernel (0x0 to 0x0) notwithin permissible range                
disabling kdump                                                                 
ACPI: PM-Timer IO Port: 0x508                                                   
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)                              
Processor #0 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
Processor #4 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] enabled)
Processor #5 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] enabled)
Processor #2 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x06] enabled)
Processor #6 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
Processor #3 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
Processor #7 7:7 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec80000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 16, address 0xfec80000, GSI 0-35
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
ACPI: NMI_SRC (high edge global_irq 34)
Setting APIC routing to physical flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009b000 - 000000000009c000
Nosave address range: 000000000009c000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000d2000
Nosave address range: 00000000000d2000 - 00000000000d4000
Nosave address range: 00000000000d4000 - 00000000000dc000
Nosave address range: 00000000000dc000 - 0000000000100000
Nosave address range: 0000000000e00000 - 0000000000e40000
Allocating PCI resources starting at 88000000 (gap: 80000000:60000000)
SMP: Allowing 8 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 513799
Kernel command line: initrd=initrd.img BOOT_IMAGE=vmlinuz i8042.noaux rescue console=ttyS0,115200
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2051520k/2096064k available (2494k kernel code, 43884k reserved, 1263k data, 200k init)
Calibrating delay using timer specific routine.. 4002.46 BogoMIPS (lpj=2001231)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM1)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20834957
Detected 20.834 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/8 APIC 0x4
Initializing CPU#1
Calibrating delay using timer specific routine.. 3999.61 BogoMIPS (lpj=1999805)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 2/8 APIC 0x1
Initializing CPU#2
Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999804)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU2: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 3/8 APIC 0x5
Initializing CPU#3
Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999803)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 4/8 APIC 0x2
Initializing CPU#4
Calibrating delay using timer specific routine.. 3999.59 BogoMIPS (lpj=1999797)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
CPU4: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 5/8 APIC 0x6
Initializing CPU#5
Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999802)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 2
CPU5: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 6/8 APIC 0x3
Initializing CPU#6
Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999803)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 3
CPU6: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
SMP alternatives: switching to SMP code
Booting processor 7/8 APIC 0x7
Initializing CPU#7
Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999802)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 6144K
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 3
CPU7: Thermal monitoring enabled (TM1)
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz stepping 06
Brought up 8 CPUs
testing NMI watchdog ... OK.
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2000.156 MHz processor.
migration_cost=26,12762
checking if image is initramfs... it is
Freeing initrd memory: 6485k freed
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: Using MMCONFIG at e0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:0b:1e.0
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:7e:1e.0
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 10 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI: Cannot allocate resource region 0 of device 0000:01:00.1
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
PCI-GART: No AMD northbridge found.
PCI: mem resource #9:100000@c4100000 for 0000:05:00.0 was not allocated.
PCI: mem resource #6:40000@84400000 for 0000:06:01.0 was not allocated.
PCI: mem resource #6:40000@84400000 for 0000:06:01.1 was not allocated.
PCI: Bridge: 0000:05:00.0
  IO window: 1000-1fff
  MEM window: 84200000-843fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:05:00.2
  IO window: 2000-2fff
  MEM window: 84400000-844fffff
  PREFETCH window: c4000000-c40fffff
PCI: Bridge: 0000:04:00.0
  IO window: 1000-2fff
  MEM window: 84100000-844fffff
  PREFETCH window: c4000000-c40fffff
PCI: Bridge: 0000:04:01.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:04:02.0
  IO window: 3000-3fff
  MEM window: 84500000-845fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:03:00.0
  IO window: 1000-3fff
  MEM window: 84100000-845fffff
  PREFETCH window: c4000000-c40fffff
PCI: Bridge: 0000:03:00.3
  IO window: 4000-4fff
  MEM window: 84600000-848fffff
  PREFETCH window: 84b00000-84bfffff
PCI: Bridge: 0000:0b:1e.0
  IO window: 6000-6fff
  MEM window: 84a00000-84afffff
  PREFETCH window: c4100000-cfffffff
PCI: Bridge: 0000:03:01.0
  IO window: 5000-6fff
  MEM window: 84900000-84afffff
  PREFETCH window: c4100000-cfffffff
PCI: Bridge: 0000:02:00.0
  IO window: 1000-7fff
  MEM window: 84000000-a3ffffff
  PREFETCH window: c4000000-cfffffff
PCI: mem resource #9:100000@d0100000 for 0000:78:00.0 was not allocated.
PCI: mem resource #6:40000@a4400000 for 0000:79:01.0 was not allocated.
PCI: mem resource #6:40000@a4400000 for 0000:79:01.1 was not allocated.
PCI: Bridge: 0000:78:00.0
  IO window: 8000-8fff
  MEM window: a4200000-a43fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:78:00.2
  IO window: 9000-9fff
  MEM window: a4400000-a44fffff
  PREFETCH window: d0000000-d00fffff
PCI: Bridge: 0000:77:00.0
  IO window: 8000-9fff
  MEM window: a4100000-a44fffff
  PREFETCH window: d0000000-d00fffff
PCI: Bridge: 0000:77:01.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:77:02.0
  IO window: a000-afff
  MEM window: a4500000-a45fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:76:00.0
  IO window: 8000-afff
  MEM window: a4100000-a45fffff
  PREFETCH window: d0000000-d00fffff
PCI: Bridge: 0000:76:00.3
  IO window: b000-bfff
  MEM window: a4600000-a48fffff
  PREFETCH window: a4b00000-a4bfffff
PCI: Bridge: 0000:7e:1e.0
  IO window: d000-dfff
  MEM window: a4a00000-a4afffff
  PREFETCH window: d0100000-dfffffff
PCI: Bridge: 0000:76:01.0
  IO window: c000-dfff
  MEM window: a4900000-a4afffff
  PREFETCH window: d0100000-dfffffff
PCI: Bridge: 0000:02:01.0
  IO window: 8000-efff
  MEM window: a4000000-c3ffffff
  PREFETCH window: d0000000-dfffffff
PCI: Bridge: 0000:01:00.0
  IO window: 1000-efff
  MEM window: 80000000-c3ffffff
  PREFETCH window: c4000000-dfffffff
PCI: Bridge: 0000:00:02.0
  IO window: 1000-efff
  MEM window: 80000000-c3ffffff
  PREFETCH window: c4000000-dfffffff
PCI: Bridge: 0000:00:03.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:05:00.2[B] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:04:02.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:03:00.3[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 1 (level, low) -> IRQ 1
ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:76:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:77:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:78:00.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:78:00.2[B] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:77:01.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:77:02.0[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:76:00.3[A] -> GSI 35 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:76:01.0[A] -> GSI 1 (level, low) -> IRQ 1
GSI 17 sharing vector 0xB1 and IRQ 17
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 0 (level, low) -> IRQ 177
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
Simple Boot Flag at 0x35 set to 0x1
audit: initializing netlink socket (disabled)
type=2000 audit(1233023570.709:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key 5C0DC734E64D24FA
- User ID: Red Hat, Inc. (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
GSI 18 sharing vector 0x52 and IRQ 18
ACPI: PCI Interrupt 0000:0c:01.1[A] -> GSI 22 (level, low) -> IRQ 82
ACPI: PCI interrupt for device 0000:0c:01.1 disabled
ACPI: PCI Interrupt 0000:0c:01.2[A] -> GSI 22 (level, low) -> IRQ 82
0000:0c:01.2: ttyS2 at I/O 0x6450 (irq = 82) is a 16550A
0000:0c:01.2: ttyS3 at I/O 0x6458 (irq = 82) is a 16550A
GSI 19 sharing vector 0x5A and IRQ 19
ACPI: PCI Interrupt 0000:7f:01.1[A] -> GSI 33 (level, low) -> IRQ 90
ACPI: PCI interrupt for device 0000:7f:01.1 disabled
ACPI: PCI Interrupt 0000:7f:01.2[A] -> GSI 33 (level, low) -> IRQ 90
Couldn't register serial port 0000:7f:01.2: -28
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ESB2: IDE controller at PCI slot 0000:0b:1f.1
ACPI: PCI Interrupt 0000:0b:1f.1[A] -> GSI 12 (level, low) -> IRQ 12
ESB2: chipset revision 9
ESB2: 100% native mode on irq 12
    ide0: BM-DMA at 0x5060-0x5067, BIOS settings: hda:DMA, hdb:pio
hda: MATSHITADVD-RAM UJ870PC, ATAPI CD/DVD-ROM drive
ide0 at 0x5090-0x5097,0x5086 on irq 12
ESB2: IDE controller at PCI slot 0000:7e:1f.1
PCI: Enabling device 0000:7e:1f.1 (0004 -> 0005)
GSI 20 sharing vector 0x62 and IRQ 20
ACPI: PCI Interrupt 0000:7e:1f.1[A] -> GSI 23 (level, low) -> IRQ 98
Welcome to Red Hat Enterprise Linux Server                                      
                                                                                
                   +--------+ Loading SCSI driver +---------+                   
                   |                                        |                   
                   | Loading aic94xx driver...Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<0000000000000000>] _stext+0x7ffff000/0x1000              |                   
PGD 0              +----------------------------------------+                   
Oops: 0010 [1] SMP                                                              
last sysfs file: /devices/pci0000:00/0000:00:00.0/class                         
CPU 0                                                                           
Modules linked in: aic94xx libsas scsi_transport_sas libata uhci_hcd ehci_hcd iscsi_ibft iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 xfrm_nalgo crypto_api squashfs pcspkr edd loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs
Pid: 952, comm: scsi_wq_0 Not tainted 2.6.18-128.el5 #1                         
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] _stext+0x7ffff000/0x1000   
RSP: 0000:ffff81007e3d39d8  EFLAGS: 00010002                                    
RAX: ffffffff882c2160 RBX: ffff81007e1440e0 RCX: 0000000000000000               
RDX: ffff81007e144000 RSI: ffff81007e4a5358 RDI: ffff81007e1440e0
RBP: ffff81007e1440e0 R08: ffffffffffffffff R09: 0000000000000001
R10: ffff81007f3f10c0 R11: 0000000000000000 R12: ffff81007e144000
R13: ffff81007e1463f0 R14: ffff81007e146528 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process scsi_wq_0 (pid: 952, threadinfo ffff81007e3d2000, task ffff81007f0c5820)
Stack:  ffffffff88272606 ffff81007e1440e0 ffff81007e3d3a50 ffff81007e1440e0
 ffff81007e144000 ffff81007e1463f0 ffffffff882730e6 000000017f0c5820
 ffff81007e3d3ac0 000000028ac28261 ffff81007e3d3b60 ecff81007f0c5a08
Call Trace:
 [<ffffffff88272606>] :libata:ata_qc_complete+0x143/0x163
 [<ffffffff882730e6>] :libata:ata_exec_internal_sg+0x267/0x40f
 [<ffffffff88273378>] :libata:ata_exec_internal+0xea/0xf9
 [<ffffffff88273754>] :libata:ata_dev_read_id+0xef/0x3ce
 [<ffffffff882752cc>] :libata:ata_bus_probe+0x1eb/0x4a1
 [<ffffffff88277c1f>] :libata:ata_sas_port_start+0x26/0x35
 [<ffffffff881c6c58>] :scsi_mod:scsi_alloc_sdev+0x17d/0x1c4
 [<ffffffff881c6e29>] :scsi_mod:scsi_probe_and_add_lun+0x10d/0x9c9
 [<ffffffff881c7a44>] :scsi_mod:scsi_alloc_target+0x239/0x320
 [<ffffffff881c7c6e>] :scsi_mod:__scsi_scan_target+0xc3/0x5e2
 [<ffffffff80106d21>] sysfs_add_file+0x76/0x85
 [<ffffffff801ba812>] transport_add_class_device+0x0/0x32
 [<ffffffff801ba29f>] attribute_container_add_attrs+0x27/0x39
 [<ffffffff881c8436>] :scsi_mod:scsi_scan_target+0x6c/0x83
 [<ffffffff882a5a70>] :scsi_transport_sas:sas_rphy_add+0x102/0x10e
 [<ffffffff882b8879>] :libsas:sas_discover_domain+0x31b/0x382
 [<ffffffff882b855e>] :libsas:sas_discover_domain+0x0/0x382
 [<ffffffff8004d139>] run_workqueue+0x94/0xe4
 [<ffffffff800499ba>] worker_thread+0x0/0x122
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049aaa>] worker_thread+0xf0/0x122
 [<ffffffff8008a461>] default_wake_function+0x0/0xe
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code:  Bad RIP value.
RIP  [<0000000000000000>] _stext+0x7ffff000/0x1000
 RSP <ffff81007e3d39d8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
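
For readers decoding the oops above: the value printed after "Oops:" is the x86 page-fault error code in hexadecimal. The following is a minimal sketch of how its low bits can be read. The macro and function names are illustrative only and are not the kernel's own. Under the standard x86 bit layout, 0010 indicates an instruction fetch from a not-present page in kernel mode, which is consistent with RIP and CR2 both being 0 (a call through a NULL function pointer).

```c
#include <stdio.h>

/* x86 page-fault error code bits (names here are illustrative). */
#define PF_PROT  0x01UL  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x02UL  /* 0: read access,      1: write access         */
#define PF_USER  0x04UL  /* 0: kernel mode,      1: user mode            */
#define PF_RSVD  0x08UL  /* reserved bit set in a page-table entry       */
#define PF_INSTR 0x10UL  /* fault was caused by an instruction fetch     */

int pf_page_present(unsigned long code)         { return (code & PF_PROT)  != 0; }
int pf_is_write(unsigned long code)             { return (code & PF_WRITE) != 0; }
int pf_is_user_mode(unsigned long code)         { return (code & PF_USER)  != 0; }
int pf_is_instruction_fetch(unsigned long code) { return (code & PF_INSTR) != 0; }

/* Print a one-line, human-readable summary of an error code. */
void pf_describe(unsigned long code)
{
    printf("Oops: %04lx -> %s, %s mode, page %s\n",
           code,
           pf_is_instruction_fetch(code) ? "instruction fetch"
               : (pf_is_write(code) ? "data write" : "data read"),
           pf_is_user_mode(code) ? "user" : "kernel",
           pf_page_present(code) ? "present (protection fault)" : "not present");
}
```

Calling pf_describe(0x0010UL) from a small main() summarizes the code seen in this report.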

Comment 1 Andrius Benokraitis 2009-02-02 17:58:41 UTC
jlarrew: would you happen to know if there was a change made in aic94xx that could have caused this?

Comment 2 David Milburn 2009-02-02 22:51:22 UTC
Just noting that I was able to add a SATA disk to LSI SAS1068E system without
problems.

07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

[root@dhcp-122 ~]#  lsmod |grep sas
mptsas                 69201  2
mptscsih               69697  1 mptsas
mptbase               113637  2 mptsas,mptscsih
scsi_transport_sas     66753  1 mptsas
scsi_mod              196569  8 scsi_dh,sr_mod,sg,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod

I am in the process of looking over the code shown on the panic stack trace.

Comment 3 Jesse Larrew 2009-02-03 00:43:19 UTC
(In reply to comment #1)
> jlarrew: would you happen to know if there was a change made in aic94xx that
> could have caused this?

It looks like some changes to aic94xx for 2.6.25 were back-ported to 2.6.18-104.el5 for RHEL 5.3: 
https://bugzilla.redhat.com/show_bug.cgi?id=439573

Other problems with aic94xx, SATA, and smartctl are noted here: https://bugzilla.redhat.com/show_bug.cgi?id=429606

I'll mirror this bug to the IBM LTC for further investigation. 

Jesse

Comment 4 David Milburn 2009-02-03 20:14:26 UTC
Looking at the stack trace

Call Trace:
 [<ffffffff88272606>] :libata:ata_qc_complete+0x143/0x163
 [<ffffffff882730e6>] :libata:ata_exec_internal_sg+0x267/0x40f
 [<ffffffff88273378>] :libata:ata_exec_internal+0xea/0xf9
 [<ffffffff88273754>] :libata:ata_dev_read_id+0xef/0x3ce
 [<ffffffff882752cc>] :libata:ata_bus_probe+0x1eb/0x4a1
 [<ffffffff88277c1f>] :libata:ata_sas_port_start+0x26/0x35

No error handler is defined in the ata_port_operations in sas_ata.c:

static struct ata_port_operations sas_sata_ops = {
        .phy_reset              = sas_ata_phy_reset,
        .post_internal_cmd      = sas_ata_post_internal,
        .qc_prep                = ata_noop_qc_prep,
        .qc_issue               = sas_ata_qc_fill_rtf,
        .port_start             = ata_sas_port_start,
        .port_stop              = ata_sas_port_stop,
        .scr_read               = sas_ata_scr_read,
        .scr_write              = sas_ata_scr_write
};

Then in ata_qc_complete this condition should be false

        if (ap->ops->error_handler) {

We should be dropping down to the else

        } else {
                if (qc->flags & ATA_QCFLAG_EH_SCHEDULED)
                        return;

                /* read result TF if failed or requested */
                if (qc->err_mask || qc->flags & ATA_QCFLAG_RESULT_TF)
                        fill_result_tf(qc);

                __ata_qc_complete(qc);
        }

Since ata_exec_internal_sg should be setting ATA_QCFLAG_RESULT_TF, we
should end up in fill_result_tf:

static void fill_result_tf(struct ata_queued_cmd *qc)
{
        struct ata_port *ap = qc->ap;

        qc->result_tf.flags = qc->tf.flags;
        ap->ops->qc_fill_rtf(qc);  <========CRASH ?
}

I don't see where sas_sata_ops includes a .qc_fill_rtf pointer, as most of
the drivers in drivers/ata do, so I would think that it is crashing in
fill_result_tf when it calls through that NULL pointer. A vmcore could tell
us for sure.
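
The suspected failure can be sketched in a small userspace model. This is only an illustration under stated assumptions: the model_* types and functions are hypothetical stand-ins rather than the libata definitions, both callbacks are given the same signature for simplicity, and the NULL check exists only so the model can report the problem instead of crashing (the fill_result_tf quoted above has no such check). The "fixed" table is an assumption about the general shape of a repair, not the contents of any actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ata_queued_cmd. */
struct model_qc {
    int rtf_filled;  /* set once the result taskfile has been "read" */
};

/* Hypothetical, simplified stand-in for struct ata_port_operations. */
struct model_port_ops {
    int (*qc_issue)(struct model_qc *qc);
    int (*qc_fill_rtf)(struct model_qc *qc);
};

static int model_qc_issue(struct model_qc *qc)
{
    (void)qc;
    return 0;
}

static int model_qc_fill_rtf(struct model_qc *qc)
{
    qc->rtf_filled = 1;
    return 1;
}

/*
 * Same shape as the table quoted above: the fill_rtf routine is assigned
 * to .qc_issue, and .qc_fill_rtf is never named, so the designated
 * initializer leaves it NULL.
 */
const struct model_port_ops broken_ops = {
    .qc_issue = model_qc_fill_rtf,
};

/* Assumed shape of a repaired table: each callback in its own slot. */
const struct model_port_ops fixed_ops = {
    .qc_issue    = model_qc_issue,
    .qc_fill_rtf = model_qc_fill_rtf,
};

/*
 * Model of fill_result_tf. Returns -1 where an unchecked call would
 * jump to address 0, and 0 when the callback ran.
 */
int model_fill_result_tf(const struct model_port_ops *ops, struct model_qc *qc)
{
    assert(ops != NULL && qc != NULL);
    if (ops->qc_fill_rtf == NULL)
        return -1;
    ops->qc_fill_rtf(qc);
    return 0;
}
```

With broken_ops the model returns -1 at the point where an unchecked indirect call would fetch instructions from address 0, matching the RIP of 0 in the oops. With fixed_ops the callback runs and the result is marked as filled.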

Looking at upstream, I think we need part of this commit applied. I will
look into this some more and hopefully have some test kernels built soon.

commit 4c9bf4e799ce06a7378f1196587084802a414c03
Author: Tejun Heo <htejun>
Date:   Mon Apr 7 22:47:20 2008 +0900

    libata: replace tf_read with qc_fill_rtf for non-SFF drivers

Comment 5 David Milburn 2009-02-03 22:58:49 UTC
Created attachment 330793
Fixup .qc_issue and .qc_fill_rtf in sas_sata_ops

Comment 6 David Milburn 2009-02-04 00:06:24 UTC
Robert,

Would you please try the kernel-2.6.18-130.el5.bz483171.1 test kernel? 

http://people.redhat.com/dmilburn/

Comment 7 Robert N. Evans 2009-02-04 16:55:50 UTC
(In reply to comment #6)

With the test kernel I could boot with a SATA disk.  I then did IO to the disk without problems:
- Used fdisk to create a partition
- mkfs and mount a new ext3 file system
- wrote files to the file system and verified their presence via `ls'.

Comment 9 RHEL Program Management 2009-02-04 19:15:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Don Zickus 2009-02-09 18:26:12 UTC
in kernel-2.6.18-131.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 11 RHEL Program Management 2009-02-16 15:04:14 UTC
Updating PM score.

Comment 14 Jim Paradis 2009-02-27 23:14:16 UTC
I have confirmed this bug fix on Stratus hardware. Kernel .98 did not show the bug; kernel .124 crashed as described; kernels .131 and .132 booted normally.

Comment 15 David Milburn 2009-03-26 23:32:29 UTC
*** Bug 491283 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2009-09-02 08:05:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

