Description of problem:
Panic at boot when aic94xx is loading

Version-Release number of selected component (if applicable):
2.6.18-128.el5

Have also reproduced this problem on pre-releases of RHEL 5.3 back to kernel-2.6.18-120.el5.x86_64. This problem does not occur on the same hardware with RHEL 5.2.

How reproducible:
Seems to happen every time we try, unless there is no SATA disk present. Problem does not occur if only SAS disks are present.

Steps to Reproduce:
1. Insert SATA disk.
2. Boot RHEL 5.3.
3. Watch it panic.

Actual results:
Panic at boot as shown below.

Expected results:
No panic.

Additional info:
Booted with RHEL5.3 install disk in rescue mode to eliminate configuration variables. Serial console output from that follows. Stack of exception is at the end of this log...

Linux version 2.6.18-128.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Dec 17 11:41:38 EST 2008 Command line: initrd=initrd.img BOOT_IMAGE=vmlinuz i8042.noaux rescue console=ttyS0,115200 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009b400 (usable) BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved) BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved) BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 0000000000e00000 (usable) BIOS-e820: 0000000000e00000 - 0000000000e40000 (reserved) BIOS-e820: 0000000000e40000 - 000000007fef0000 (usable) BIOS-e820: 000000007fef0000 - 000000007fef9000 (ACPI data) BIOS-e820: 000000007fef9000 - 000000007ff00000 (ACPI NVS) BIOS-e820: 000000007ff00000 - 0000000080000000 (reserved) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec80000 - 00000000fec90000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved) DMI present. 
No NUMA configuration found Faking a node at 0000000000000000-000000007fef0000 Bootmem setup node 0 0000000000000000-000000007fef0000 Memory for crash kernel (0x0 to 0x0) notwithin permissible range disabling kdump ACPI: PM-Timer IO Port: 0x508 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled) Processor #4 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] enabled) Processor #5 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] enabled) Processor #2 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x05] lapic_id[0x06] enabled) Processor #6 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled) Processor #3 7:7 APIC version 20 ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) Processor #7 7:7 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1]) ACPI: IOAPIC (id[0x08] address[0xfec80000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 16, address 0xfec80000, GSI 0-35 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge) ACPI: NMI_SRC (high edge global_irq 34) Setting APIC routing to physical flat Using ACPI (MADT) for SMP configuration information Nosave address range: 000000000009b000 - 000000000009c000 Nosave address range: 000000000009c000 - 00000000000a0000 Nosave address range: 00000000000a0000 - 00000000000d2000 Nosave address range: 00000000000d2000 - 00000000000d4000 Nosave address range: 00000000000d4000 - 00000000000dc000 Nosave 
address range: 00000000000dc000 - 0000000000100000 Nosave address range: 0000000000e00000 - 0000000000e40000 Allocating PCI resources starting at 88000000 (gap: 80000000:60000000) SMP: Allowing 8 CPUs, 0 hotplug CPUs Built 1 zonelists. Total pages: 513799 Kernel command line: initrd=initrd.img BOOT_IMAGE=vmlinuz i8042.noaux rescue console=ttyS0,115200 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour VGA+ 80x25 Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Checking aperture... Memory: 2051520k/2096064k available (2494k kernel code, 43884k reserved, 1263k data, 200k init) Calibrating delay using timer specific routine.. 4002.46 BogoMIPS (lpj=2001231) Security Framework v1.0.0 initialized SELinux: Initializing. selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 256 CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K using mwait in idle threads. CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 CPU0: Thermal monitoring enabled (TM1) SMP alternatives: switching to UP code ACPI: Core revision 20060707 Using local APIC timer interrupts. result 20834957 Detected 20.834 MHz APIC timer. SMP alternatives: switching to SMP code Booting processor 1/8 APIC 0x4 Initializing CPU#1 Calibrating delay using timer specific routine.. 3999.61 BogoMIPS (lpj=1999805) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 1 CPU: Processor Core ID: 0 CPU1: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 2/8 APIC 0x1 Initializing CPU#2 Calibrating delay using timer specific routine.. 
3999.60 BogoMIPS (lpj=1999804) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 CPU2: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 3/8 APIC 0x5 Initializing CPU#3 Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999803) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 1 CPU: Processor Core ID: 1 CPU3: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 4/8 APIC 0x2 Initializing CPU#4 Calibrating delay using timer specific routine.. 3999.59 BogoMIPS (lpj=1999797) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 2 CPU4: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 5/8 APIC 0x6 Initializing CPU#5 Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999802) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 1 CPU: Processor Core ID: 2 CPU5: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 6/8 APIC 0x3 Initializing CPU#6 Calibrating delay using timer specific routine.. 3999.60 BogoMIPS (lpj=1999803) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 0 CPU: Processor Core ID: 3 CPU6: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 SMP alternatives: switching to SMP code Booting processor 7/8 APIC 0x7 Initializing CPU#7 Calibrating delay using timer specific routine.. 
3999.60 BogoMIPS (lpj=1999802) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 6144K CPU: Physical Processor ID: 1 CPU: Processor Core ID: 3 CPU7: Thermal monitoring enabled (TM1) Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06 Brought up 8 CPUs testing NMI watchdog ... OK. time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer. time.c: Detected 2000.156 MHz processor. migration_cost=26,12762 checking if image is initramfs... it is Freeing initrd memory: 6485k freed NET: Registered protocol family 16 No dock devices found. ACPI: bus type pci registered PCI: Using MMCONFIG at e0000000 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: PXH quirk detected, disabling MSI for SHPC device PCI: PXH quirk detected, disabling MSI for SHPC device PCI: Transparent bridge - 0000:0b:1e.0 PCI: PXH quirk detected, disabling MSI for SHPC device PCI: PXH quirk detected, disabling MSI for SHPC device PCI: Transparent bridge - 0000:7e:1e.0 Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 10 devices usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report PCI: Cannot allocate resource region 0 of device 0000:01:00.1 NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default PCI-GART: No AMD northbridge found. PCI: mem resource #9:100000@c4100000 for 0000:05:00.0 was not allocated. PCI: mem resource #6:40000@84400000 for 0000:06:01.0 was not allocated. PCI: mem resource #6:40000@84400000 for 0000:06:01.1 was not allocated. PCI: Bridge: 0000:05:00.0 IO window: 1000-1fff MEM window: 84200000-843fffff PREFETCH window: disabled. 
PCI: Bridge: 0000:05:00.2 IO window: 2000-2fff MEM window: 84400000-844fffff PREFETCH window: c4000000-c40fffff PCI: Bridge: 0000:04:00.0 IO window: 1000-2fff MEM window: 84100000-844fffff PREFETCH window: c4000000-c40fffff PCI: Bridge: 0000:04:01.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:04:02.0 IO window: 3000-3fff MEM window: 84500000-845fffff PREFETCH window: disabled. PCI: Bridge: 0000:03:00.0 IO window: 1000-3fff MEM window: 84100000-845fffff PREFETCH window: c4000000-c40fffff PCI: Bridge: 0000:03:00.3 IO window: 4000-4fff MEM window: 84600000-848fffff PREFETCH window: 84b00000-84bfffff PCI: Bridge: 0000:0b:1e.0 IO window: 6000-6fff MEM window: 84a00000-84afffff PREFETCH window: c4100000-cfffffff PCI: Bridge: 0000:03:01.0 IO window: 5000-6fff MEM window: 84900000-84afffff PREFETCH window: c4100000-cfffffff PCI: Bridge: 0000:02:00.0 IO window: 1000-7fff MEM window: 84000000-a3ffffff PREFETCH window: c4000000-cfffffff PCI: mem resource #9:100000@d0100000 for 0000:78:00.0 was not allocated. PCI: mem resource #6:40000@a4400000 for 0000:79:01.0 was not allocated. PCI: mem resource #6:40000@a4400000 for 0000:79:01.1 was not allocated. PCI: Bridge: 0000:78:00.0 IO window: 8000-8fff MEM window: a4200000-a43fffff PREFETCH window: disabled. PCI: Bridge: 0000:78:00.2 IO window: 9000-9fff MEM window: a4400000-a44fffff PREFETCH window: d0000000-d00fffff PCI: Bridge: 0000:77:00.0 IO window: 8000-9fff MEM window: a4100000-a44fffff PREFETCH window: d0000000-d00fffff PCI: Bridge: 0000:77:01.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:77:02.0 IO window: a000-afff MEM window: a4500000-a45fffff PREFETCH window: disabled. 
PCI: Bridge: 0000:76:00.0 IO window: 8000-afff MEM window: a4100000-a45fffff PREFETCH window: d0000000-d00fffff PCI: Bridge: 0000:76:00.3 IO window: b000-bfff MEM window: a4600000-a48fffff PREFETCH window: a4b00000-a4bfffff PCI: Bridge: 0000:7e:1e.0 IO window: d000-dfff MEM window: a4a00000-a4afffff PREFETCH window: d0100000-dfffffff PCI: Bridge: 0000:76:01.0 IO window: c000-dfff MEM window: a4900000-a4afffff PREFETCH window: d0100000-dfffffff PCI: Bridge: 0000:02:01.0 IO window: 8000-efff MEM window: a4000000-c3ffffff PREFETCH window: d0000000-dfffffff PCI: Bridge: 0000:01:00.0 IO window: 1000-efff MEM window: 80000000-c3ffffff PREFETCH window: c4000000-dfffffff PCI: Bridge: 0000:00:02.0 IO window: 1000-efff MEM window: 80000000-c3ffffff PREFETCH window: c4000000-dfffffff PCI: Bridge: 0000:00:03.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. GSI 16 sharing vector 0xA9 and IRQ 16 ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:05:00.2[B] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:04:01.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:04:02.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:03:00.3[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 1 (level, low) -> IRQ 1 ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:76:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:77:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:78:00.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:78:00.2[B] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 
0000:77:01.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:77:02.0[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:76:00.3[A] -> GSI 35 (level, low) -> IRQ 169 ACPI: PCI Interrupt 0000:76:01.0[A] -> GSI 1 (level, low) -> IRQ 1 GSI 17 sharing vector 0xB1 and IRQ 17 ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 0 (level, low) -> IRQ 177 NET: Registered protocol family 2 IP route cache hash table entries: 65536 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered Simple Boot Flag at 0x35 set to 0x1 audit: initializing netlink socket (disabled) type=2000 audit(1233023570.709:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Initializing Cryptographic API alg: No test for crc32c (crc32c-generic) ksign: Installing public key data Loading keyring - Added public key 5C0DC734E64D24FA - User ID: Red Hat, Inc. 
(Kernel Module GPG key) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability assign_interrupt_mode Found MSI capability pci_hotplug: PCI Hot Plug PCI Core version: 0.5 Real Time Clock Driver v1.12ac Non-volatile memory driver v1.2 Linux agpgart interface v0.101 (c) Dave Jones Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A GSI 18 sharing vector 0x52 and IRQ 18 ACPI: PCI Interrupt 0000:0c:01.1[A] -> GSI 22 (level, low) -> IRQ 82 ACPI: PCI interrupt for device 0000:0c:01.1 disabled ACPI: PCI Interrupt 0000:0c:01.2[A] -> GSI 22 (level, low) -> IRQ 82 0000:0c:01.2: ttyS2 at I/O 0x6450 (irq = 82) is a 16550A 0000:0c:01.2: ttyS3 at I/O 0x6458 (irq = 82) is a 16550A GSI 19 sharing vector 0x5A and IRQ 19 ACPI: PCI Interrupt 0000:7f:01.1[A] -> GSI 33 (level, low) -> IRQ 90 ACPI: PCI interrupt for device 0000:7f:01.1 disabled ACPI: PCI Interrupt 0000:7f:01.2[A] -> GSI 33 (level, low) -> IRQ 90 Couldn't register serial port 0000:7f:01.2: -28 RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ESB2: IDE controller at PCI slot 0000:0b:1f.1 ACPI: PCI Interrupt 0000:0b:1f.1[A] -> GSI 12 (level, low) -> IRQ 12 ESB2: chipset revision 9 ESB2: 
100% native mode on irq 12 ide0: BM-DMA at 0x5060-0x5067, BIOS settings: hda:DMA, hdb:pio hda: MATSHITADVD-RAM UJ870PC, ATAPI CD/DVD-ROM drive ide0 at 0x5090-0x5097,0x5086 on irq 12 ESB2: IDE controller at PCI slot 0000:7e:1f.1 PCI: Enabling device 0000:7e:1f.1 (0004 -> 0005) GSI 20 sharing vector 0x62 and IRQ 20 ACPI: PCI Interrupt 0000:7e:1f.1[A] -> GSI 23 (level, low) -> IRQ 98

Welcome to Red Hat Enterprise Linux Server
+--------+ Loading SCSI driver +---------+
| Loading aic94xx driver...
+----------------------------------------+

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<0000000000000000>] _stext+0x7ffff000/0x1000
PGD 0
Oops: 0010 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/class
CPU 0
Modules linked in: aic94xx libsas scsi_transport_sas libata uhci_hcd ehci_hcd iscsi_ibft iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 xfrm_nalgo crypto_api squashfs pcspkr edd loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs
Pid: 952, comm: scsi_wq_0 Not tainted 2.6.18-128.el5 #1
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] _stext+0x7ffff000/0x1000
RSP: 0000:ffff81007e3d39d8  EFLAGS: 00010002
RAX: ffffffff882c2160 RBX: ffff81007e1440e0 RCX: 0000000000000000
RDX: ffff81007e144000 RSI: ffff81007e4a5358 RDI: ffff81007e1440e0
RBP: ffff81007e1440e0 R08: ffffffffffffffff R09: 0000000000000001
R10: ffff81007f3f10c0 R11: 0000000000000000 R12: ffff81007e144000
R13: ffff81007e1463f0 R14: ffff81007e146528 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process scsi_wq_0 (pid: 952, threadinfo ffff81007e3d2000, task ffff81007f0c5820)
Stack:  ffffffff88272606 ffff81007e1440e0 ffff81007e3d3a50 ffff81007e1440e0
 ffff81007e144000 ffff81007e1463f0 ffffffff882730e6 000000017f0c5820
 ffff81007e3d3ac0 000000028ac28261 ffff81007e3d3b60 ecff81007f0c5a08
Call Trace:
 [<ffffffff88272606>] :libata:ata_qc_complete+0x143/0x163
 [<ffffffff882730e6>] :libata:ata_exec_internal_sg+0x267/0x40f
 [<ffffffff88273378>] :libata:ata_exec_internal+0xea/0xf9
 [<ffffffff88273754>] :libata:ata_dev_read_id+0xef/0x3ce
 [<ffffffff882752cc>] :libata:ata_bus_probe+0x1eb/0x4a1
 [<ffffffff88277c1f>] :libata:ata_sas_port_start+0x26/0x35
 [<ffffffff881c6c58>] :scsi_mod:scsi_alloc_sdev+0x17d/0x1c4
 [<ffffffff881c6e29>] :scsi_mod:scsi_probe_and_add_lun+0x10d/0x9c9
 [<ffffffff881c7a44>] :scsi_mod:scsi_alloc_target+0x239/0x320
 [<ffffffff881c7c6e>] :scsi_mod:__scsi_scan_target+0xc3/0x5e2
 [<ffffffff80106d21>] sysfs_add_file+0x76/0x85
 [<ffffffff801ba812>] transport_add_class_device+0x0/0x32
 [<ffffffff801ba29f>] attribute_container_add_attrs+0x27/0x39
 [<ffffffff881c8436>] :scsi_mod:scsi_scan_target+0x6c/0x83
 [<ffffffff882a5a70>] :scsi_transport_sas:sas_rphy_add+0x102/0x10e
 [<ffffffff882b8879>] :libsas:sas_discover_domain+0x31b/0x382
 [<ffffffff882b855e>] :libsas:sas_discover_domain+0x0/0x382
 [<ffffffff8004d139>] run_workqueue+0x94/0xe4
 [<ffffffff800499ba>] worker_thread+0x0/0x122
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049aaa>] worker_thread+0xf0/0x122
 [<ffffffff8008a461>] default_wake_function+0x0/0xe
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032360>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032262>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code:  Bad RIP value.
RIP  [<0000000000000000>] _stext+0x7ffff000/0x1000
 RSP <ffff81007e3d39d8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
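A note on reading the oops above: the value after "Oops:" is the x86 page-fault error code. The sketch below decodes it, assuming the standard architectural bit layout; `describe_oops_code` and the `PF_*` names used here are illustrative helpers written for this note, not code from the kernel tree. For the reported value 0x0010, only the instruction-fetch bit is set, which together with RIP and CR2 both being zero is consistent with a call through a NULL function pointer.

```c
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (architectural layout). */
#define PF_PROT  0x01UL /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x02UL /* 0: read access, 1: write access */
#define PF_USER  0x04UL /* 0: kernel mode, 1: user mode */
#define PF_RSVD  0x08UL /* reserved bit set in a page-table entry */
#define PF_INSTR 0x10UL /* fault happened on an instruction fetch */

/* Hypothetical helper: render the error code as readable text in buf. */
static const char *describe_oops_code(unsigned long code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                       : "read";

    snprintf(buf, len, "%s, %s, %s mode%s",
             (code & PF_PROT) ? "protection violation" : "page not present",
             access,
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_RSVD) ? ", reserved bit set" : "");
    return buf;
}
```

Calling `describe_oops_code(0x10, buf, sizeof buf)` yields "page not present, instruction fetch, kernel mode", i.e. the CPU tried to execute code at address 0 while in the kernel.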
jlarrew: would you happen to know if there was a change made in aic94xx that could have caused this?
Just noting that I was able to add a SATA disk to an LSI SAS1068E system without problems.

07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

[root@dhcp-122 ~]# lsmod |grep sas
mptsas                 69201  2
mptscsih               69697  1 mptsas
mptbase               113637  2 mptsas,mptscsih
scsi_transport_sas     66753  1 mptsas
scsi_mod              196569  8 scsi_dh,sr_mod,sg,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod

I am in the process of looking over the code shown on the panic stack trace.
(In reply to comment #1)
> jlarrew: would you happen to know if there was a change made in aic94xx that
> could have caused this?

It looks like some changes to aic94xx for 2.6.25 were back-ported to 2.6.18-104.el5 for RHEL 5.3:
https://bugzilla.redhat.com/show_bug.cgi?id=439573

Other problems with aic94xx, SATA, and smartctl are noted here:
https://bugzilla.redhat.com/show_bug.cgi?id=429606

I'll mirror this bug to the IBM LTC for further investigation.

Jesse
Looking at the stack trace:

Call Trace:
 [<ffffffff88272606>] :libata:ata_qc_complete+0x143/0x163
 [<ffffffff882730e6>] :libata:ata_exec_internal_sg+0x267/0x40f
 [<ffffffff88273378>] :libata:ata_exec_internal+0xea/0xf9
 [<ffffffff88273754>] :libata:ata_dev_read_id+0xef/0x3ce
 [<ffffffff882752cc>] :libata:ata_bus_probe+0x1eb/0x4a1
 [<ffffffff88277c1f>] :libata:ata_sas_port_start+0x26/0x35

No error handler is defined in the ata_port_operations in sas_ata.c:

static struct ata_port_operations sas_sata_ops = {
	.phy_reset		= sas_ata_phy_reset,
	.post_internal_cmd	= sas_ata_post_internal,
	.qc_prep		= ata_noop_qc_prep,
	.qc_issue		= sas_ata_qc_fill_rtf,
	.port_start		= ata_sas_port_start,
	.port_stop		= ata_sas_port_stop,
	.scr_read		= sas_ata_scr_read,
	.scr_write		= sas_ata_scr_write
};

So in ata_qc_complete this condition should be false:

	if (ap->ops->error_handler) {

and we should be dropping down to the else branch:

	} else {
		if (qc->flags & ATA_QCFLAG_EH_SCHEDULED)
			return;

		/* read result TF if failed or requested */
		if (qc->err_mask || qc->flags & ATA_QCFLAG_RESULT_TF)
			fill_result_tf(qc);

		__ata_qc_complete(qc);
	}

Since ata_exec_internal_sg should be setting ATA_QCFLAG_RESULT_TF, we should end up in fill_result_tf:

static void fill_result_tf(struct ata_queued_cmd *qc)
{
	struct ata_port *ap = qc->ap;

	qc->result_tf.flags = qc->tf.flags;
	ap->ops->qc_fill_rtf(qc);	<========CRASH ?
}

I don't see where sas_sata_ops includes a .qc_fill_rtf pointer, as most of the drivers in drivers/ata do, so I would think that it is crashing in fill_result_tf; a vmcore could tell us for sure.

Looking at upstream, I think we need part of this commit applied. I will look into this some more and hopefully have some test kernels built soon.

commit 4c9bf4e799ce06a7378f1196587084802a414c03
Author: Tejun Heo <htejun>
Date:   Mon Apr 7 22:47:20 2008 +0900

    libata: replace tf_read with qc_fill_rtf for non-SFF drivers
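To illustrate the suspected failure mode in isolation, here is a minimal userspace sketch, assuming an ops table whose `qc_fill_rtf` member was never populated. The types and functions (`struct cmd`, `struct port_ops`, `demo_fill_rtf`, `fill_result_checked`) are hypothetical stand-ins, not libata code, and the NULL check is shown only to make the hazard visible; it is not the patch proposed for this bug.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued command. */
struct cmd {
    int rtf_filled; /* set to 1 once the result taskfile has been read */
};

/* Hypothetical stand-in for an operations table. Members omitted from a
 * designated initializer are zero-initialized, so a driver that does not
 * set qc_fill_rtf leaves it NULL. */
struct port_ops {
    void (*qc_fill_rtf)(struct cmd *c);
};

/* A driver-provided callback, for the "populated" case. */
static void demo_fill_rtf(struct cmd *c)
{
    c->rtf_filled = 1;
}

/* Calling ops->qc_fill_rtf(c) unconditionally would jump to address 0
 * when the member is NULL. This variant reports the missing callback
 * instead: returns 0 on success, -1 if the callback is absent. */
static int fill_result_checked(const struct port_ops *ops, struct cmd *c)
{
    if (ops->qc_fill_rtf == NULL)
        return -1;
    ops->qc_fill_rtf(c);
    return 0;
}
```

With a table initialized as `{ .qc_fill_rtf = demo_fill_rtf }` the call succeeds and sets `rtf_filled`; with `{ NULL }` it returns -1, where the unguarded call would fault with RIP at 0, as in the oops reported here.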
Created attachment 330793 [details] Fixup .qc_issue and .qc_fill_rtf in sas_sata_ops
Robert, Would you please try the kernel-2.6.18-130.el5.bz483171.1 test kernel? http://people.redhat.com/dmilburn/
(In reply to comment #6) With the test kernel I could boot with a SATA disk. I then did IO to the disk without problems: - Used fdisk to create a partition - mkfs and mount a new ext3 file system - wrote files to the file system and verified their presence via `ls'.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-131.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
Updating PM score.
I have confirmed this bug fix on Stratus hardware. Kernel .98 did not show the bug, kernel .124 crashed as described, kernels .131 and .132 booted normally.
*** Bug 491283 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html