129529 – kernel-2.6.7-1.494.2.2 fails to boot Adaptec aacraid

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-08-10 03:34 UTC by Yazz D. Atlas
Modified:	2015-01-04 22:08 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-04-16 05:05:17 UTC
Type:	---
Embargoed:
Dependent Products:

Description Yazz D. Atlas 2004-08-10 03:34:35 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

Description of problem:
Using the latest kernel upgrade, I am no longer able to boot a
system that uses an Adaptec 2120s (ASR-2120S/128MB) RAID card
for its main /dev/sda HD.

I have a serial console connected, so if you need more output than what
I have provided, let me know.

Version-Release number of selected component (if applicable):
kernel-2.6.7-1.494.2.2

How reproducible:
Always

Steps to Reproduce:
1. Install Fedora Core 2 from CD-ROM on a new/clean system
2. Upgrade system using yum
3. System fails to boot kernel-2.6.7-1.494.2.2 or smp version
    

Actual Results: 
The system fails to boot if you follow the above steps and have an
Adaptec 2120s card set up for RAID 5. I am currently able to boot
using the older kernel-2.6.5-1.358 just fine.

Additional info:

  Booting 'Fedora Core (2.6.7-1.494.2.2smp) serial'
                                                                     
          
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.7-1.494.2.2smp ro root=LABEL=/ panic=20 console=tty0 console=ttyS1,9600n8 acpi=off
   [Linux-bzImage, setup=0x1400, size=0x14d68a]
initrd /boot/initrd-2.6.7-1.494.2.2smp.img
   [Linux-initrd @ 0x37fad000, 0x42a1a bytes]

--cut stuff---

ICH3: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ICH3: chipset revision 2
ICH3: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: CD-224E, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 43690)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Red Hat nash verSCSI subsystem initialized
sion 3.5.22 starRed Hat/Adaptec aacraid driver (1.1.2-lk2 Aug  3 2004)
ting
Mounted /proc filesystem
Mounting sysfs
AAC0: kernel 4.1.4 build 7244
Loading scsi_modAAC0: monitor 4.1.4 build 7244
.ko module
LoadAAC0: bios 4.1.0 build 7244
ing sd_mod.ko moAAC0: serial bc5593fafaf001
dule
Loading aaAAC0: 64bit support enabled.
craid.ko moduleAAC0: 64 Bit PAE enabled
                                                                     
          
scsi0 : aacraid
  Vendor: ADAPTEC   Model: Adaptec RAID5     Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 430510464 512-byte hdwr sectors (220421 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
 sda:<3>aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 53813807
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 53813807
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
 unable to read partition table
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 1 lun 0

Comment 1 Ryan Tokarek 2004-09-15 20:34:33 UTC

I get the same problem with the Adaptec 2410SA and kernel
2.6.8-1.521 (smp/up). This is an IBM xSeries 306 server with a
ServeRAID-7t SATA controller (3.0GHz P4, 512MB RAM, two 250GB Hitachi
Deskstar SATA drives). The stock 2.6.5-1.358 works just fine. I can
(and did) install an OS, but I can't boot from it with 2.6.8.

Booting off another drive attached to the motherboard SATA (non-RAID),
I see this in kernel logs:

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Aug 16 2004)
ACPI: PCI interrupt 0000:03:02.0[A] -> GSI 11 (level, low) -> IRQ 11
spurious 8259A interrupt: IRQ7.
AAC0: kernel 4.1.4 build 7235
AAC0: monitor 4.1.4 build 7235
AAC0: bios 4.1.0 build 7235
AAC0: serial bb8eb0fafaf001
scsi0 : aacraid
  Vendor: ADAPTEC   Model: AAR-2410SA Strip  Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 976558080 512-byte hdwr sectors (499998 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
...
Adding 787144k swap on /dev/sda6.  Priority:-1 extents:1
aacraid: Host adapter reset request. SCSI hang ?
Debug: sleeping function called from invalid context at mm/slab.c:2000
in_atomic():0[expected: 0], irqs_disabled():1
 [<0211b765>] __might_sleep+0x82/0x8c
 [<02146dbc>] kmem_cache_alloc+0x1d/0x4c
 [<22868e7c>] aac_rx_check_health+0x3e/0x116 [aacraid]
 [<228641a2>] aac_eh_reset+0x24/0x2ad [aacraid]
 [<2284d6cc>] scsi_try_host_reset+0x106/0x2e8 [scsi_mod]
 [<2284d9ff>] scsi_eh_host_reset+0x44/0xca [scsi_mod]
 [<2284de8d>] scsi_eh_ready_devs+0x39/0x4d [scsi_mod]
 [<2284e1c0>] scsi_unjam_host+0x24c/0x25d [scsi_mod]
 [<2284e42e>] scsi_error_handler+0x25d/0x2b4 [scsi_mod]
 [<2284e1d1>] scsi_error_handler+0x0/0x2b4 [scsi_mod]
 [<021041d9>] kernel_thread_helper+0x5/0xb
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 976558072
Buffer I/O error on device sda, logical block 122069759
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 122069759


Please let me know if I should file a separate bug. Although the cards
are different, the errors seem to be at least related.

Ryan Tokarek
System Administrator 
Wolfram Research Inc.

Comment 2 Ryan Tokarek 2004-09-15 20:44:41 UTC

I should also draw attention to the line:

SCSI device sda: 976558080 512-byte hdwr sectors (499998 MB)

expr 976558080 / 2 / 1024
476835   (MB)

The RAID controller reports a 465GB container at startup. Linux
reports the SCSI device with the right number of blocks (to equal
465GB), but reports 499998MB (488GB) for the size. 

Perhaps I'm only drawing attention to my own ignorance, but it seems
to me that those numbers should match up better.  

Ryan Tokarek
System Administrator
Wolfram Research Inc.
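
The mismatch raised in Comment 2 looks like a units difference rather than a wrong block count. The following is a minimal sketch, assuming 512-byte sectors (as the log line states) and assuming the kernel's "MB" figure is decimal megabytes (10^6 bytes) rounded to the nearest unit; the kernel's exact rounding is not shown in this report, and the helper name `sizes` is made up for illustration.

```python
SECTOR_BYTES = 512  # sector size as printed in the "512-byte hdwr sectors" log line


def sizes(sectors):
    """Express a sector count in decimal MB, binary MiB, and binary GiB."""
    total_bytes = sectors * SECTOR_BYTES
    return {
        # Assumption: the kernel log reports decimal megabytes (10**6 bytes).
        "decimal_mb": round(total_bytes / 10**6),
        # The "expr 976558080 / 2 / 1024" calculation yields binary MiB.
        "binary_mib": total_bytes // 2**20,
        # The controller's "465GB" is consistent with binary GiB.
        "binary_gib": total_bytes / 2**30,
    }


info = sizes(976558080)
print(info["decimal_mb"], info["binary_mib"], int(info["binary_gib"]))
```

Under those assumptions, the 499998 MB in the log and the 465GB reported by the controller describe the same 976558080 sectors. The 488GB figure arises from dividing a decimal-MB value by 1024.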

Comment 3 Ryan Farris 2004-10-24 03:28:36 UTC

Having a similar problem with kernels 2.6.8-1.521 and 2.6.9-1.640.
The kernel shipped with Fedora Core 3 Test 3 fares no better.

Hardware:
Adaptec 2410SA SATA Raid controller on a Supermicro P4SCi motherboard
w/P4-2.8GHz  

Problem:
Panics on startup when booting off the 2410SA controller running raid5.

Messages:
aacraid: Host adapter reset request. SCSI hang?
Debug: sleeping function called from invalid context at mm/slab.c:2063
aacraid: Host adapter appears dead.
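
For anyone checking whether a saved console or dmesg capture shows the same failure, here is a rough sketch that looks for the two aacraid messages quoted in this report, in order. The function name `matches_signature` and the choice of these two strings as a "signature" are assumptions made for illustration; a match suggests, but does not prove, the same underlying bug.

```python
# Messages copied from the logs in this report; trailing punctuation varies
# between reports, so only the common prefix of each message is matched.
SIGNATURE = (
    "aacraid: Host adapter reset request",
    "aacraid: Host adapter appears dead",
)


def matches_signature(log_text):
    """Return True if every signature message appears in log_text, in order."""
    pos = 0
    for needle in SIGNATURE:
        pos = log_text.find(needle, pos)
        if pos < 0:
            return False
        pos += len(needle)
    return True


sample = (
    "aacraid: Host adapter reset request. SCSI hang?\n"
    "Debug: sleeping function called from invalid context at mm/slab.c:2063\n"
    "aacraid: Host adapter appears dead.\n"
)
print(matches_signature(sample))
```

A substring search is used instead of whole-line matching because the serial console output in the original description runs kernel messages into other text on the same line.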

Comment 4 Nate Bradley 2004-11-19 21:39:04 UTC

Yes, mine's broke as well on Adaptec 2410SA

Comment 5 Dave Jones 2004-11-27 22:34:04 UTC

mass update for old bugs:

Is this still a problem in the 2.6.9 based kernel update ?

Comment 6 Ashley Gittins 2004-12-15 17:16:52 UTC

I can confirm that kernel 2.6.9-1.6 has the same problem on a Dell 
PowerEdge 750. I get I/O errors from the controller then it downs the 
device. This is with two drives running in RAID. 
 
Snippet from dmesg on a good boot in 2.6.5-1.358 (for card info): 
 
SCSI subsystem initialized 
Red Hat/Adaptec aacraid driver (1.1.2-lk1 May  8 2004) 
AAC0: kernel 4.1.4 build 7401 
AAC0: monitor 4.1.4 build 7401 
AAC0: bios 4.1.0 build 7401 
AAC0: serial bdf001fafaf001 
scsi0 : aacraid 
  Vendor: DELL      Model: CERC Mirror       Rev: V1.0 
  Type:   Direct-Access                      ANSI SCSI revision: 02 
SCSI device sda: 312432384 512-byte hdwr sectors (159965 MB) 
sda: Write Protect is off 
sda: Mode Sense: 03 00 00 00 
SCSI device sda: drive cache: write through 
 sda: sda1 sda2 sda3 sda4 < sda5 > 
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0 
libata version 1.02 loaded. 
ata_piix version 1.02 
ata_piix: combined mode detected 
ACPI: No IRQ known for interrupt pin A of device 0000:00:1f.2 
ata: 0x1f0 IDE port busy 
PCI: Setting latency timer of device 0000:00:1f.2 to 64 
ata1: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xFEA8 irq 15 
ata1: SATA port has no device. 
ata1: thread exiting 
scsi1 : ata_piix 
 
If I boot (via PXE) the FC3 CD, anaconda will find the existing
installation on /dev/sda1, but choosing it for upgrade causes I/O
errors (and a continual beep).
Interestingly, the PE750 beeps for a brief period with a working 
kernel too, during the activating swap / checking root fs part of 
bootup. 
 
[OT] I am guessing that this will make it somewhat difficult to
upgrade to FC3 even after a fix is out. Would I need to rebuild an
initrd image after the problem is sorted?
 
Anyway, yes, this is still a problem at least for 2.6.9-1.6 in my 
case.

Comment 7 Gunther Schlegel 2005-01-25 17:17:47 UTC

I seem to have the same problem with kernel 2.6.10-1.9_FC2smp and
2.6.10-1.8_FC2smp. Have not had it with older releases.

Have no kernel messages because it only affects the /var partition and
I have not yet attached a serial console to it. Though, the filesystem
errors I see are the same.

Machine is a Dell PowerEdge 2650 with a PERC 3/Di controller. Latest
machine and controller BIOS applied.

Crash occurs every 20 to 30 hours.

Comment 8 Dave Jones 2005-04-16 05:05:17 UTC

Fedora Core 2 has now reached end of life, and no further updates will be
provided by Red Hat.  The Fedora legacy project will be producing further kernel
updates for security problems only.

If this bug has not been fixed in the latest Fedora Core 2 update kernel, please
try to reproduce it under Fedora Core 3, and reopen if necessary, changing the
product version accordingly.

Thank you.

Comment 9 Ashley Gittins 2005-04-16 15:07:58 UTC

The system I was reporting from was experiencing the fault on FC3's initial
release kernel, i.e., it would not boot to install because (it appears) of this
bug. Thus, it is a bug that appears to apply to FC3 also.
However, the machine in question is now in production use at my client's co-lo
facility, so I cannot verify whether later kernel releases have fixed the problem,
nor can I easily access it to do any testing. As such, I won't re-open the bug
myself, since I can't gather the right first-hand evidence to support it. If
anyone else here on the cc list still has this problem, please re-open the bug,
or if you have found a solution, please add it in a comment here (e.g., does
fc4test[?] work?).
