From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114

Description of problem:
Using the latest kernel upgrade, I am no longer able to boot a system that uses an Adaptec 2120s (ASR-2120S/128MB) raid card for its main /dev/sda HD. I have a serial console connected, so if you need more output than what I have provided, let me know.

Version-Release number of selected component (if applicable):
kernel-2.6.7-1.494.2.2

How reproducible:
Always

Steps to Reproduce:
1. Install Fedora Core 2 from CD-ROM on a new/clean system
2. Upgrade system using yum
3. System fails to boot kernel-2.6.7-1.494.2.2 or smp version

Actual Results:
The system fails to boot if you follow the above steps and have an Adaptec 2120s card set up doing raid5. I am currently able to boot using the older kernel-2.6.5-1.358 just fine.

Additional info:
Booting 'Fedora Core (2.6.7-1.494.2.2smp) serial'
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.7-1.494.2.2smp ro root=LABEL=/ panic=20 console=tty0 console=ttyS1,9600n8 acpi=off
[Linux-bzImage, setup=0x1400, size=0x14d68a]
initrd /boot/initrd-2.6.7-1.494.2.2smp.img
[Linux-initrd @ 0x37fad000, 0x42a1a bytes]
--cut stuff---
ICH3: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ICH3: chipset revision 2
ICH3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: CD-224E, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 43690)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Red Hat nash verSCSI subsystem initialized
sion 3.5.22 starRed Hat/Adaptec aacraid driver (1.1.2-lk2 Aug 3 2004)
ting
Mounted /proc filesystem
Mounting sysfs
AAC0: kernel 4.1.4 build 7244
Loading scsi_modAAC0: monitor 4.1.4 build 7244
.ko module
LoadAAC0: bios 4.1.0 build 7244
ing sd_mod.ko moAAC0: serial bc5593fafaf001
dule
Loading aaAAC0: 64bit support enabled.
craid.ko moduleAAC0: 64 Bit PAE enabled
scsi0 : aacraid
Vendor: ADAPTEC Model: Adaptec RAID5 Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 430510464 512-byte hdwr sectors (220421 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
sda:<3>aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 53813807
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 53813807
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
unable to read partition table
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 1 lun 0
I get the same problem with the Adaptec 2410SA and Linux 2.6.8-1.521 (smp/up). This is an IBM xSeries 306 Server with a ServeRAID-7t SATA controller (3.0GHz P4, 512MB RAM, 2 250GB Hitachi Deskstar SATA drives). The stock 2.6.5-1.358 works just fine. I can (and did) install an OS, but I can't boot from it with 2.6.8.

Booting off another drive attached to the motherboard SATA (non-RAID), I see this in the kernel logs:

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Aug 16 2004)
ACPI: PCI interrupt 0000:03:02.0[A] -> GSI 11 (level, low) -> IRQ 11
spurious 8259A interrupt: IRQ7.
AAC0: kernel 4.1.4 build 7235
AAC0: monitor 4.1.4 build 7235
AAC0: bios 4.1.0 build 7235
AAC0: serial bb8eb0fafaf001
scsi0 : aacraid
Vendor: ADAPTEC Model: AAR-2410SA Strip Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 976558080 512-byte hdwr sectors (499998 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
...
Adding 787144k swap on /dev/sda6. Priority:-1 extents:1
aacraid: Host adapter reset request. SCSI hang ?
Debug: sleeping function called from invalid context at mm/slab.c:2000
in_atomic():0[expected: 0], irqs_disabled():1
[<0211b765>] __might_sleep+0x82/0x8c
[<02146dbc>] kmem_cache_alloc+0x1d/0x4c
[<22868e7c>] aac_rx_check_health+0x3e/0x116 [aacraid]
[<228641a2>] aac_eh_reset+0x24/0x2ad [aacraid]
[<2284d6cc>] scsi_try_host_reset+0x106/0x2e8 [scsi_mod]
[<2284d9ff>] scsi_eh_host_reset+0x44/0xca [scsi_mod]
[<2284de8d>] scsi_eh_ready_devs+0x39/0x4d [scsi_mod]
[<2284e1c0>] scsi_unjam_host+0x24c/0x25d [scsi_mod]
[<2284e42e>] scsi_error_handler+0x25d/0x2b4 [scsi_mod]
[<2284e1d1>] scsi_error_handler+0x0/0x2b4 [scsi_mod]
[<021041d9>] kernel_thread_helper+0x5/0xb
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 976558072
Buffer I/O error on device sda, logical block 122069759
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 122069759

Please let me know if I should file another bug; although the cards are different, the errors seem to be at least related.

Ryan Tokarek
System Administrator
Wolfram Research Inc.
I should also draw attention to the line:

SCSI device sda: 976558080 512-byte hdwr sectors (499998 MB)

expr 976558080 / 2 / 1024
476835 (MB)

The RAID controller reports a 465GB container at startup. Linux reports the SCSI device with the right number of blocks (to equal 465GB), but reports 499998MB (488GB) for the size. Perhaps I'm only drawing attention to my own ignorance, but it seems to me that those numbers should match up better.

Ryan Tokarek
System Administrator
Wolfram Research Inc.
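The two figures in the comment above may simply be the same capacity expressed in different units. The following is a minimal sketch, assuming the kernel's parenthesised figure is in decimal megabytes (10^6 bytes) while the controller BIOS and the expr calculation use binary units (2^20 and 2^30 bytes). That assumption is not confirmed anywhere in this report, but it reproduces the MB values on all three "SCSI device sda" lines quoted here. The helper name `sizes` is hypothetical and exists only for this illustration.

```python
SECTOR_BYTES = 512  # sector size printed in the "512-byte hdwr sectors" log lines


def sizes(sectors):
    """Return one sector count expressed in decimal and binary units.

    Assumption (not confirmed by this report): the kernel log rounds to
    decimal megabytes, while the controller BIOS reports binary gigabytes.
    """
    total_bytes = sectors * SECTOR_BYTES
    return {
        "decimal_mb": round(total_bytes / 10**6),  # MB as 10^6 bytes
        "binary_mib": total_bytes // 2**20,        # MiB as 2^20 bytes
        "binary_gib": total_bytes / 2**30,         # GiB as 2^30 bytes
    }


if __name__ == "__main__":
    # Sector counts taken from the three "SCSI device sda" lines in this report.
    for sectors in (976558080, 430510464, 312432384):
        print(sectors, sizes(sectors))
```

Under that assumption, 976558080 sectors comes out as 499998 decimal MB and 476835 binary MiB, which is roughly 465 GiB, so the kernel and the controller would agree on the size. Dividing 499998 by 1024 mixes the two unit systems, which is where the 488GB figure comes from.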
Having a similar problem with kernels 2.6.8-1.521 and 2.6.9-1.640. Also, whatever kernel Fedora Core 3 Test 3 is running fares no better.

Hardware: Adaptec 2410SA SATA RAID controller on a Supermicro P4SCi motherboard w/P4-2.8GHz
Problem: Panics on startup when booting off the 2410SA controller running raid5.
Messages:
aacraid: Host adapter reset request. SCSI hang?
Debug: sleeping function called from invalid context at mm/slab.c:2063
aacraid: Host adapter appears dead.
Yes, mine's broke as well on Adaptec 2410SA
mass update for old bugs: Is this still a problem in the 2.6.9 based kernel update ?
I can confirm that kernel 2.6.9-1.6 has the same problem on a Dell PowerEdge 750. I get I/O errors from the controller, then it downs the device. This is with two drives running in RAID.

Snippet from dmesg on a good boot in 2.6.5-1.358 (for card info):

SCSI subsystem initialized
Red Hat/Adaptec aacraid driver (1.1.2-lk1 May 8 2004)
AAC0: kernel 4.1.4 build 7401
AAC0: monitor 4.1.4 build 7401
AAC0: bios 4.1.0 build 7401
AAC0: serial bdf001fafaf001
scsi0 : aacraid
Vendor: DELL Model: CERC Mirror Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 312432384 512-byte hdwr sectors (159965 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 >
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
libata version 1.02 loaded.
ata_piix version 1.02
ata_piix: combined mode detected
ACPI: No IRQ known for interrupt pin A of device 0000:00:1f.2
ata: 0x1f0 IDE port busy
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xFEA8 irq 15
ata1: SATA port has no device.
ata1: thread exiting
scsi1 : ata_piix

If I boot the FC3 CD (via PXE), anaconda will find the existing installation on /dev/sda1, but choosing it for upgrade causes I/O errors (and a continual beep). Interestingly, the PE750 beeps for a brief period with a working kernel too, during the activating swap / checking root fs part of bootup.

[OT] I am guessing that this will make it somewhat difficult to upgrade to FC3 even after a fix is out. Would I need to rebuild an initrd image after the problem is sorted?

Anyway, yes, this is still a problem at least for 2.6.9-1.6 in my case.
I seem to have the same problem with kernels 2.6.10-1.9_FC2smp and 2.6.10-1.8_FC2smp. I have not had it with older releases. I have no kernel messages because it only affects the /var partition and I have not yet attached a serial console to the machine. However, the filesystem errors I see are the same. The machine is a Dell PowerEdge 2650 with a Perc 3/DI controller. The latest machine and controller BIOS have been applied. The crash occurs every 20 to 30 hours.
Fedora Core 2 has now reached end of life, and no further updates will be provided by Red Hat. The Fedora legacy project will be producing further kernel updates for security problems only. If this bug has not been fixed in the latest Fedora Core 2 update kernel, please try to reproduce it under Fedora Core 3, and reopen if necessary, changing the product version accordingly. Thank you.
The system I was reporting from was experiencing the fault on FC3's initial release kernel, i.e. it would not boot to install because (it appears) of this bug. Thus, it is a bug that appears to apply to FC3 also. However, the machine in question is now in production use at my client's co-lo facility, so I cannot verify whether later kernel releases have fixed the problem, nor can I easily access it to do any testing. As such, I won't re-open the bug myself, since I can't gather the right first-hand evidence to support it. If anyone else here on the cc list still has this problem, please re-open the bug, or if you have found a solution, please add it in a comment here (e.g., does fc4test[?] work?).