Bug 150559
Summary: | Can't install RHEL3 on system with Adaptec AAR 1210SA SATA controller (sata_sil - siimage problem)
---|---
Product: | Red Hat Enterprise Linux 3
Reporter: | Peter Bieringer <pb>
Component: | kernel
Assignee: | John W. Linville <linville>
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Version: | 3.0
CC: | peterm, petrides, riel
Hardware: | All
OS: | Linux
Fixed In Version: | RHSA-2006-0144
Doc Type: | Bug Fix
Last Closed: | 2006-03-15 15:52:02 UTC
Bug Blocks: | 168424
Description
Peter Bieringer
2005-03-08 12:08:40 UTC
Does the siimage driver not work with that device? It seems to think that it does. The siimage source contains explicit references to supporting SATA, fwiw...

No, it doesn't work with the chipset on the controller; I got the same result as shown at the URL above.

 * FAQ Items:
 *	If you are using Marvell SATA-IDE adapters with Maxtor drives
 *	ensure the system is set up for ATA100/UDMA5 not UDMA6.
 *
 *	If you are using WD drives with SATA bridges you must set the
 *	drive to "Single". "Master" will hang
 *

The above is taken from the siimage driver source. I don't know whether or not it applies to your setup, but I thought it was worth mentioning...

Server has 2 Seagate drives:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
  Type: Direct-Access ANSI SCSI revision: 05

I can't adjust any UDMA settings in the Adaptec BIOS configuration.
Here is the log of a successful boot with the modified kernel as shown above:

Mar 1 10:44:56 pib PCI: Found IRQ 11 for device 00:14.0
Mar 1 10:44:56 pib PCI: Sharing IRQ 11 with 00:10.1
Mar 1 10:44:56 pib ata1: SATA max UDMA/100 cmd 0xF8843080 ctl 0xF884308A bmdma 0xF8843000 irq 11
Mar 1 10:44:56 pib ata2: SATA max UDMA/100 cmd 0xF88430C0 ctl 0xF88430CA bmdma 0xF8843008 irq 11
Mar 1 10:44:56 pib ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
Mar 1 10:44:56 pib ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
Mar 1 10:44:56 pib ata1: dev 0 configured for UDMA/100
Mar 1 10:44:56 pib ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
Mar 1 10:44:56 pib ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
Mar 1 10:44:56 pib ata2: dev 0 configured for UDMA/100
Mar 1 10:44:56 pib scsi0 : sata_sil
Mar 1 10:44:56 pib scsi1 : sata_sil
Mar 1 10:44:56 pib Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Mar 1 10:44:56 pib Type: Direct-Access ANSI SCSI revision: 05
Mar 1 10:44:56 pib Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Mar 1 10:44:56 pib SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Mar 1 10:44:56 pib Partition check:
Mar 1 10:44:56 pib sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
Mar 1 10:44:56 pib Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Mar 1 10:44:56 pib Type: Direct-Access ANSI SCSI revision: 05
Mar 1 10:44:56 pib Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Mar 1 10:44:56 pib SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
Mar 1 10:44:56 pib sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >

As I already said, siimage must either be prevented from claiming the shown PCI IDs or be disabled from the boot command line.

Disabling siimage completely or even removing the PCI IDs from it (as in RHEL4) is likely to be very unwelcome in RHEL3.
Ostensibly the siimage driver supports at least some versions of the devices w/ PCI ID 9005:0240, so RHEL3 will need to continue supporting those devices w/ siimage. A command line option may be possible, but upstream does not offer that. As a result, I'd prefer to get a working siimage before we facilitate disabling it.

I would like to start by updating the siimage driver to match what is currently upstream in 2.4. Kernels w/ that patch are available here:

http://people.redhat.com/linville/kernels/rhel3/

If that doesn't work, I'll probably try taking what is currently in 2.6. After that...we'll figure it out... :-)

BTW, if you install that kernel on the box installed w/ sata_sil, you may experience some problems relating to /dev/hda vs. /dev/sda, etc. But, at least I think we should be able to determine whether or not the driver is "working"...

Please give the kernels above a try and let me know the results. Thanks!

That doesn't help; here are the complete results:

2.4.21-27.0.2.EL:

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
blk: queue c041ae98, I/O limit 4095Mb (mask 0xffffffff)
hdg: HDS722580VLSA80, ATA DISK drive
blk: queue c041b364, I/O limit 4095Mb (mask 0xffffffff)
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt

2.4.21-29.EL.jwltest.5:

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
blk: queue c041ae98, I/O limit 4095Mb (mask 0xffffffff)
hdg: HDS722580VLSA80, ATA DISK drive
blk: queue c041b364, I/O limit 4095Mb (mask 0xffffffff)
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt

2.4.21-27.EL.AE.1 (siimage completely disabled):

SCSI subsystem driver Revision: 1.00
libata version 1.02 loaded.
sata_sil version 0.54
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
ata1: SATA max UDMA/100 cmd 0xF8843080 ctl 0xF884308A bmdma 0xF8843000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88430C0 ctl 0xF88430CA bmdma 0xF8843008 irq 11
ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi0 : sata_sil
scsi1 : sata_sil
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Partition check:
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >

2.6.9-5.EL:

Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading sata_sil.ko module
ACPI: PCI interrupt 0000:00:14.0[A] -> GSI 11 (level, low) -> IRQ 11
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 11
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2 sdb3
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0

Peter, I would like to try the siimage driver from upstream 2.6 as well. I have pre-built test kernels here:

http://people.redhat.com/linville/kernels/rhel3/

Please give those a try and let me know the results. I appreciate your patience and cooperation!

You are lucky that I own two similar boxes: one in production with my patched kernel, and one currently being installed with RHEL4 as an alternative but still able to boot at least the static part of a RHEL3 kernel. Your newest kernel doesn't help either; same issue.

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
hdg: HDS722580VLSA80, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt
hde: lost interrupt
hde: lost interrupt
hde: host protected area => 1
hde: lost interrupt
hde: 160836480 sectors (82348 MB) w/7938KiB Cache, CHS=10011/255/63
hde: lost interrupt
hde: lost interrupt
hdg: attached ide-disk driver.
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt
hdg: host protected area => 1
hdg: lost interrupt
hdg: 160836480 sectors (82348 MB) w/7938KiB Cache, CHS=10011/255/63
hdg: lost interrupt
hdg: lost interrupt
ide-floppy driver 0.99.newide
hdg: lost interrupt
ide-floppy driver 0.99.newide
Partition check:
 hde:<3>hde: lost interrupt
 hde1 hde2 hde3 hde4 <<3>hde: lost interrupt
 hde5<3>hde: lost interrupt
 hde6<3>hde: lost interrupt
 hde7

It looks like siimage really has to be extended with a "disable" capability via a boot command-line switch.

Ok, ok... :-) I have a patch that allows "siimage=off" on the kernel command-line. The kernels are available at the same link as before (see comment 7).

No promises! I'm not sure this will be welcome in RHEL3...

Please test the kernels and let me know if the command-line option is working. Thanks!

Works fine for me now (booted on my production RHEL3 server, because I'm not able to find a working depmod.old for RHEL4).
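The failing boots quoted above all show the same signature: the disks come up as hde/hdg under the IDE layer and then log repeated "lost interrupt" messages, while the working boots show sata_sil claiming scsi0/scsi1. As a rough aid for comparing captured boot logs, here is a minimal Python sketch that counts those messages per device. The function names and the driver-guessing heuristic are illustrative assumptions, not part of any Red Hat tool.

```python
import re
from collections import Counter

# Matches the IDE layer's timeout message, e.g. "hde: lost interrupt".
# It also matches when a "<3>" log-level prefix is glued in front, as in the thread.
LOST_IRQ = re.compile(r"\b(hd[a-z]): lost interrupt")


def count_lost_interrupts(log_text: str) -> Counter:
    """Count 'lost interrupt' messages per IDE device name."""
    return Counter(LOST_IRQ.findall(log_text))


def claiming_driver(log_text: str) -> str:
    """Guess which driver claimed the disks (heuristic, assumption-based)."""
    if re.search(r"scsi\d+ : sata_sil", log_text):
        return "sata_sil"
    if re.search(r"\bhd[a-z]: .*ATA DISK drive", log_text):
        return "ide"
    return "unknown"


# Short excerpt taken from the failing boot log in this report.
SAMPLE = """\
hde: HDS722580VLSA80, ATA DISK drive
hdg: HDS722580VLSA80, ATA DISK drive
hde: attached ide-disk driver.
hde: lost interrupt
hde: lost interrupt
hde: lost interrupt
hde: host protected area => 1
hde: lost interrupt
Partition check:
 hde:<3>hde: lost interrupt
"""

if __name__ == "__main__":
    print(claiming_driver(SAMPLE), dict(count_lost_interrupts(SAMPLE)))
```

Any non-zero count for a disk behind the AAR-1210SA suggests the boot took the siimage path described in this report.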
I really wonder why nobody has reported problems until now with this Adaptec controller (available since Q1/2003) on 2.4.x Linux kernels, given that siimage is compiled in statically by default. So nobody can use it without recompiling the kernel - strange.

Re-opening in order to follow RH's process... :-)

I presume that no one has had problems because the siimage driver is supposed to work with your card. The fact that it apparently does not is the real concern.

As I said in comment 9, this is likely not the preferred solution for RHEL3. I'll have to consult with some people internally...

Perhaps my controller is too new a revision; here is some chipset data:

Silicon Image SATALINK Sil3112ACT144 MQA7299.1 0434 1.21

The controller is labeled:

AAR-1210SA RESPIN RAID CONTROL 2043100 A 0501

The mainboard is an EPIA-PD with a VIA chipset.

There has been chatter elsewhere about the 1210SA firmware turning off interrupts.

Matthew, can you be more specific? Could you provide some pointers for where to look? Do you know of any proposed fixes? Thanks!

John, it was not a very direct reference - perhaps I should not have posted. In any case, this is the post to which I was referring.

https://www.redhat.com/archives/fedora-test-list/2004-January/msg00234.html

One new question: does kernel-2.4.21-31.EL.jwltest.16 already contain the latest security fixes of kernel-2.4.21-27.0.4.EL? If not, can you create a new one? Thank you!

I rebased the test kernels, so kernel-2.4.21-32.2.jwltest.17 (available now) should serve for you.

Specifically, I can confirm that the 2.4.21-32.EL base (which is the latest U5 beta) contains all fixes that were recently released in 2.4.21-27.0.4.EL.

Thank you for the very fast rebuild; installed and working. Could at least this fix be included in U5? It's small and does not break anything.

Peter, U5 is already closed (and will be released in about 2 weeks). As I suspected, the proposed addition of the "siimage=off" parameter was unpopular.
Peter, have you tried using "ide2=noprobe ide3=noprobe" instead of "siimage=off"? Please try that and let me know the results. (Don't forget to remove "siimage=off".) Thanks!

AFAIR I did test this (I can't repeat it at the moment, because both servers are in production). The "noprobe" comes too late; the PCI IDs have already been claimed by siimage. If the siimage=off parameter is unpopular, then the only choices are to fix the PCI IDs (as in RHEL4) or to move siimage from statically built-in to a module.

Peter, I have another alternative, suggested by Alan Cox. I have added a blacklist facility to siimage that will exclude your specific card. Would you mind trying the new kernels at the location from comment 7? Please post the results. Thanks!

Unfortunately, it's not working:

root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
kernel /vmlinuz-2.4.21-32.3.EL.jwltest.22 ro root=/dev/md1 panic=60 vga=extende
d siimage=off console=tty0 console=ttyS0,38400n8 fastboot
[Linux-bzImage, setup=0x1400, size=0x1308ed]
initrd /initrd-2.4.21-32.3.EL.jwltest.22.img
[Linux-initrd @ 0x37faf000, 0x409c1 bytes]
Linux version 2.4.21-32.3.EL.jwltest.22 (bhcompile.redhat.com) (gcc 5
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003eff0000 (usable)
BIOS-e820: 000000003eff0000 - 000000003eff3000 (ACPI NVS)
BIOS-e820: 000000003eff3000 - 000000003f000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
111MB HIGHMEM available.
896MB LOWMEM available.
NX protection not present; using segment protection
On node 0 totalpages: 258032
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 28656 pages.
Kernel command line: ro root=/dev/md1 panic=60 vga=extended siimage=off consolet
Initializing CPU#0
Detected 1002.280 MHz processor.
Console: colour VGA+ 80x50
Calibrating delay loop... 1998.84 BogoMIPS
Page-cache hash table entries: 262144 (order: 8, 1024 KB)
Page-pin hash table entries: 65536 (order: 6, 256 KB)
Dentry cache hash table entries: 131072 (order: 8, 1024 KB)
Inode cache hash table entries: 65536 (order: 7, 512 KB)
Buffer cache hash table entries: 65536 (order: 6, 256 KB)
Memory: 1007820k/1032128k available (1543k kernel code, 20856k reserved, 1071k )
zapping low mappings.
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 64K (32 bytes/line)
CPU: Centaur VIA Nehemiah stepping 08
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Process timing init...done.
mtrr: v1.40 (20010327) Richard Gooch (rgooch.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfad30, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
PCI: Found IRQ 10 for device 00:11.1
PCI: Sharing IRQ 10 with 00:10.0
PCI: Sharing IRQ 10 with 00:12.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Total HugeTLB memory allocated, 0
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 64508
aio_setup: sizeof(struct page) = 56
Hugetlbfs mounted.
Detected PS/2 Mouse Port.
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SEd
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:11.1
PCI: Found IRQ 10 for device 00:11.1
PCI: Sharing IRQ 10 with 00:10.0
PCI: Sharing IRQ 10 with 00:12.0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:DMA
siimage: Adaptec 1210-SA not supported.
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 258k freed
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starting
Loading scsi_mod.o module
SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
/lib/sata_sil.o: init_module: Hint: insmod ermd: raid1 personality registered as nr 3
rors can be caused by incorrect module parameters, including invJournalled Blocd
alid IO or IRQ parameters. You may find more information in syslog or themd: Autodetecting RAID arra.
 output from dmemd: autorun ...
sg ERROR: /bin/md: ... autorun DONE.
insmod exited abmd: Autodetecting RAID arrays.
normally! Loadimd: autorun ...
ng raid1.o modulmd: ... autorun DONE.
e Loading jbd.omd: Autodetecting RAID arrays.
 module Loadingmd: autorun ...
 ext3.o module md: ... autorun DONE.
Mounting /proc fmd: Autodetecting RAID arrays.
ilesystem md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Creating block devices
EXT2-fs: unable to read superblock
isofs_read_super: bread failed, dev=09:01, iso_blknum=16, block=32
EXT3-fs: unable to read superblock
Kernel panic: VFS: Unable to mount root fs on 09:01
Rebooting in 60 seconds..

With the previous kernel, the following is shown:

...
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starting
Loading scsi_mod.o module
SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
ata1: SATA max UDMA/100 cmd 0xF8845080 ctl 0xF884508A bmdma 0xF8845000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88450C0 ctl 0xF88450CA bmdma 0xF8845008 irq 11
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi0 : sata_sil
scsi1 : sata_sil
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Partition check:
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
Vendor: ATA Model: HDS722580VLSA80 Rev: V32O
Type: Direct-Access ANSI SCSI revision: 05
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)

So when booting the siimage-blacklist kernel, sata_sil is not able to claim the PCI device; the following messages are missing after "Loading sata_sil.o module":

PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1

Strange...

Created attachment 119003 [details]
jwltest-siimage-sa1210.patch
Well, I finally got some hardware that reproduced the problem. Using that, I think I have something that works. Patched kernels are available at the same location as in comment 7. I would appreciate some testing.

Be warned that with these kernels the siimage IDE driver (i.e. NOT the sata_sil libata driver) will claim your drives, potentially changing their names back to hdX instead of sdX. Even if you do not want to run them that way, I would appreciate some testing just to confirm that this patch is working.

BTW, please remember to remove "siimage=off" from your kernel command line while testing...thanks!

Thank you. Because the machine is in production, I can hopefully try this new kernel within the next 2 weeks.

Installed, rebooted, works! Thank you very much for working on it. I only needed to adjust fstab for the swap devices, because all other partitions are md devices.

BTW: what magic is at work on RHEL4? Swap devices created during installation (anaconda) are labeled, but mkswap does not support the -L option.

Thanks for the feedback! I'll try to get this pushed upstream and in RHEL ASAP.

RE: RHEL4 and labeled swap devices, later versions of mkswap seem to support "-L" as an option. Perhaps you were looking at the RHEL3 man page?

/me was wondering why the name on the IPv6 HOWTO looked so familiar...

Relating to the mkswap issue, it's already known: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152026

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.5.EL).

Can you please tell me whether the U6 update kernel-2.4.21-37.0.1.EL released today (https://rhn.redhat.com/errata/RHSA-2006-0140.html) includes all the fixes that 2.4.21-37.2.EL.jwltest.55 (currently installed) already included? Or should I now install kernel-2.4.21-38.EL from the beta channel?
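Several test steps in this thread depend on whether "siimage=off" is still present on the kernel command line. A small sketch of how one might check that from a captured command line follows. The helper name `boot_option_value` is hypothetical, and the sample string is reassembled from the wrapped GRUB line quoted earlier in this report. This is plain whitespace splitting, not the kernel's own parameter parser, so it ignores quoting.

```python
from typing import Optional


def boot_option_value(cmdline: str, name: str) -> Optional[str]:
    """Return the value of the last `name=value` token, "" for a bare flag, or None.

    Simplified sketch: tokens are split on whitespace only.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found


# Command line reassembled from the boot log posted in this report.
CMDLINE = ("ro root=/dev/md1 panic=60 vga=extended siimage=off "
           "console=tty0 console=ttyS0,38400n8 fastboot")

if __name__ == "__main__":
    if boot_option_value(CMDLINE, "siimage") == "off":
        print("siimage=off is still set; remove it before testing the patched driver")
```

On a running Linux system the same function could be fed the contents of /proc/cmdline.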
After reading the changelog of kernel-2.4.21-38.EL I tried this version, but it has a major problem. The following was captured via serial console; some characters are missing:

SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
/lib/sata_sil.o: init_module: Hint: insmod ermd: raid1 personality registered as nr 3
rors can be caused by incorrect module parameters, including invJournalled Blocd
alid IO or IRQ parameters. You may find more information in syslog or themd: Autodetecting RAID arra.
 output from dme [events: 0000001b]
sg ERROR: /bin/ [events: 0000001b]
insmod exited ab [events: 0000001b]
normally! Loadi [events: 0000001b]
ng raid1.o modul [events: 00000021]
e Loading jbd.o [events: 00000021]
 module Loading [events: 00000026]
 ext3.o module [events: 00000026]
Mounting /proc f [events: 00000026]

...later, at the filesystem check step:

/ has gone 92 days without being checked, check forced.
raid1: Disk failure on hdg2, disabling device.

Ooops...only a SysRQ->Crash->"Automatic reboot after panic" brought the machine back to life (phew...it was the production one, rebooted remotely...)

I booted the old 2.4.21-37.2.EL.jwltest.55 again, and after a while (a filesystem check during RAID1 reconstruction needs a lot of time) the system was up again.

Note that I got confused: the libata message also occurs in 2.4.21-37.2.EL.jwltest.55 and doesn't cause the problem - sorry for the noise. I have to investigate further.

So, the system is rebooted and still syncing the mirror. As far as I remember I never had to resync the mirrors *after* updating to the provided kernel with non-libata support for this controller. Now the rebuild is very, very slow; top shows:

CPU states: cpu user nice system irq softirq iowait idle
total 7.2% 0.0% 3.3% 88.8% 0.5% 0.0% 0.0%

# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
Event: 10
md9 : active raid1 hde12[0] hdg12[1]
      7823552 blocks [2/2] [UU]
      [=====>...............] resync = 29.6% (2317824/7823552) finish=89.4min speed=1024K/sec
md8 : active raid1 hde11[0] hdg11[1]
      2048192 blocks [2/2] [UU]
      resync=DELAYED
md7 : active raid1 hde10[0] hdg10[1]
      19542976 blocks [2/2] [UU]
      resync=DELAYED
md6 : active raid1 hde9[0] hdg9[1]
      7823552 blocks [2/2] [UU]
      resync=DELAYED
md5 : active raid1 hde8[0] hdg8[1]
      7823552 blocks [2/2] [UU]
      resync=DELAYED
md4 : active raid1 hde7[0] hdg7[1]
      7823552 blocks [2/2] [UU]
      resync=DELAYED
md3 : active raid1 hde6[0] hdg6[1]
      7823552 blocks [2/2] [UU]
      resync=DELAYED
md2 : active raid1 hde5[0] hdg5[1]
      2048192 blocks [2/2] [UU]
      resync=DELAYED
md1 : active raid1 hde2[0] hdg2[1]
      4096448 blocks [2/2] [UU]
      resync=DELAYED
md0 : active raid1 hde1[0] hdg1[1]
      104320 blocks [2/2] [UU]
unused devices: <none>

# sysctl -a |grep raid
dev.raid.speed_limit_max = 10000
dev.raid.speed_limit_min = 100

md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 124k window, over a total of 104320 blocks.

vmstat shows:

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
3 1 0 376388 58596 76780 0 0 314 316 3676 684 46 50 4 0
3 2 0 375496 59208 76816 0 0 125 87 523 138 61 39 0 0

It looks like the fixed old-fashioned driver causes a very high interrupt load and can never reach a proper sync speed.

I played around with hdparm to enable DMA (because it was shown as "off"). This was not a good idea: afterwards, I/O errors occurred and the load increased quickly. I had to reboot via SysRQ.

I will report in a few hours whether hdg2 causes problems with 2.4.21-37.2.EL.jwltest.55 during sync; it's still delayed.

Peter, yesterday's release of the 2.4.21-37.0.1.EL kernel does not contain most of the stuff that has gone into U7 (-38.EL), hence the lower-numbered version. There will be a U7 respin soon (-39.EL) that incorporates the security fixes released in the post-U6 erratum plus a few U7 regression fixes.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
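For readers checking the resync figures quoted in the /proc/mdstat output earlier in this report, the arithmetic can be reproduced with a short sketch. It assumes that md counts in 1 KiB blocks and that the reported speed is in KiB per second; the function names are illustrative only. The small difference from the kernel's own "finish=89.4min" is expected, since the displayed speed is a rounded snapshot.

```python
def resync_percent(done_blocks: int, total_blocks: int) -> float:
    """Percentage of the array already resynced."""
    return 100.0 * done_blocks / total_blocks


def resync_eta_minutes(done_blocks: int, total_blocks: int,
                       speed_kib_per_sec: float) -> float:
    """Estimated minutes left, assuming 1 KiB blocks and a constant speed."""
    remaining_kib = total_blocks - done_blocks
    return remaining_kib / speed_kib_per_sec / 60.0


# Values taken from the md9 line in this report.
DONE, TOTAL = 2317824, 7823552

if __name__ == "__main__":
    print(f"progress: {resync_percent(DONE, TOTAL):.1f}%")
    # At the observed 1024K/sec:
    print(f"eta at 1024 KiB/s:  {resync_eta_minutes(DONE, TOTAL, 1024):.1f} min")
    # At the configured dev.raid.speed_limit_max of 10000 KB/sec:
    print(f"eta at 10000 KiB/s: {resync_eta_minutes(DONE, TOTAL, 10000):.1f} min")
```

The comparison shows how far the observed rate sits below the configured ceiling: roughly an hour and a half for md9 instead of under ten minutes.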