Bug 150559

Summary: Can't install RHEL3 on system with Adaptec AAR 1210SA SATA controller (sata_sil - siimage problem)
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Peter Bieringer <pb>
Assignee: John W. Linville <linville>
CC: peterm, petrides, riel
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 15:52:02 UTC
Bug Blocks: 168424
Attachments: jwltest-siimage-sa1210.patch

Description Peter Bieringer 2005-03-08 12:08:40 UTC

Description of problem:
It is impossible to install RHEL3 on a system with an Adaptec 1210-SA SATA controller using the kernels delivered on the installation CDs (up to and including U4).

Reason: the siimage kernel driver (statically built in, not a module) takes control of the SATA interface before the sata_sil module (delivered since U3) gets the chance.



Version-Release number of selected component (if applicable):
kernel-2.4.21-27.0.2.EL (and lower)

How reproducible:
Always

Steps to Reproduce:
1. Get system with Adaptec AAR 1210SA SATA controller
2. Install RHEL3U4

  

Actual Results:  the kernel runs into several timeouts during boot, with the result that no device for installation is found.

Expected Results:  Proper installation.

Additional info:

Since siimage is statically compiled in, the only way I see to fix this without breaking RHEL's installation conformance is to add a kernel command-line switch to the siimage code that disables this built-in driver, like:

Boot: linux siimage=off (or similar).

I've dug around with Google, and it looks like the reason is that siimage claims devices by PCI IDs which it shouldn't claim.

See also:
https://www.redhat.com/archives/taroon-list/2004-September/msg00062.html


Note: RHEL4 works, but that doesn't help here because RHEL3 is required.

Workaround: rebuild the kernel with the static siimage driver disabled, then create your own boot CD or boot from the network using this kernel (and after installation, install this kernel before the next reboot... hard work; I needed some hours for a successful installation).

--- kernel-2.4.21-27.0.2.EL.spec        2005-02-28 11:47:35.000000000 -0500
+++ kernel-2.4.21-27.0.2.EL.1.spec      2005-02-28 11:54:17.000000000 -0500
@@ -1816,6 +1816,12 @@
 # since make mrproper wants to wipe out .config files, we move our mrproper
 # up before we copy the config files around.
     cp configs/kernel-%{kversion}-$Config.config .config
+
+    # Disable builtin CONFIG_BLK_DEV_SIIMAGE
+    echo "Disable siimage driver"
+    mv .config .config.orig
+    cat .config.orig | sed 's/CONFIG_BLK_DEV_SIIMAGE=y/CONFIG_BLK_DEV_SIIMAGE=n/g' >.config
+
     # make sure EXTRAVERSION says what we want it to say
     perl -p -i -e "s/^EXTRAVERSION.*/EXTRAVERSION = -%{release}$2/" Makefile
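
The spec change above rewrites one option in the kernel .config before the build. A minimal sketch of the same idea in Python follows; note one deliberate difference: the patch writes "=n", while this sketch emits the "# ... is not set" form that kernel .config files conventionally use for a disabled option. The function name is hypothetical and not part of any Red Hat tooling.

```python
def disable_kernel_option(config_text: str, option: str) -> str:
    """Return config_text with `option=<anything>` marked as disabled."""
    out = []
    for line in config_text.splitlines():
        # Match "OPTION=" exactly so longer option names sharing the prefix are untouched.
        if line.startswith(option + "="):
            out.append(f"# {option} is not set")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


sample = "CONFIG_BLK_DEV_SIIMAGE=y\nCONFIG_BLK_DEV_IDE=y\n"
result = disable_kernel_option(sample, "CONFIG_BLK_DEV_SIIMAGE")
print(result, end="")
```

Only the siimage line changes; every other option passes through untouched.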


# lspci -vv

00:14.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) Adaptec AAR-1210SA SATA HostRAID Controller (rev 02) (prog-if 01)
        Subsystem: Adaptec: Unknown device 0240
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at d800 [size=8]
        Region 1: I/O ports at dc00 [size=4]
        Region 2: I/O ports at e000 [size=8]
        Region 3: I/O ports at e400 [size=4]
        Region 4: I/O ports at e800 [size=16]
        Region 5: Memory at e7402000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at <unassigned> [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

# lspci -nv
00:14.0 Class 0104: 1095:0240 (rev 02) (prog-if 01)
        Subsystem: 9005:0240
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 11
        I/O ports at d800 [size=8]
        I/O ports at dc00 [size=4]
        I/O ports at e000 [size=8]
        I/O ports at e400 [size=4]
        I/O ports at e800 [size=16]
        Memory at e7402000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at <unassigned> [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
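
For reference, the IDs that matter for driver matching can be pulled out of the `lspci -nv` record above with a few lines of Python. This is only a sketch: the function name and the returned dictionary keys are made up for illustration, and the regular expressions assume the output layout shown in this report. Per the `lspci -vv` output, 1095 is the Silicon Image vendor ID and 9005 is the Adaptec subsystem vendor ID.

```python
import re


def parse_lspci_nv(text: str) -> dict:
    """Extract class, vendor:device and subsystem IDs from one `lspci -nv` record."""
    head = re.search(
        r"^(\S+) Class ([0-9a-f]{4}): ([0-9a-f]{4}):([0-9a-f]{4})", text, re.M)
    if head is None:
        raise ValueError("no lspci -nv header line found")
    sub = re.search(r"Subsystem: ([0-9a-f]{4}):([0-9a-f]{4})", text)
    return {
        "slot": head.group(1),
        "class": head.group(2),
        "vendor": head.group(3),
        "device": head.group(4),
        "subsys_vendor": sub.group(1) if sub else None,
        "subsys_device": sub.group(2) if sub else None,
    }


sample = """00:14.0 Class 0104: 1095:0240 (rev 02) (prog-if 01)
        Subsystem: 9005:0240
"""
info = parse_lspci_nv(sample)
print(info)
```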

Comment 1 John W. Linville 2005-03-08 21:49:57 UTC
Does the siimage driver not work with that device?  It seems to think
that it does.  The siimage source contains explicit references to
supporting sata, fwiw...

Comment 2 Peter Bieringer 2005-03-08 21:54:37 UTC
No, it doesn't work with the chipset on the controller; I got the same
result as shown at the URL above.

Comment 3 John W. Linville 2005-03-09 15:41:09 UTC
 *  FAQ Items:
 *      If you are using Marvell SATA-IDE adapters with Maxtor drives
 *      ensure the system is set up for ATA100/UDMA5 not UDMA6.
 *
 *      If you are using WD drives with SATA bridges you must set the
 *      drive to "Single". "Master" will hang
 *

The above is taken from the siimage driver source.  I don't know
whether or not it applies to your setup, but I thought it was worth
mentioning...

Comment 4 Peter Bieringer 2005-03-09 16:58:05 UTC
Server has 2 Seagate drives:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: HDS722580VLSA80  Rev: V32O
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: HDS722580VLSA80  Rev: V32O
  Type:   Direct-Access                    ANSI SCSI revision: 05

I can't adjust any UDMA settings in the Adaptec BIOS configuration.

Here is the log of a successful boot with the modified kernel as shown above:

Mar  1 10:44:56 pib PCI: Found IRQ 11 for device 00:14.0
Mar  1 10:44:56 pib PCI: Sharing IRQ 11 with 00:10.1
Mar  1 10:44:56 pib ata1: SATA max UDMA/100 cmd 0xF8843080 ctl 0xF884308A bmdma 0xF8843000 irq 11
Mar  1 10:44:56 pib ata2: SATA max UDMA/100 cmd 0xF88430C0 ctl 0xF88430CA bmdma 0xF8843008 irq 11
Mar  1 10:44:56 pib ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
Mar  1 10:44:56 pib ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
Mar  1 10:44:56 pib ata1: dev 0 configured for UDMA/100
Mar  1 10:44:56 pib ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
Mar  1 10:44:56 pib ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
Mar  1 10:44:56 pib ata2: dev 0 configured for UDMA/100
Mar  1 10:44:56 pib scsi0 : sata_sil
Mar  1 10:44:56 pib scsi1 : sata_sil
Mar  1 10:44:56 pib Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
Mar  1 10:44:56 pib Type:   Direct-Access                      ANSI SCSI revision: 05
Mar  1 10:44:56 pib Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Mar  1 10:44:56 pib SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Mar  1 10:44:56 pib Partition check:
Mar  1 10:44:56 pib sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
Mar  1 10:44:56 pib Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
Mar  1 10:44:56 pib Type:   Direct-Access                      ANSI SCSI revision: 05
Mar  1 10:44:56 pib Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Mar  1 10:44:56 pib SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
Mar  1 10:44:56 pib sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >
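
As a sanity check on the log above, the "(82348 MB)" figure follows from the sector count if one assumes 512-byte sectors (as the log states) and decimal megabytes of 10**6 bytes, truncated. That rounding rule is an assumption that happens to match the printed value, not something taken from the kernel source.

```python
SECTOR_BYTES = 512  # "512-byte hdwr sectors" per the boot log


def sectors_to_mb(sectors: int) -> int:
    """Convert a sector count to decimal megabytes (10**6 bytes), truncated."""
    return sectors * SECTOR_BYTES // 10**6


size_mb = sectors_to_mb(160836480)
print(size_mb)
```

The product 160836480 * 512 is 82,348,277,760 bytes, which truncates to the 82348 MB shown for both sda and sdb.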


As I already said, siimage must either stop claiming the PCI IDs shown
or be disableable from the boot command line.

Comment 5 John W. Linville 2005-03-09 18:15:04 UTC
Disabling siimage completely or even removing the PCI IDs from it (as
in RHEL4) is likely to be very unwelcome in RHEL3.  Ostensibly the
siimage driver supports at least some versions of the devices w/ PCI
ID 9005:0240, so RHEL3 will need to continue supporting those devices
w/ siimage.

A command line option may be possible, but upstream does not offer
that.  As a result, I'd prefer to get a working siimage before we
facilitate disabling it.

I would like to start by updating the siimage driver to match what is
currently upstream in 2.4.  Kernels w/ that patch are available here:

   http://people.redhat.com/linville/kernels/rhel3/

If that doesn't work, I'll probably try taking what is currently in
2.6.  After that...we'll figure it out... :-)

BTW, if you install that kernel on the box installed w/ sata_sil, you
may experience some problems relating to /dev/hda vs. /dev/sda, etc. 
But, at least I think we should be able to determine whether or not
the driver is "working"...

Please give the kernels above a try and let me know the results.

Thanks!

Comment 6 Peter Bieringer 2005-03-10 12:35:02 UTC
That doesn't help; here are the complete results:

2.4.21-27.0.2.EL:

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
    ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
    ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
blk: queue c041ae98, I/O limit 4095Mb (mask 0xffffffff)
hdg: HDS722580VLSA80, ATA DISK drive
blk: queue c041b364, I/O limit 4095Mb (mask 0xffffffff)
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt



2.4.21-29.EL.jwltest.5:

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
    ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
    ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
blk: queue c041ae98, I/O limit 4095Mb (mask 0xffffffff)
hdg: HDS722580VLSA80, ATA DISK drive
blk: queue c041b364, I/O limit 4095Mb (mask 0xffffffff)
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt



2.4.21-27.EL.AE.1 (siimage completely disabled)

SCSI subsystem driver Revision: 1.00
libata version 1.02 loaded.
sata_sil version 0.54
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
ata1: SATA max UDMA/100 cmd 0xF8843080 ctl 0xF884308A bmdma 0xF8843000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88430C0 ctl 0xF88430CA bmdma 0xF8843008 irq 11
ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e9 86:3c02 87:4023 88:203f
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi0 : sata_sil
scsi1 : sata_sil
Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
Type:   Direct-Access                      ANSI SCSI revision: 05
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Partition check:
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 >
Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
Type:   Direct-Access                      ANSI SCSI revision: 05
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 >




2.6.9-5.EL:

Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading sata_sil.ko module
ACPI: PCI interrupt 0000:00:14.0[A] -> GSI 11 (level, low) -> IRQ 11
ata1: SATA max UDMA/100 cmd 0xF8804080 ctl 0xF880408A bmdma 0xF8804000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88040C0 ctl 0xF88040CA bmdma 0xF8804008 irq 11
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0


Comment 7 John W. Linville 2005-03-10 23:23:52 UTC
Peter,

I would like to try the siimage driver from upstream 2.6 as well.  I
have pre-built test kernels here:

   http://people.redhat.com/linville/kernels/rhel3/

Please give those a try and let me know the results.  I appreciate
your patience and cooperation!

Comment 8 Peter Bieringer 2005-03-11 09:45:38 UTC
You are lucky that I own two similar boxes: one is in production with my
patched kernel, the other is currently installing RHEL4 as an alternative
but is able to boot at least the static part of a RHEL3 kernel.

Your newest kernel doesn't help either; same issue.

Adaptec AAR-1210SA: IDE controller at PCI slot 00:14.0
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
Adaptec AAR-1210SA: chipset revision 2
Adaptec AAR-1210SA: not 100% native mode: will probe irqs later
    ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
    ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
hde: HDS722580VLSA80, ATA DISK drive
hdg: HDS722580VLSA80, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xf880d080-0xf880d087,0xf880d08a on irq 11
ide3 at 0xf880d0c0-0xf880d0c7,0xf880d0ca on irq 11
hde: attached ide-disk driver.
hde: lost interrupt
hde: lost interrupt
hde: lost interrupt
hde: host protected area => 1
hde: lost interrupt
hde: 160836480 sectors (82348 MB) w/7938KiB Cache, CHS=10011/255/63
hde: lost interrupt
hde: lost interrupt
hdg: attached ide-disk driver.
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt
hdg: host protected area => 1
hdg: lost interrupt
hdg: 160836480 sectors (82348 MB) w/7938KiB Cache, CHS=10011/255/63
hdg: lost interrupt
hdg: lost interrupt
ide-floppy driver 0.99.newide
hdg: lost interrupt
ide-floppy driver 0.99.newide
Partition check:
 hde:<3>hde: lost interrupt
 hde1 hde2 hde3 hde4 <<3>hde: lost interrupt
 hde5<3>hde: lost interrupt
 hde6<3>hde: lost interrupt
 hde7

It looks like one really has to extend siimage with a "disable" capability
via a boot command-line switch.

Comment 9 John W. Linville 2005-03-11 21:08:15 UTC
Ok, ok... :-)

I have a patch that allows "siimage=off" on the kernel command-line. 
The kernels are available at the same link as before (see comment 7).

No promises!  I'm not sure this will be welcome in RHEL3...

Please test the kernels and let me know if the command-line option is
working.  Thanks!
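
To make the behaviour of the requested switch concrete, here is a small userspace sketch of how a "siimage=off" token on the boot command line could be recognised. It is not the patch referred to in this comment (that patch is kernel C code and is not reproduced in this report); the function name and the sample command lines are assumptions for illustration only.

```python
def driver_disabled(cmdline: str, driver: str = "siimage") -> bool:
    """Return True if the boot command line contains `<driver>=off`."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == driver and sep and value == "off":
            return True
    return False


with_switch = driver_disabled("ro root=/dev/md1 panic=60 siimage=off console=tty0")
without_switch = driver_disabled("ro root=/dev/md1 panic=60 console=tty0")
print(with_switch, without_switch)
```

Matching whole tokens rather than substrings avoids false positives from unrelated parameters that merely contain the text "siimage=off".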

Comment 10 Peter Bieringer 2005-03-15 10:49:16 UTC
Works fine for me now (booted on my production RHEL3 server, because I'm not
able to find a working depmod.old for RHEL4).

I'm really surprised that until now nobody has had problems using this Adaptec
controller (available since Q1/2003) with 2.4.x Linux kernels, because siimage is
compiled in statically by default. So nobody can use it without recompiling the
kernel - strange.

Comment 11 John W. Linville 2005-03-15 15:02:59 UTC
Re-opening in order to follow RH's process... :-)

I presume that no one has had problems because the siimage driver is
supposed to work with your card.  The fact that it apparently does
not is the real concern.

As I said in comment 9, this is likely not the preferred solution for
RHEL3.  I'll have to consult with some people internally...

Comment 13 Peter Bieringer 2005-03-15 15:16:55 UTC
Perhaps my controller is too new a revision; here is some chipset data:

Silicon Image SATALINK
Sil3112ACT144
MQA7299.1
0434
1.21

And controller is labeled as
AAR-1210SA RESPIN RAID CONTROL
2043100 A 0501

Mainboard is an EPIA-PD with VIA chipset.




Comment 15 Matthew Steele 2005-03-19 00:33:23 UTC
There has been chatter elsewhere about the 1210SA firmware turning off 
interrupts.

Comment 16 John W. Linville 2005-03-21 17:13:22 UTC
Matthew, can you be more specific?  Could you provide some pointers for where to
look?  Do you know of any proposed fixes?  Thanks!

Comment 17 Matthew Steele 2005-03-23 14:17:20 UTC
John, it was not a very direct reference - perhaps I should not have posted. In 
any case, this is the post to which I was referring.  
https://www.redhat.com/archives/fedora-test-list/2004-January/msg00234.html

Comment 18 Peter Bieringer 2005-04-25 11:07:39 UTC
One new question: does kernel-2.4.21-31.EL.jwltest.16 already contain latest
security fixes of kernel-2.4.21-27.0.4.EL? If not, can you create a new one?
Thank you!

Comment 19 John W. Linville 2005-04-25 17:26:19 UTC
I rebased the test kernels, so kernel-2.4.21-32.2.jwltest.17 (available now) 
should serve for you. 

Comment 20 Ernie Petrides 2005-04-25 17:39:05 UTC
Specifically, I can confirm that the 2.4.21-32.EL base (which is the latest
U5 beta) contains all fixes that were recently released in 2.4.21-27.0.4.EL.

Comment 21 Peter Bieringer 2005-04-26 09:39:29 UTC
Thank you for the very fast rebuild; installed and working. Could at least this fix
be included in U5? It's small and does not break anything.

Comment 22 Ernie Petrides 2005-04-26 18:05:25 UTC
Peter, U5 is already closed (and will be released in about 2 weeks).

Comment 25 John W. Linville 2005-05-05 14:09:51 UTC
As I suspected, the proposed addition of the "siimage=off" parameter was 
unpopular. 
 
Peter, have you tried using "ide2=noprobe ide3=noprobe" instead of 
"siimage=off"?  Please try that and let me know the results.  (Don't forget to 
remove "siimage=off".)  Thanks! 
 

Comment 26 Peter Bieringer 2005-05-05 14:24:53 UTC
AFAIR I did test this (I can't do it again at the moment, because both
servers are in production). The "noprobe" comes too late; the PCI IDs have already
been claimed by siimage.

If the siimage=off parameter is unpopular, then the only remaining choices are to fix
the PCI IDs (as in RHEL4) or to move siimage from static built-in to a module.

Comment 27 John W. Linville 2005-05-06 18:52:17 UTC
Peter, 
 
I have another alternative, suggested by Alan Cox.  I have added a blacklist 
facility to siimage that will exclude your specific card.  Would you mind 
trying the new kernels at the location from comment 7?  Please post the 
results.  Thanks! 
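
The blacklist idea can be pictured as a match on the full PCI identity (vendor, device, subsystem vendor, subsystem device) rather than on vendor and device alone. The sketch below is an assumption about the general shape of such a check, written in Python for readability; the actual patch is not shown in this report, and the type and function names are hypothetical. The blacklist entry uses the IDs from the `lspci -nv` output in the description.

```python
from typing import NamedTuple


class PciId(NamedTuple):
    vendor: int
    device: int
    subsys_vendor: int
    subsys_device: int


# Hypothetical blacklist entry: the reporter's card as printed by `lspci -nv`.
BLACKLIST = [PciId(0x1095, 0x0240, 0x9005, 0x0240)]


def is_blacklisted(dev: PciId) -> bool:
    """Return True if the driver should refuse to claim this device."""
    return dev in BLACKLIST


reporters_card = PciId(0x1095, 0x0240, 0x9005, 0x0240)
# Made-up comparison device: same vendor, different device and subsystem IDs.
other_card = PciId(0x1095, 0x3112, 0x1095, 0x3112)
print(is_blacklisted(reporters_card), is_blacklisted(other_card))
```

Including the subsystem IDs in the key keeps other boards built on the same chip claimable by the driver.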

Comment 28 Peter Bieringer 2005-05-09 10:01:38 UTC
Unfortunately, it's not working:

root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
kernel /vmlinuz-2.4.21-32.3.EL.jwltest.22 ro root=/dev/md1 panic=60 vga=extended siimage=off console=tty0 console=ttyS0,38400n8 fastboot
   [Linux-bzImage, setup=0x1400, size=0x1308ed]
initrd /initrd-2.4.21-32.3.EL.jwltest.22.img
   [Linux-initrd @ 0x37faf000, 0x409c1 bytes]

Linux version 2.4.21-32.3.EL.jwltest.22 (bhcompile.redhat.com) (gcc 5
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003eff0000 (usable)
 BIOS-e820: 000000003eff0000 - 000000003eff3000 (ACPI NVS)
 BIOS-e820: 000000003eff3000 - 000000003f000000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
111MB HIGHMEM available.
896MB LOWMEM available.
NX protection not present; using segment protection
On node 0 totalpages: 258032
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 28656 pages.
Kernel command line: ro root=/dev/md1 panic=60 vga=extended siimage=off consolet
Initializing CPU#0
Detected 1002.280 MHz processor.
Console: colour VGA+ 80x50
Calibrating delay loop... 1998.84 BogoMIPS
Page-cache hash table entries: 262144 (order: 8, 1024 KB)
Page-pin hash table entries: 65536 (order: 6, 256 KB)
Dentry cache hash table entries: 131072 (order: 8, 1024 KB)
Inode cache hash table entries: 65536 (order: 7, 512 KB)
Buffer cache hash table entries: 65536 (order: 6, 256 KB)
Memory: 1007820k/1032128k available (1543k kernel code, 20856k reserved, 1071k )
zapping low mappings.
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (32 bytes/line), D cache 64K (32 bytes/line)
CPU: L2 Cache: 64K (32 bytes/line)
CPU: Centaur VIA Nehemiah stepping 08
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Process timing init...done.
mtrr: v1.40 (20010327) Richard Gooch (rgooch.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfad30, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
PCI: Found IRQ 10 for device 00:11.1
PCI: Sharing IRQ 10 with 00:10.0
PCI: Sharing IRQ 10 with 00:12.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Total HugeTLB memory allocated, 0
Starting kswapd
allocated 32 pages and 32 bhs reserved for the highmem bounces
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 64508
aio_setup: sizeof(struct page) = 56
Hugetlbfs mounted.
Detected PS/2 Mouse Port.
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SEd
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:11.1
PCI: Found IRQ 10 for device 00:11.1
PCI: Sharing IRQ 10 with 00:10.0
PCI: Sharing IRQ 10 with 00:12.0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
    ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:DMA
siimage: Adaptec 1210-SA not supported.
hdd: LITE-ON CD-ROM LTN-529S, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 258k freed
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starting
Loading scsi_mod.o module
SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
/lib/sata_sil.o: init_module:
Hint: insmod ermd: raid1 personality registered as nr 3
rors can be caused by incorrect module parameters, including invJournalled Blocd
alid IO or IRQ parameters.
      You may find more information in syslog or themd: Autodetecting RAID arra.
 output from dmemd: autorun ...
sg
ERROR: /bin/md: ... autorun DONE.
insmod exited abmd: Autodetecting RAID arrays.
normally!
Loadimd: autorun ...
ng raid1.o modulmd: ... autorun DONE.
e
Loading jbd.omd: Autodetecting RAID arrays.
 module
Loadingmd: autorun ...
 ext3.o module
md: ... autorun DONE.
Mounting /proc fmd: Autodetecting RAID arrays.
ilesystem
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Creating block devices
EXT2-fs: unable to read superblock
isofs_read_super: bread failed, dev=09:01, iso_blknum=16, block=32
EXT3-fs: unable to read superblock
Kernel panic: VFS: Unable to mount root fs on 09:01

Rebooting in 60 seconds..



With the previous kernel, the following is shown:

...
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starting
Loading scsi_mod.o module
SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1
ata1: SATA max UDMA/100 cmd 0xF8845080 ctl 0xF884508A bmdma 0xF8845000 irq 11
ata2: SATA max UDMA/100 cmd 0xF88450C0 ctl 0xF88450CA bmdma 0xF8845008 irq 11
ata1: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata1: dev 0 configured for UDMA/100
ata2: dev 0 ATA, max UDMA/100, 160836480 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi0 : sata_sil
scsi1 : sata_sil
  Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
  Type:   Direct-Access                      ANSI SCSI revision: 05
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 160836480 512-byte hdwr sectors (82348 MB)
Partition check:
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
  Vendor: ATA       Model: HDS722580VLSA80   Rev: V32O
  Type:   Direct-Access                      ANSI SCSI revision: 05
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
SCSI device sdb: 160836480 512-byte hdwr sectors (82348 MB)


So when booting the siimage-blacklist kernel, sata_sil is not able to claim the PCI
device; the following messages are missing after "Loading sata_sil.o module":

PCI: Found IRQ 11 for device 00:14.0
PCI: Sharing IRQ 11 with 00:10.1

Strange...

Comment 33 John W. Linville 2005-09-19 20:55:59 UTC
Created attachment 119003 [details]
jwltest-siimage-sa1210.patch

Comment 34 John W. Linville 2005-09-20 14:14:46 UTC
Well, I finally got some hardware that reproduced the problem.  Using that, I 
think I have something that works.  Patched kernels are available at the same 
location as in comment 7. 
 
I would appreciate some testing.  Be warned that with these kernels the 
siimage IDE driver (i.e. NOT the sata_sil libata driver) will claim your 
drives, potentially changing their names back to hdX instead of sdX. 
 
Even if you do not want to run them that way, I would appreciate some testing 
just to confirm that this patch is working.  BTW, please remember to remove 
"siimage=off" from your kernel command line while testing...thanks! 

Comment 35 Peter Bieringer 2005-09-21 08:48:40 UTC
Thank you. Because the machine is in production, I can hopefully try this new kernel in
the next 2 weeks.

Comment 36 Peter Bieringer 2005-09-28 09:31:01 UTC
Installed, rebooted, works! Thank you very much for working on it.
I only needed to adjust the swap device entries in fstab, because all other
partitions are md devices.

BTW: what magic is at work on RHEL4? Swap devices created during installation
(anaconda) are labeled, but mkswap does not support the -L option.


Comment 37 John W. Linville 2005-09-28 14:12:31 UTC
Thanks for the feedback!  I'll try to get this pushed upstream and in RHEL 
ASAP. 
 
RE: RHEL4 and labeled swap devices, later versions of mkswap seem to support 
"-L" as an option.  Perhaps you were looking at the RHEL3 man page? 
 
/me was wondering why the name on the IPv6 HOWTO looked so familiar... 

Comment 38 Peter Bieringer 2005-09-28 14:26:12 UTC
Relating to the mkswap issue, it's already known:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152026


Comment 40 Ernie Petrides 2005-10-08 02:15:05 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).


Comment 42 Peter Bieringer 2006-01-20 08:52:34 UTC
Can you please tell me whether the U6 update released today,
kernel-2.4.21-37.0.1.EL  https://rhn.redhat.com/errata/RHSA-2006-0140.html
includes all the fixes that 2.4.21-37.2.EL.jwltest.55 (currently installed)
already includes?

Or should I now install kernel-2.4.21-38.EL from the beta channel?

Comment 43 Peter Bieringer 2006-01-20 10:21:46 UTC
After reading the changelog of kernel-2.4.21-38.EL I tried this version, but it has
a major problem:

The following was captured via serial console; some characters are missing:

SCSI subsystem driver Revision: 1.00
Loading sd_mod.o module
Loading libata.o module
Loading sata_sil.o module
/lib/sata_sil.o: init_module:
Hint: insmod ermd: raid1 personality registered as nr 3
rors can be caused by incorrect module parameters, including invJournalled Blocd
alid IO or IRQ parameters.
      You may find more information in syslog or themd: Autodetecting RAID arra.
 output from dme [events: 0000001b]
sg
ERROR: /bin/ [events: 0000001b]
insmod exited ab [events: 0000001b]
normally!
Loadi [events: 0000001b]
ng raid1.o modul [events: 00000021]
e
Loading jbd.o [events: 00000021]
 module
Loading [events: 00000026]
 ext3.o module
 [events: 00000026]
Mounting /proc f [events: 00000026]

...later on filesystem check step:

/ has gone 92 days without being checked, check forced.
raid1: Disk failure on hdg2, disabling device.

Ooops... only a SysRQ->Crash->"Automatic reboot after panic" brought the machine back
to life (phew... it was the production one, rebooted remotely...)

I booted the old 2.4.21-37.2.EL.jwltest.55 again, and after a while
(a filesystem check during RAID1 reconstruction needs a lot of time) the system is up
again.


Comment 44 Peter Bieringer 2006-01-20 11:10:37 UTC
Note that I got confused... the libata message also occurs with
2.4.21-37.2.EL.jwltest.55 and doesn't cause the problem there - sorry for the noise.
I have to investigate further.

Comment 45 Peter Bieringer 2006-01-20 12:19:09 UTC
So, system rebooted, still syncing mirror.

As far as I remember, I never had to resync the mirrors *after* updating to the
provided kernel with non-libata support for this controller.

Now the rebuild is very, very slow; top shows:

CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    7.2%    0.0%    3.3%  88.8%     0.5%    0.0%    0.0%


# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
Event: 10
md9 : active raid1 hde12[0] hdg12[1]
      7823552 blocks [2/2] [UU]
      [=====>...............]  resync = 29.6% (2317824/7823552) finish=89.4min speed=1024K/sec
md8 : active raid1 hde11[0] hdg11[1]
      2048192 blocks [2/2] [UU]
        resync=DELAYED
md7 : active raid1 hde10[0] hdg10[1]
      19542976 blocks [2/2] [UU]
        resync=DELAYED
md6 : active raid1 hde9[0] hdg9[1]
      7823552 blocks [2/2] [UU]
        resync=DELAYED
md5 : active raid1 hde8[0] hdg8[1]
      7823552 blocks [2/2] [UU]
        resync=DELAYED
md4 : active raid1 hde7[0] hdg7[1]
      7823552 blocks [2/2] [UU]
        resync=DELAYED
md3 : active raid1 hde6[0] hdg6[1]
      7823552 blocks [2/2] [UU]
        resync=DELAYED
md2 : active raid1 hde5[0] hdg5[1]
      2048192 blocks [2/2] [UU]
        resync=DELAYED
md1 : active raid1 hde2[0] hdg2[1]
      4096448 blocks [2/2] [UU]
        resync=DELAYED
md0 : active raid1 hde1[0] hdg1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
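
The md9 progress line above is internally consistent, which a short calculation shows. The sketch assumes that the (done/total) figures are 1 KiB blocks and that the speed stays constant at the reported 1024K/sec; the function name is made up for illustration.

```python
import re


def resync_eta_minutes(status_line: str, speed_kb_per_s: float) -> float:
    """Estimate remaining resync time from a /proc/mdstat progress line."""
    m = re.search(r"\((\d+)/(\d+)\)", status_line)
    if m is None:
        raise ValueError("no (done/total) block counts found")
    done, total = int(m.group(1)), int(m.group(2))
    remaining_kb = total - done  # assumption: counts are 1 KiB blocks
    return remaining_kb / speed_kb_per_s / 60


line = "[=====>...............]  resync = 29.6% (2317824/7823552) finish=89.4min"
eta = resync_eta_minutes(line, 1024)
print(round(eta, 1))
```

With 5,505,728 blocks left at 1024 KB/s this gives roughly 89.6 minutes, in line with the finish=89.4min the kernel reports. The rate is far below the configured speed_limit_max of 10000 KB/sec, and nine further arrays are still queued behind md9.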


# sysctl -a |grep raid
dev.raid.speed_limit_max = 10000
dev.raid.speed_limit_min = 100


md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 124k window, over a total of 104320 blocks.


vmstat shows:
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
 3  1      0 376388  58596  76780    0    0   314   316 3676   684 46 50  4  0
 3  2      0 375496  59208  76816    0    0   125    87  523   138 61 39  0  0


It looks like the fixed old-style (IDE) driver causes a very high interrupt load and
can never reach a proper sync speed.

I played around with hdparm to enable DMA (because it was shown as "off"). This
was not a good idea: afterwards I/O errors occurred and the load increased quickly. I had to
reboot via SysRQ.


I will report in some hours whether hdg2 causes problems with
2.4.21-37.2.EL.jwltest.55 during sync; its resync is still delayed.



Comment 46 Ernie Petrides 2006-01-20 20:07:32 UTC
Peter, yesterday's release of 2.4.21-37.0.1.EL kernel does not contain
most of the stuff that has gone into U7 (-38.EL), hence the lower-numbered
version.  There will be a U7 respin soon (-39.EL) that incorporates the
security fixes released in the post-U6 erratum plus a few U7 regression fixes.
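
The ordering of the release strings mentioned here (-37.0.1.EL before -38.EL before -39.EL) can be illustrated with a simplified segment-wise comparison. This is a sketch in the spirit of RPM's version comparison, not the real rpm algorithm, and the function names are hypothetical.

```python
import re


def _segments(release: str):
    # Split into runs of digits or letters; separators such as "." are dropped.
    return re.findall(r"\d+|[A-Za-z]+", release)


def compare_release(a: str, b: str) -> int:
    """Return -1, 0 or 1 for a < b, a == b, a > b (simplified comparison)."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    if len(sa) == len(sb):
        return 0
    return -1 if len(sa) < len(sb) else 1


print(compare_release("37.0.1.EL", "38.EL"))
print(compare_release("38.EL", "39.EL"))
```

The first differing numeric segment decides the result, so 37.0.1.EL sorts below 38.EL even though it has more components.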

Comment 48 Red Hat Bugzilla 2006-03-15 15:52:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html