From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020809

Description of problem:
Null completely fails to install on the Compaq Presario 900US laptop. This is a very new Athlon laptop with an ALi chipset. While things appear okay during kernel bootup, Anaconda is unable to access either the IDE hard drive or the CD-ROM, due to failures that are visible on ALT-F4. Google found quite a few recent posts on LKML about ALI15X3 problems.

I bought myself a laptop IDE to desktop IDE converter, so I will be able to install Null onto this hard drive and test various kernels on this machine. Tomorrow I plan on testing 2.4.20-pre, -ac and 2.5.x. Anything else I should test?

Note that there also seems to be a problem with the 8139too kernel module. ALT-F4 debug messages for this are also included below. I will post this in a separate Bugzilla report.

Version-Release number of selected component (if applicable):
Red Hat beta Null

How reproducible:
Always

Additional info:
The following are excerpts from the kernel bootup (I hit Scroll Lock before it ran Anaconda). Unfortunately I am unable to reach any shell in the installer due to its inability to access IDE devices, so I am unable to get the dmesg output from Anaconda runtime.

Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ALI15X3: IDE controller on PCI bus 00 dev 80
PCI: No IRQ known for interrupt pin A of device 00:10.0. Please try using pci=biosirq.
ALI15X3: chipset revision 196
ALI15X3: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x8080-0x8087, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x8088-0x808f, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA MK3018GAP, ATA DISK drive
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hdc: SD-R2102, ATAPI CD/DVD-ROM drive

The following can be viewed in ALT-F4 when Anaconda attempts and fails to access the IDE CD-ROM.
It has similar problems if I attempt a "Hard Drive" install.

<6>8139too Fast Ethernet driver 0.9.25
<6>8139too: pci dev 00:0b.0 (id 10ec:8139 rev ff) is an enhanced 8139C+ chip
<6>8139too: Use the "8139cp" driver for improved performance and stability.
<6>PCI: Assigned IRQ 11 for device 00:0b.0
<3>8139too: 00:0b.0: Chip not responding, ignoring board
<6>8139too Fast Ethernet driver 0.9.25
<6>8139too: pci dev 00:0b.0 (id 10ec:8139 rev ff) is an enhanced 8139C+ chip
<6>8139too: Use the "8139cp" driver for improved performance and stability.
<6>PCI: Assigned IRQ 11 for device 00:0b.0
<3>8139too: 00:0b.0: Chip not responding, ignoring board
<4>hdc: status timeout: status=0xd0 { Busy }
<4>hdc: drive not ready for command
<4>hdc: ATAPI reset timed-out, status=0xd0
<4>ide1: reset timed-out, status 0xd0
<4>hdc: status timeout: status=0xd0 { Busy }
<4>hdc: drive not ready for command
<4>hdc: ATAPI reset timed-out, status=0xd0
<4>ide1: reset timed-out, status=0xd0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<6>attempt to access beyond end of device
<6>16:00: rw=0, want=2147483381, limit=557735282
<4>isofs_read_super: bread failed, dev=16:00, iso_blknum=-134, block=-268
<4>end_request: I/O error, dev 16:00 (hdc), sector 0

Please let me know what other information is needed.

Thanks,
Warren
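For anyone collecting similar ALT-F4 output, here is a minimal sketch of a script that collapses repeated messages so the distinct failures stand out. It assumes each message starts with a printk-style `<N>` level prefix as in the log above (3 = error, 4 = warning, 6 = info); the function name `summarize` and the sample text are made up for illustration and are not part of any installer tooling.

```python
import re
from collections import Counter

# Matches a printk-style line such as "<4>hdc: drive not ready for command".
LINE_RE = re.compile(r"^<(\d)>(.*)$")


def summarize(log_text):
    """Return a Counter mapping (level, message) to the number of occurrences."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:  # lines without a level prefix are ignored
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; paste real console output here instead.
    sample = """\
<4>hdc: status timeout: status=0xd0 { Busy }
<4>hdc: drive not ready for command
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<4>end_request: I/O error, dev 16:00 (hdc), sector 0
<3>8139too: 00:0b.0: Chip not responding, ignoring board
"""
    # Lower level numbers are more severe, so sorting lists errors first.
    for (level, message), n in sorted(summarize(sample).items()):
        print(f"level {level} x{n}: {message}")
```

Sorting by level puts the `<3>` errors ahead of the `<4>` warnings, which may make it easier to separate the 8139too failure from the IDE one.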
Bug 72388 filed: (Compaq Presario 900) 8139too module fails
I swapped the hard drive into a working laptop and installed Null, compiled 2.4.20-pre4-ac1 and swapped the hard drives back. Let me know if you want the .config file from the -ac kernel build.

The Null kernel always fails with a Machine Check exception, similar to a crash I see in Mandrake 9.0 beta's installer. I compiled MCE into the -ac kernel, and it doesn't panic at that point. I managed to boot into the system with "pci=conf2" or "pci=off", although I can't use any devices I could at least get to a shell. What lspci and other information would you need?

------------------------
2.4.20-pre4-ac1
------------------------
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
hda: status timeout: status=0x80 { Busy }
hda: drive not ready for command
ide0: reset timed-out, status=0x80
end_request: I/O error, dev 03:03 (hda), sector 2
EXT3-fs: unable to read superblock
end_request: I/O error, dev 03:03 (hda), sector 2
EXT2-fs: unable to read superblock
end_request: I/O error, dev 03:03 (hda), sector 64
isofs_read_super: bread failed, dev=03:03, iso_blknum=16, block=32
Kernel panic: VFS: Unable to mount root fs on 03:03

-----------------
kernel-2.4.18-11 (Null default)
-----------------
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfd87e, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router ALI [10b9/1533] at 00:07.0
isapnp: Scanning for PnP cards...
CPU 0: Machine Check Exception: 0000000000000007
Bank 3: b40000000000083b at 00000001fc0003b3
Kernel panic: Unable to continue
Unfortunately I have this laptop only for another 36 hours. I would appreciate it if I could have a few more things to test before I have to return it.
CPU 0: Machine Check Exception: 0000000000000007
Bank 3: b40000000000083b at 00000001fc0003b3

That's a fault report from the processor:

Error enabled, uncorrected error, valid
Bus/Interconnect Error on locally originated processor request
Data read request for I/O
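The reading above can be reproduced from the Bank 3 status word. The following is a minimal sketch, assuming the usual x86 machine-check status layout (flag bits 57 to 63, and a 16-bit error code of the form `0000 1PPT RRRR IILL` for bus/interconnect errors); the function name `decode_status` and the wording of the labels are illustrative, not taken from the kernel.

```python
# Flag bits of a machine-check bank status word (assumed standard x86 layout).
FLAGS = {
    63: "valid",
    62: "overflow",
    61: "uncorrected error",
    60: "error enabled",
    59: "misc register valid",
    58: "address register valid",
    57: "processor context corrupt",
}

# Sub-fields of a bus/interconnect error code: 0000 1PPT RRRR IILL.
PARTICIPATION = {0: "locally originated request", 1: "local response to request",
                 2: "observed as third party", 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
TARGET = {0: "memory", 1: "reserved", 2: "I/O", 3: "other"}


def decode_status(status):
    """Decode the flag bits and, for bus errors, the error-code sub-fields."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    # Bus/interconnect errors have the top five bits of the code equal to 00001.
    if code & 0xF800 == 0x0800:
        result["kind"] = "bus/interconnect error"
        result["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        result["timeout"] = bool((code >> 8) & 0x1)
        result["request"] = REQUEST.get((code >> 4) & 0xF, "unknown")
        result["target"] = TARGET[(code >> 2) & 0x3]
    return result


if __name__ == "__main__":
    # Status word from the "Bank 3" line of the panic message.
    for key, value in decode_status(0xB40000000000083B).items():
        print(f"{key}: {value}")
```

Bit 58 (address register valid) is also set in this status word, which is consistent with the panic message printing an address after "at".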
My friend bought the same laptop, so we swapped hard drives and tested several of the kernels that I had built. All of the following testing was identical on both machines.

Further testing from today:
---------------------------
2.4.20-pre4-ac1 (compiled with MCE enabled)
* does NOT crash with MC error
* crashes when unable to use IDE device
* pci=off boots, PIO mode working fine, albeit system useless without devices

2.4.18-12.4 (Rawhide) and 2.4.18-11 (Null)
* always crashes with MCE error, regardless of pci=off

2.5.31 (compiled with MCE enabled)
* always crashes with MCE error, regardless of pci=off

2.5.13 (compiled without MCE)
* crashes with IDE problems similar to 2.4.20-pre4-ac1.

Is there anything notably different in Andre Hedrick's ide patch against 2.4.19-ac4? I'll try that next.
Tested 2.4.19-ac4 + Hedrick's ide-2.4.19-ac4.11.patch: it fails in the same way.
An MCE is generated by a hardware problem.
Are you saying that an MCE always indicates a hardware defect? Everything reported above is identical on two brand new Presario 900US laptops, which are rock stable in the factory shipped Windows XP. Perhaps this is a new motherboard chipset variation, poorly tested with anything other than Windows. =(
Created attachment 73004 [details] lspci -vv and lspci -vvn, system booted with "pci=off"
Many different problems with this hardware...

[root@compaq root]# lspci -vv
pcilib: Cannot open /proc/bus/pci
pcilib: Bus 00 seen twice (firmware bug). Ignored.
...
...
(continued in attachment above)
An MCE indicates the processor found itself in what it considers to be an illegal state. Other than plain faulty components, I can think of two things that might trip the CPU into believing that:

#1 - The box relies on ACPI for setup, in which case for the moment it's a Windows-only system (probably for another 3 months).

#2 - See if booting with the option "mem=nopentium" helps. That turns off the 4MB paging feature, just in case it's some weird interaction with the system.

Other options that make the box as conservative as possible are "ide=nodma noapm".
#1 What happens in 3 months regarding ACPI? Something I haven't tested here is a kernel with ACPI instead of APM. How different is the current acpi4linux patch from the ACPI in 2.4.x or 2.5.x?

#2 Tested all combinations of mem=nopentium, ide=nodma, and noapm with the Red Hat kernel. No luck.
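For anyone repeating this kind of test, a minimal sketch for listing every combination of the three options, so that none is missed at the boot prompt. The helper name `option_sets` is made up for illustration; it only generates the strings to type and does not boot anything.

```python
from itertools import combinations

# Boot options suggested earlier in this report.
OPTIONS = ["mem=nopentium", "ide=nodma", "noapm"]


def option_sets(options):
    """Yield every non-empty subset of the options as one command-line string."""
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            yield " ".join(subset)


if __name__ == "__main__":
    for line in option_sets(OPTIONS):
        print(line)
```

Three options give seven non-empty combinations, from each option alone up to all three together.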
In about three months the current Intel Linux ACPI patch should be solid enough that it's usable. Red Hat's shipping plans for ACPI are a separate matter, but roll-your-own kernels with ACPI ought to be usable by then.
http://videl.ics.hawaii.edu/mailman/listinfo/linuxpresario900 I created this discussion mailing list for owners of Compaq Presario 900 series laptops to share their findings and work toward a solution to this problem.
Bill McLean was able to successfully install onto this laptop with "nousb" and "nopcmcia" but ran into data corruption later. Is there any indication of what is breaking the IDE part of this problem?
The "nousb" may be significant, especially if he got the corruption later on after not always using "nousb". The reason is that there are connections deep down in the hardware setup between the USB and IDE clocking. I've also had one strange report that 2.2 boots on it, but I've not been able to verify that.
http://videl.ics.hawaii.edu/pipermail/linuxpresario900/2002-November/000007.html Mark Pavlidis wrote this complete report about the effects of the various "no" kernel options. Please let us know what we can test to begin work on these problems.
Can someone get me an lspci -vxx -M -H1? (That will do the lspci directly even if booted with "nopci".) That may let me at least blacklist the PCI IDE so people can get a booting system with PCI enabled to do further research.
OK, the MCE seems to come from the pcmcia package: exclude ports 0x380-0x3ff, according to Chris Cheney.
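To illustrate what excluding that range leaves available for probing, here is a minimal sketch of the range arithmetic. The include window 0x100-0x4ff used in the example is a hypothetical value chosen for illustration, not something reported in this bug, and `subtract_range` is a made-up helper rather than part of the pcmcia tools.

```python
def subtract_range(include, exclude):
    """Return the parts of the inclusive range `include` not covered by `exclude`."""
    lo, hi = include
    ex_lo, ex_hi = exclude
    if ex_hi < lo or ex_lo > hi:
        return [include]  # the two ranges do not overlap
    parts = []
    if ex_lo > lo:
        parts.append((lo, ex_lo - 1))  # piece below the excluded range
    if ex_hi < hi:
        parts.append((ex_hi + 1, hi))  # piece above the excluded range
    return parts


if __name__ == "__main__":
    # Hypothetical include window; the excluded range is the one suggested above.
    for lo, hi in subtract_range((0x100, 0x4FF), (0x380, 0x3FF)):
        print(f"0x{lo:x}-0x{hi:x}")
```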
*** Bug 78123 has been marked as a duplicate of this bug. ***
*** Bug 80352 has been marked as a duplicate of this bug. ***
I have the same problem with a Presario 920CA. I can also help debug or gather information if necessary. I followed the link posted in comment #14 and could install RH8, but I could not get X to work, and after rebooting I could not log into Linux because the kernel would panic.
http://videl.ics.hawaii.edu/pipermail/linuxpresario900/2002-December/000120.html

> My goal is that the next Red Hat runs on the IGP chipset boxes.
> Please file bugs, please add me to the cc of any IGP related bugs.

Alan Cox mentioned this on the LinuxPresario900 mailing list.
i had a problem with X windows also. i eventually realized that the initially detected sound card on my presario 915 was causing X windows to freeze the box also, when it tried to play a sound during startup. after disabling the sound card entirely (removing it from modules.conf), X windows will run and not crash the machine. dunno if that's related to the X problem Diego is having or not.
Created attachment 89101 [details] X log file with the VESA errors
Created attachment 89102 [details] the errors I get after X fails
Actually I reinstalled RH8 with just pci=off and everything went OK. After installation I had to pass nomce or the kernel would panic. I removed the 0x380-0x3ff port range from /etc/pcmcia/config.opts and removed the nomce flag, but the kernel would still panic, so I had to pass noisapnp, and then the system would boot OK.

On the first boot kudzu detected the network card, the audio device controller (ALi M5451 PCI AC-Link), the ALi USB 1.1 controller and the sound card. The next time I rebooted, the system would hang while loading the OHCI controller, so I had to add the nousb option to get to the login prompt. But when I tried starting X I would get errors from hdc and would not be able to log in on any other VT. It basically gets stuck at the bluecurve-RH logo (where it loads the different components). I have attached my X log and another file with the errors I get after X fails.
have you tried disabling the sound card already? or here's an experiment you can try without disabling it: at a shell prompt, enter the command: play /usr/share/sounds/error.wav if you don't have that file, try any other sound file. if it's the sound card causing problems, then you should see the same IDE errors and the system will act like it does with your X problem. this is what i did in order to demonstrate to myself that it wasn't X, but was the sound card instead.
Yes, it is the sound card. I did the WAV experiment and I got the same error. I will compile a new kernel to try to get the card to work. I found some help on the ALi website about the sound card on 2.4.x kernels. Maybe that will solve the problem. I just don't want to disable the sound, I like to listen to some tunes while I work ;) I will do this on the weekend and post my results here. The link is: http://www.ali.com.tw/eng/support/support_driver.htm (search by OS). There is a FIR driver and an audio driver for the 2.4.x kernels.
I got sound, USB and ACPI working after following the instructions at http://www.wsu.edu/~ice124/ and patching and recompiling kernel 2.4.20, although I had to tweak the volume control before I could get any sound out of the box. When it comes to video, I could not use the Radeon driver: X would die on me saying "no screens found".

Finally, I am using the 8139cp network driver, since the system suggests using it instead of 8139too, but the only way to select it was to edit the .config file, because the option is not in menuconfig. So I have to make my changes in menuconfig, save and exit, and then edit the .config file, because menuconfig unselects the 8139CP driver on exit.

BTW, is there any problem if I compile the kernel with both ACPI and APM? Or should I just stick with ACPI?
*** Bug 81587 has been marked as a duplicate of this bug. ***