Bug 159078

Summary: (Asus K8N-DL Bad BIOS) Boot errors from PCI-Express Bus
Product: [Fedora] Fedora Reporter: Sean Bruno <sbruno>
Component: kernel Assignee: Dave Jones <davej>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: low Docs Contact:
Priority: medium    
Version: 4 CC: intel-linux-acpi, pfrields, retsil, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-23 17:25:25 UTC Type: ---
Attachments:
Description Flags
lspci -vv from the i386 installation
none
Output of dmesg
none
Logs from Xorg startup via "startx"
none
output of dmidecode after BIOS update(2.6.12-git10)
none
Updated output of dmesg after BIOS update(2.6.12-git10)
none
Output of lsusb after BIOS update(2.6.12-git10)
none
Output of lspci after BIOS update(2.6.12-git10)
none
Output of lshw after BIOS update(2.6.12-git10)
none

Description Sean Bruno 2005-05-28 23:44:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
After installing FC4 from rawhide, the first boot of the system reports errors on the PCI-E bus of my ASUS K8N-DL motherboard:

PCI:  Cannot allocate region 9 of device 0000:00:0e.0
PCI:  Cannot allocate region 3 of device 0000:04:00.0
Failed to allocate mem resource 1000000@fe0000000 for 0000:04:00.0

After these errors, the system starts and attempts to run "firstboot". The video that appears is garbled and I am unable to continue. A ctrl-alt-del does reboot the machine, so it is not completely dead.

Version-Release number of selected component (if applicable):
kernel 2.6.11-1.1363-FC4smp

How reproducible:
Always

Steps to Reproduce:
1.  Install FC4 from rawhide (or Test 3).
2.  Boot the system after the install and firstboot will attempt to start.

Actual Results:  The video was garbled and I was unable to continue testing.  A ctrl-alt-del will reboot the machine.

Expected Results:  Firstboot should start and I should be able to finish the installation of FC4.

Additional info:

Anaconda in text mode uses the "vesa" driver on this box. It seems to be unable to detect and use the "nv" driver with the PCI-E video card (ASUS Nvidia 6200).

I am able to install FC4 from rawhide under the i386 installation.  It doesn't have the same garbled screen as the x86_64 version does.

I am attaching lspci -vv from the i386 version as I cannot get around firstboot crashing the display.

Comment 1 Sean Bruno 2005-05-28 23:46:01 UTC
Created attachment 114945 [details]
lspci -vv from the i386 installation

It appears that the two errors are from something on the PCI-Express bus and
the video card itself.
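
For readers who want to see which devices the three boot messages name, here is a minimal sketch that splits each PCI address into domain, bus, device and function, and decodes the failed memory window. It assumes the `size@address` fields are hexadecimal, as is usual for kernel resource messages. The regular expressions and function names are illustrative, not part of any tool.

```python
import re

# The three messages quoted in the description of this bug.
LOG = """\
PCI:  Cannot allocate region 9 of device 0000:00:0e.0
PCI:  Cannot allocate region 3 of device 0000:04:00.0
Failed to allocate mem resource 1000000@fe0000000 for 0000:04:00.0
"""

# domain:bus:device.function, all hexadecimal.
BDF = r"([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])"
REGION_RE = re.compile(r"Cannot allocate region (\d+) of device " + BDF)
RESOURCE_RE = re.compile(
    r"Failed to allocate mem resource ([0-9a-f]+)@([0-9a-f]+) for " + BDF
)


def split_bdf(groups):
    """Turn the four hex fields of a PCI address into integers."""
    domain, bus, dev, fn = (int(g, 16) for g in groups)
    return {"domain": domain, "bus": bus, "device": dev, "function": fn}


def parse(log):
    """Return (regions, resources) found in the log text."""
    regions, resources = [], []
    for line in log.splitlines():
        m = REGION_RE.search(line)
        if m:
            regions.append((int(m.group(1)), split_bdf(m.groups()[1:])))
            continue
        m = RESOURCE_RE.search(line)
        if m:
            # Assumption: both size and base address are hexadecimal.
            size, base = int(m.group(1), 16), int(m.group(2), 16)
            resources.append((size, base, split_bdf(m.groups()[2:])))
    return regions, resources


regions, resources = parse(LOG)
for number, dev in regions:
    print(f"region {number}: bus {dev['bus']:#04x} "
          f"device {dev['device']:#04x} function {dev['function']}")
for size, base, dev in resources:
    print(f"{size // 2**20} MiB window requested at {base:#x} "
          f"for bus {dev['bus']:#04x}")
```

Under the hexadecimal assumption, the failed window is 16 MiB and its base address lies above the 4 GiB boundary. Both messages for 0000:04:00.0 refer to the same device, on bus 4.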

Comment 2 Sean Bruno 2005-05-29 00:04:16 UTC
Created attachment 114946 [details]
Output of dmesg

Comment 3 Sean Bruno 2005-05-29 21:15:29 UTC
Created attachment 114953 [details]
Logs from Xorg startup via "startx"

I booted up the system at run level 3 and executed "startx" as root.

Comment 4 Sean Bruno 2005-06-04 19:08:17 UTC
I attempted to load 2.6.12-rc5 with the git8 patch.  There was no change to the
errors.

I also opened a ticket with ASUS in reference to the mobo.  They seem to be open
to trying to fix the issue.

Comment 5 Dave Jones 2005-06-04 22:07:54 UTC
This is getting some attention upstream in the last few days. See
http://lkml.org/lkml/2005/6/3/203 for details


Comment 6 Sean Bruno 2005-06-06 02:29:22 UTC
Should I post a "me-too" to this thread from Comment #5?  I feel a bit anxious
about such a posting as I tend to be a bit light on specifics.

It looks like there is some attention, but no testers for possible fixes.

Comment 7 Dave Jones 2005-06-27 23:20:40 UTC
Mass update of -test bugs to update version to fc4.
(Please retest on final release, and report results if you have not already done
so).

Thanks.

Comment 8 Sean Bruno 2005-06-27 23:24:01 UTC
Already retested.  Same failures are noted and there is no change to the output
of the attachments in either the i386 or x86_64 version.

Comment 9 Sean Bruno 2005-06-29 04:27:23 UTC
Created attachment 116108 [details]
output of dmidecode after BIOS update(2.6.12-git10)

Comment 10 Sean Bruno 2005-06-29 04:28:40 UTC
Created attachment 116109 [details]
Updated output of dmesg after BIOS update(2.6.12-git10)

Comment 11 Sean Bruno 2005-06-29 04:29:23 UTC
Created attachment 116110 [details]
Output of lsusb after BIOS update(2.6.12-git10)

Comment 12 Sean Bruno 2005-06-29 04:29:59 UTC
Created attachment 116111 [details]
Output of lspci after BIOS update(2.6.12-git10)

Comment 13 Sean Bruno 2005-06-29 04:31:20 UTC
Created attachment 116112 [details]
Output of lshw after BIOS update(2.6.12-git10)

Comment 14 Sean Bruno 2005-07-02 05:35:16 UTC
*** Bug 158468 has been marked as a duplicate of this bug. ***

Comment 15 Sean Bruno 2005-07-02 05:36:33 UTC
*** Bug 158475 has been marked as a duplicate of this bug. ***

Comment 16 Sean Bruno 2005-07-02 05:40:18 UTC
I received an excellent analysis of my dmesg output from a gentleman at Nvidia
who pointed the finger squarely at the BIOS from ASUS (whose Tech Support is
ridiculously bad, btw).

I have attempted to speak with anyone at their Tech Support to address the
issues in the BIOS and have failed miserably (ASUS TKT#44951 in case someone else
is watching).

Here is what Mr. Currid at Nvidia pointed out to me. Any chance some of the
Red Hat kernel devs can analyze it and comment?

Sean

I went back through the LKML archives - is the dmesg dump you posted on
6/17 from the Asus K8N-DL? I'm assuming it is. 

There are several issues immediately apparent with the BIOS.

It has an ACPI interrupt override for IRQ0 to Global System Interrupt 2
(GSI 2) that is incorrect -

ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)

and is the root cause of the later warnings to do with the timer:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...  failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
...trying to set up timer as Virtual Wire IRQ...Uhhuh. NMI received for unknown reason 3d.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 failed.
...trying to set up timer as ExtINT IRQ... works.

The Linux kernel (2.6.9 onwards) contains code to specifically detect
this interrupt redirect on NVIDIA hardware, and ignore it, but for some
reason it isn't kicking in on your setup. Not sure why that is.

Also, the ACPI PCI Interrupt Routing Table (PRT) contains references to
entries that don't exist elsewhere in the ACPI tables:

ACPI: Subsystem revision 20050309
    ACPI-0352: *** Error: Looking up [\_SB_.PCI0.LNK0] in namespace, AE_NOT_FOUND
search_node ffff81013ffca240 start_node ffff81013ffca240 return_node 0000000000000000
    ACPI-0352: *** Error: Looking up [\_SB_.PCI0.APC0] in namespace, AE_NOT_FOUND
search_node ffff81013ffca140 start_node ffff81013ffca140 return_node 0000000000000000

Linux unfortunately appears to give up on parsing the PRT when this
happens, unlike Windows, which will parse the table despite these
errors. Without parsing the PRT, Linux cannot know how to route
interrupts for various PCI devices, which results in the later errors:

...
ACPI: PCI Interrupt 0000:02:00.0[A]: no GSI - using IRQ 3
...
ACPI: PCI Interrupt 0000:00:04.0[A]: no GSI - using IRQ 11

I'm guessing that your Broadcom networking, AC97 sound and USB 1.1
controller may not be working correctly as a result of this.

The Linux kernel could be modified to continue parsing PRTs when errors
are encountered. However, it is the BIOS that is at fault here.

Andy
--
Andy Currid, NVIDIA Corporation
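
The three symptoms named in the analysis above (the IRQ0 override, the dangling ACPI namespace references, and the "no GSI" fallbacks) can be picked out of a dmesg dump mechanically. What follows is a minimal sketch, not an existing tool: the sample text is copied from the excerpts above with mail-wrapped lines rejoined, and the function name and report layout are made up for illustration.

```python
import re

# Excerpts from the analysis above; a real run would read a full dmesg dump.
DMESG = """\
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
    ACPI-0352: *** Error: Looking up [\\_SB_.PCI0.LNK0] in namespace, AE_NOT_FOUND
    ACPI-0352: *** Error: Looking up [\\_SB_.PCI0.APC0] in namespace, AE_NOT_FOUND
ACPI: PCI Interrupt 0000:02:00.0[A]: no GSI - using IRQ 3
ACPI: PCI Interrupt 0000:00:04.0[A]: no GSI - using IRQ 11
"""

OVERRIDE_RE = re.compile(r"INT_SRC_OVR \(bus \d+ bus_irq (\d+) global_irq (\d+)")
MISSING_RE = re.compile(r"Looking up \[([^\]]+)\] in namespace,\s*AE_NOT_FOUND")
NO_GSI_RE = re.compile(
    r"PCI Interrupt ([0-9a-f:.]+)\[[A-D]\]: no GSI - using IRQ (\d+)"
)


def scan(text):
    """Collect the BIOS-related symptoms found in a dmesg text."""
    return {
        # (bus_irq, global_irq) pairs from interrupt source overrides
        "overrides": [(int(a), int(b)) for a, b in OVERRIDE_RE.findall(text)],
        # ACPI objects referenced but not found in the namespace
        "missing": MISSING_RE.findall(text),
        # PCI devices that fell back to a legacy IRQ, keyed by address
        "unrouted": {dev: int(irq) for dev, irq in NO_GSI_RE.findall(text)},
    }


report = scan(DMESG)
for bus_irq, gsi in report["overrides"]:
    print(f"override: IRQ{bus_irq} -> GSI {gsi}")
for name in report["missing"]:
    print(f"missing ACPI object: {name}")
for dev, irq in report["unrouted"].items():
    print(f"no GSI for {dev}, using IRQ {irq}")
```

A dump from a board with a corrected BIOS should leave the "missing" and "unrouted" entries empty, although this sketch only matches the exact message formats shown here.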

Comment 17 Dave Jones 2005-07-15 21:39:33 UTC
[This comment has been added as a mass update for all FC4 kernel bugs.
 If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being
detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE*
installing any kernel updates.
If in doubt, you can recreate this file using..

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu


Thank you.


Comment 18 Sean Bruno 2005-07-15 22:43:56 UTC
Since this is caused by ACPI and the defective system BIOS of this ASUS
motherboard, it is still an issue.

I have disabled ACPI at this time to get the system working.  If you feel this
is not an issue with FC, feel free to close this bugzilla report.  

I would like to keep it open, just so I can document changes made by ASUS (if
they ever decide to do anything).

Comment 19 Sean Bruno 2005-07-15 22:45:09 UTC
Downgrading to low priority as there doesn't appear to be anything that the
kernel can do for an ACPI non-compliant BIOS.

Comment 20 Warren Togami 2005-07-15 23:02:37 UTC
Keeping it open so other users can more easily find this issue is fine.  Just as
long as nobody expects the kernel to fix it when it is Asus' problem.


Comment 21 Dave Jones 2005-07-15 23:28:45 UTC
The 'keep parsing the table in face of errors' sounds like something that would
be good to fix, especially if there are vendors that are a little slow at
pushing out fixed BIOS updates.

Intel folks ?
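
To make the difference between the two behaviours concrete, here is a toy model of "give up at the first bad entry" versus "skip the bad entry and keep parsing". It is not the kernel's ACPI code: the table layout, device addresses, link names, and both function names are hypothetical, and the strict variant discarding the whole table is an assumption based on the description in comment 16.

```python
# Toy model only: data layout and names are hypothetical, not kernel code.

# Link devices that exist in this pretend ACPI namespace.
NAMESPACE = {"LNK1", "LNK2"}

# Pretend routing table: (PCI device, link device it refers to).
PRT = [
    ("0000:00:02.0", "LNK1"),
    ("0000:00:04.0", "LNK0"),  # dangling reference, like the AE_NOT_FOUND cases
    ("0000:02:00.0", "LNK2"),
]


def parse_strict(prt, namespace):
    """Abort on the first unresolved reference and route nothing."""
    routes = {}
    for device, link in prt:
        if link not in namespace:
            return {}
        routes[device] = link
    return routes


def parse_tolerant(prt, namespace):
    """Skip unresolved references but keep every entry that resolves."""
    routes, skipped = {}, []
    for device, link in prt:
        if link not in namespace:
            skipped.append(device)
            continue
        routes[device] = link
    return routes, skipped


print("strict:  ", parse_strict(PRT, NAMESPACE))
print("tolerant:", parse_tolerant(PRT, NAMESPACE))
```

In this model the tolerant parser still routes the two devices whose references resolve, and reports the one it had to skip, so a single broken entry no longer costs every device its routing.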


Comment 22 Robbie Barnett 2005-08-13 04:24:58 UTC
*** Bug 165654 has been marked as a duplicate of this bug. ***

Comment 23 Sean Bruno 2005-09-23 17:25:25 UTC
Well, ASUS has released a version of the BIOS (1006) that resolves all of the ACPI
issues that are reported in this ticket.  So, I am closing this ticket.  Thanks
for your help, folks.

P.S.  FC4 installs on both SATA controllers of the K8N-DL now.