Bug 425794

Summary: PCI allocation errors result in Xserver resetting the machine
Product: [Fedora] Fedora
Reporter: Richard Henderson <rth>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 8
CC: erik-fedora, hancockrwd
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-09 05:30:58 UTC
Attachments:
  Fix for e820 rounding error (flags: none)
  Fix 32-bit BARs being allocated above 4GB (flags: none)
  Output from dmesg from 2.6.23.8-63.fc8 (flags: none)
  Contents of /proc/iomem from 2.6.23.8-63.fc8 (flags: none)
  Output from lspci -v (flags: none)

Description Richard Henderson 2007-12-15 21:19:57 UTC
On an ASUS G1S laptop with 3GB of RAM, the stock kernel reallocates the
video card BAR to 0x1_0000_0000 - 0x1_0fff_ffff.  The Xserver does not
appear to be able to handle 64-bit PCI addresses, and the machine will
either hang (xv driver) or reset (nvidia driver).

The kernel reallocates the BAR in the first place because of a rounding
error bug in the x86-64 e820 driver.  The algorithm "rounds" 0xc000_0000
to the nearest 1M and comes up with 0xc004_0000.  After this error, there
is no longer a contiguous unallocated 256MB available below 4G, which
leads to the BAR being placed at 4GB.
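
To make the arithmetic concrete, here is a minimal Python sketch of what
rounding to a 1M boundary should do with these values.  It is only an
illustration, not the kernel's e820 code or the attached patch; the names
is_aligned and round_up are hypothetical helpers.

```python
MB = 1 << 20  # 1M alignment, as mentioned in the report


def is_aligned(addr, align):
    """True when addr is a multiple of align (align must be a power of two)."""
    return (addr & (align - 1)) == 0


def round_up(addr, align):
    """Round addr up to a multiple of align; aligned values stay unchanged."""
    return (addr + align - 1) & ~(align - 1)


gap_start = 0xC000_0000  # start address quoted in the report

print(is_aligned(gap_start, MB))        # True: already on a 1M boundary
print(hex(round_up(gap_start, MB)))     # 0xc0000000: nothing to round
print(is_aligned(0xC004_0000, MB))      # False: the reported result is not 1M-aligned
```

A correct round-up leaves 0xc000_0000 where it is, so any result other
than that shrinks the gap below 4G for no reason.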

While investigating the problem, it occurred to me that there were other
potential PCI allocation problems on x86-64.  For instance, there was 
nothing preventing a true 32-bit BAR from being allocated above 4GB on
x86-64.  After that is fixed, it is easy to add a quirk to work around
the Xserver bug by preventing VGA BAR allocation above 4GB.
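
As a rough illustration of the constraint (not the attached patch, and
not kernel code), the check amounts to comparing the end of the range
against 4GB.  The helper names bar_size and fits_32bit are made up for
this sketch; the addresses are the ones from the description above.

```python
FOUR_GB = 1 << 32


def bar_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


def fits_32bit(start, end):
    """A 32-bit BAR can only hold addresses up to 0xffff_ffff."""
    return end < FOUR_GB


# Range the stock kernel assigned to the video card BAR
start, end = 0x1_0000_0000, 0x1_0FFF_FFFF

print(bar_size(start, end) >> 20)   # 256 (MB)
print(fits_32bit(start, end))       # False: only valid for a 64-bit BAR
```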

One patch for each problem is attached.

Comment 1 Richard Henderson 2007-12-15 21:19:57 UTC
Created attachment 289707 [details]
Fix for e820 rounding error

Comment 2 Richard Henderson 2007-12-15 21:23:56 UTC
Created attachment 289708 [details]
Fix 32-bit BARs being allocated above 4GB

In addition, it adds an x86 quirk that forces VGA BARs to be allocated
within the low 4GB.  Assuming that the Xserver were to be fixed, presumably
this restriction could be lifted.  Especially since video card memory
sizes keep growing, there soon won't be room within the 1GB window left
below 4GB...

Comment 3 Richard Henderson 2007-12-15 21:38:56 UTC
See also 

http://www.nvnews.net/vbulletin/showthread.php?t=93293

for reports that similar problems affect other systems besides my G1S.

Comment 4 Chuck Ebbert 2007-12-18 18:58:33 UTC
Can you post the complete dmesg and contents of /proc/iomem from the broken kernel?

Also, Linus has suggested an alternative for the second patch:

http://lkml.org/lkml/2007/12/18/223

Comment 5 Richard Henderson 2007-12-18 20:00:54 UTC
Created attachment 289932 [details]
Output from dmesg from 2.6.23.8-63.fc8

Comment 6 Richard Henderson 2007-12-18 20:01:54 UTC
Created attachment 289933 [details]
Contents of /proc/iomem from 2.6.23.8-63.fc8

Comment 7 Richard Henderson 2007-12-18 20:03:22 UTC
Created attachment 289934 [details]
Output from lspci -v

Comment 8 Chuck Ebbert 2007-12-18 20:15:55 UTC
I wonder, what is pnp device 0:0b and why does it want 256MB of memory?

Also, looks like PnP wants to reserve all of system memory:
pnp: 00:0c: iomem range 0x100000-0xbfffffff could not be reserved


Comment 9 Linus Torvalds 2007-12-18 21:12:57 UTC
[ This is just cut-and-pasted from an email I sent. I don't think the
  stupid rh bugzilla can interact with emails like sane people do, so
  I'll just add this to the bugzilla history manually ]

That really is very unlucky. That 256M only goes at one point in the low
4GB, but the thing is, it fits perfectly well above it, and dammit, that
resource is explicitly a 64-bit resource for a really good reason.

However, I wonder about that

        e0000000-efffffff : pnp 00:0b

thing. I actually suspect that that whole allocation is literally *meant*
for that 256MB graphics aperture, but the kernel explicitly avoids it
because it's listed in the PnP tables.
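
For anyone checking the numbers, the size of that entry can be computed
directly from the /proc/iomem line.  This is a small Python sketch that
assumes the usual "start-end : name" layout; parse_iomem_line is a
hypothetical helper, not part of any existing tool.

```python
import re

# One /proc/iomem entry: hex start, hex end (inclusive), then the owner name
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem_line(line):
    """Return (start, end, owner) for an iomem line, or None if it does not match."""
    m = IOMEM_LINE.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()


start, end, owner = parse_iomem_line("        e0000000-efffffff : pnp 00:0b")
size_mb = (end - start + 1) >> 20

print(owner, size_mb)   # pnp 00:0b 256
```

So the pnp 00:0b reservation is exactly 256MB, the same size as the
graphics BAR.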

I wonder what the heck is the point of that pnp entry. Just for fun, can
you try to just disable CONFIG_PNP, and see if it all works then?

Björn Helgaas added to Cc to clarify what those pnp entries tend to mean,
and whether there is possibly some way to match up a specific pnp entry
with the PCI device that might want to use it. Because that is a nice
256MB region that really doesn't seem to make sense for anything else than
the graphics buffer - there's nothing else in your system that seems
likely (although I guess it could be for some docking port, but even then
I'd have expected one of the PCI bridges to map it!)

But apart from the question about that pnp 00:0b device, the kernel
resource allocation really does look perfectly fine, and while we could
shoe-horn it into the low 4GB in this case by just hoping that there is
nothing undocumented there (and there probably isn't), it's really
annoying considering that big graphics areas are a hell of a good reason
to use those 64-bit resources.

It's not like 256MB is even as large as they come, half-gig graphics cards
are getting to be fairly common at the high end, and X absolutely _has_ to
be able to handle a 64-bit address for those.

Also, I'm surprised it doesn't work with X already: the ChangeLog for X
says that there are "Minor fixes to the handling of 64-bit PCI BARs [..]"
in 4.6.99.18, so I'd have assumed that XFree86-4.7.0 should be able to
handle this perfectly well.

I'll add Keithp to the cc too, to see if the X issues can be clarified.
Maybe he can set us right. But maybe you just have an old X server? If so,
considering the situation, I really think the kernel has done a good job
already, and I'd be *very* nervous about making the kernel allocate new
PCI resources right after the end-of-memory thing.

I bet it would work in this case, but as mentioned, we definitely know of
cases where the BIOS did *not* document the magic memory region that was
stolen for UMA graphics, and trying to put PCI devices just after the top
of reserved memory in the e820 list causes machines to not work at all
because the address decoding will clash.

Of course, we could also make the minimum address more of a *hint*, and
make the resource allocator abut the top-of-known-memory only when it
absolutely has to, but on the other hand, in this case it really doesn't
have to, since there's just _tons_ of space for 64-bit resources. So the
correct thing really does seem to be to just use the 64-bit hw that is
there.


Comment 10 Robert Hancock 2007-12-19 00:20:54 UTC
As posted on LKML, I suspect the reservation is for the MMCONFIG aperture and
therefore is valid and must be respected.

I don't think forcing VGA BARs to be allocated within the low 4GB is a good
approach. In some cases it may be impossible to allocate the BAR there, and it's
really an artificial restriction. The X server needs to be fixed to handle this
if it really can't right now.


Comment 11 Bug Zapper 2008-11-26 09:00:52 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Bug Zapper 2009-01-09 05:30:58 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.