Bug 524808 - swiotlb should be enabled when VT-d setup fails
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: high, Severity: high
Assigned To: Chris Wright
QA Contact: Fedora Extras Quality Assurance
URL: https://fedoraproject.org/wiki/Common...
Keywords: CommonBugs
Duplicates: 522668 528545 530340 530455 532582
Depends On:
Blocks: F12Blocker/F12FinalBlocker
Reported: 2009-09-22 06:05 EDT by dongjibing
Modified: 2013-01-10 00:29 EST (History)
Fixed In Version: 2.6.31.6-145.fc12
Doc Type: Bug Fix
Last Closed: 2009-11-09 16:50:23 EST


Attachments
boot error log when enable VT-d and intel_iommu=on (32.90 KB, text/plain)
2009-09-24 05:03 EDT, dongjibing
Full error boot log on kernel-2.6.31.1-56.fc12.x86_64 (27.89 KB, text/plain)
2009-10-11 23:04 EDT, dongjibing
_full_ boot log when vt-d enable in BIOS and intel_iommu=1 in kernel-2.6.31.1-56.fc12.x86_64 (38.94 KB, text/plain)
2009-10-12 22:34 EDT, dongjibing
attempt at patch (2.58 KB, patch)
2009-10-16 16:56 EDT, David Woodhouse
v2, compiles (4.02 KB, patch)
2009-10-19 12:41 EDT, Chris Wright
setup swiotlb, free if hw iommu succeeds (2.81 KB, patch)
2009-10-19 12:51 EDT, Chris Wright
iommu: allow fallback to swiotlb upon hw iommu initialization failure (8.56 KB, patch)
2009-10-22 19:51 EDT, Chris Wright
Boot log with an old BIOS on an Intel DP35DP (99.64 KB, text/plain)
2009-10-23 14:28 EDT, Ville-Pekka Vainio
make v2 patch compile (8.73 KB, patch)
2009-10-26 01:46 EDT, Scott Tsai
Much simpler fix (2.07 KB, patch)
2009-11-07 19:45 EST, David Woodhouse
full dmesg from boot with iommu=soft option (59.39 KB, text/plain)
2009-11-19 18:18 EST, Ian Macnaughtan
full dmesg from boot without iommu=soft option (53.02 KB, text/plain)
2009-11-19 18:27 EST, Ian Macnaughtan
this should catch the all 1's problem and still allow swiotlb fallback (1.67 KB, patch)
2009-11-19 20:40 EST, Chris Wright
dmesg from M6400 with kernel-2.6.31.6-144.fc12.x86_64 (60.13 KB, text/plain)
2009-11-21 01:21 EST, Peter Larsen
virt enabled on BIOS no special options on cmdline (55.17 KB, application/octet-stream)
2009-11-24 20:20 EST, Robinson Maureira
virt disabled on BIOS no special options on cmdline (47.05 KB, application/octet-stream)
2009-11-24 20:20 EST, Robinson Maureira
virt disabled on BIOS and iommu=soft (47.42 KB, application/octet-stream)
2009-11-24 20:21 EST, Robinson Maureira
virt disabled on BIOS and iommu=off (54.73 KB, application/octet-stream)
2009-11-24 20:22 EST, Robinson Maureira
virt disabled on BIOS and mem=2G (46.41 KB, application/octet-stream)
2009-11-24 20:22 EST, Robinson Maureira
virt disabled on BIOS and mem=2G nomodeset (45.40 KB, application/octet-stream)
2009-11-24 20:23 EST, Robinson Maureira
virt enabled on BIOS and iommu=pt (47.55 KB, application/octet-stream)
2009-11-24 21:17 EST, Robinson Maureira
virt enabled on bios + nomodeset (54.12 KB, text/plain)
2009-11-25 11:31 EST, Robinson Maureira
virt enabled on BIOS and nomodeset (HP Quicklook 2 disabled) (54.01 KB, text/plain)
2009-11-25 12:51 EST, Robinson Maureira


External Trackers
Tracker: Linux Kernel, ID: 14003 (Priority: None, Status: None, Summary: None, Last Updated: Never)

Description dongjibing 2009-09-22 06:05:43 EDT
Description of problem:
When "VT for Direct I/O" is enabled in the BIOS, kernel-2.6.31-33.fc12.x86_64 fails to boot. However, with the intel_iommu=off boot option the kernel works fine. kernel-2.6.31-33.fc12.i686 also works fine.

Fail message:

usb 8-2: new low speed USB device using uhci_hcd and address 7
nommu_map_single: overflow 11f554be0+8 of device mask ffffffff
nommu_map_single: overflow 11f5fe908+64 of device mask ffffffff
...
usb 8-2: device descriptor read/64, error -32
hub 8-0:1.0: unable to enumerate USB device on port 2

No root device found
Boot has failed, sleeping forever
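
For readers unfamiliar with the nommu_map_single messages above: each line reports a buffer address plus length that does not fit under the device's 32-bit DMA mask. The following is a minimal userspace sketch of that kind of range check, not the kernel's own code; the helper name dma_fits_mask is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every byte of the buffer
 * [bus_addr, bus_addr + size) is addressable under dma_mask.
 * Without an IOMMU or bounce buffer, a buffer that fails this test
 * cannot be reached by the device at all.
 */
static bool dma_fits_mask(uint64_t bus_addr, size_t size, uint64_t dma_mask)
{
    uint64_t last;

    if (size == 0)
        return bus_addr <= dma_mask;

    last = bus_addr + (uint64_t)size - 1;
    if (last < bus_addr)            /* address arithmetic wrapped around */
        return false;

    return last <= dma_mask;
}
```

With the values from the log, dma_fits_mask(0x11f554be0, 8, 0xffffffff) is false because the buffer lies above 4GiB, which is exactly the situation a bounce buffer (swiotlb) exists to handle.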


Version-Release number of selected component (if applicable):
kernel-2.6.31-33.fc12.x86_64 rawhide


How reproducible:
100%

Steps to Reproduce:
1. Enable VT for Direct I/O in the BIOS.
2. Boot F12 rawhide x86_64 with the intel_iommu=on option.
3. Booting fails.
  
Actual results:
 

Expected results:


Additional info:
Comment 1 dongjibing 2009-09-22 06:09:46 EDT
Dell Optiplex 760 HW http://www.smolts.org/client/show/pub_f9572217-d67c-4ee8-954b-b4350273184e
Comment 2 Chuck Ebbert 2009-09-23 06:44:31 EDT
Can you capture the entire log from the failed boot?
Comment 3 dongjibing 2009-09-24 05:03:32 EDT
Created attachment 362450 [details]
boot error log when enable VT-d and intel_iommu=on
Comment 4 David Woodhouse 2009-09-24 11:55:32 EDT
Your log is incomplete, but I suspect you're hitting a Dell BIOS bug (see the associated kernel.org bug).

The BIOS, which they obviously shipped without any form of QA whatsoever, puts entirely bogus addresses into its DMAR tables, which are supposed to tell the kernel where the IOMMU hardware can be found.

The kernel now detects this error, and aborts the IOMMU setup.

However, instead of falling back to using swiotlb, you end up using the nommu code.

We need to fix that fallback code, and ensure that swiotlb gets properly initialised when needed. Someone was working on that, IIRC, but it all went quiet. I'll chase (and maybe just do it myself).

You _might_ find that it's triggered by having VT-d disabled in the BIOS, and if you _enable_ it in the BIOS, the problem goes away? Please let me know if that's the case.
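
The fallback problem described in this comment can be pictured as a three-way choice of DMA backend. The sketch below is only an illustration under stated assumptions: the enum, the struct, and pick_dma_backend are hypothetical names, and the real kernel selects its dma_ops through different code paths.

```c
#include <stdbool.h>

/* Hypothetical model of the three DMA paths discussed in this bug. */
enum dma_backend { DMA_HW_IOMMU, DMA_SWIOTLB, DMA_NOMMU };

struct dma_state {
    bool hw_iommu_ok;    /* hardware IOMMU (VT-d) initialised successfully */
    bool swiotlb_ready;  /* software bounce buffer was allocated at boot */
    bool ram_above_4g;   /* some RAM is mapped above the 4GiB boundary */
};

static enum dma_backend pick_dma_backend(const struct dma_state *s)
{
    if (s->hw_iommu_ok)
        return DMA_HW_IOMMU;

    /* Bounce buffering only matters when memory exists above 4GiB. */
    if (s->ram_above_4g && s->swiotlb_ready)
        return DMA_SWIOTLB;

    /* Direct mapping: 32-bit devices cannot reach buffers above 4GiB. */
    return DMA_NOMMU;
}
```

In this model the bug corresponds to hw_iommu_ok being false while swiotlb_ready is also false, because swiotlb setup was skipped on the expectation that the hardware IOMMU would work.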
Comment 5 CAI Qian 2009-10-01 09:22:55 EDT
Jibing, can you answer the questions and resubmit the logs mentioned in comment #4 when you have a chance?
Comment 6 Kevin Bowling 2009-10-09 07:40:56 EDT
I have the same problem on a Sun Ultra 27.  It uses an AMI BIOS.  What logs would be helpful?
Comment 7 David Woodhouse 2009-10-09 07:50:23 EDT
The part of dmesg where it says 'your BIOS is broken'.
Comment 8 dongjibing 2009-10-11 23:04:36 EDT
Created attachment 364412 [details]
Full error boot log on kernel-2.6.31.1-56.fc12.x86_64
Comment 9 David Woodhouse 2009-10-12 07:45:39 EDT
That isn't a full boot log, and doesn't contain the bit I asked for.

If you add 'mem=2M' to the kernel command line, you may find it boots all the way and then you can get the _full_ output from dmesg.
Comment 10 David Woodhouse 2009-10-12 08:09:13 EDT
er, 'mem=2G' even. I suspect that 'mem=2M' won't work either :)
Comment 11 dongjibing 2009-10-12 22:34:50 EDT
Created attachment 364541 [details]
_full_ boot log when vt-d enable in BIOS and intel_iommu=1 in kernel-2.6.31.1-56.fc12.x86_64

Without VT-d enabled in the BIOS, the kernel works fine. The boot failure only happens when VT-d is enabled. The physical memory size is 4GB. Without the mem boot option, the system hangs and outputs the messages I posted before.
Comment 12 David Woodhouse 2009-10-13 02:28:30 EDT
Here's the relevant part of your dmesg again:
 
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:642 alloc_iommu+0x12c/0x286() (Not tainted)
Hardware name: OptiPlex 760                 
Your BIOS is broken; DMAR reported at address fedc1000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A03; Product Version: 


As it clearly says, your BIOS is broken.

Ask Dell for a new BIOS written by somebody sober. Maybe ask them if they can manage to do some QA on it this time.

Also, check whether VT-d is enabled in the BIOS. If it's disabled, try turning it on.

We should definitely make the kernel fall back to swiotlb when this happens rather than falling back to the noiommu code. But that won't actually give you functional VT-d support on broken machines like this; that'll just give you the same behaviour you can achieve with 'iommu=off'.
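
The "returns all ones" warning refers to reading the IOMMU's capability registers and getting every bit set, which is what a read from an address with no device behind it typically yields. A minimal sketch of such a test follows; the struct and function names are hypothetical, and the condition mirrors the cap/ecap comparison quoted from drivers/pci/dmar.c in comment 22.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two capability registers of one IOMMU unit. */
struct iommu_regs {
    uint64_t cap;   /* capability register */
    uint64_t ecap;  /* extended capability register */
};

/*
 * Returns true when both registers read back as all ones, which suggests
 * that nothing is actually decoding the address the BIOS reported.
 */
static bool iommu_regs_bogus(const struct iommu_regs *r)
{
    return r->cap == UINT64_MAX && r->ecap == UINT64_MAX;
}
```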
Comment 13 Kevin Bowling 2009-10-13 02:41:19 EDT
Same here.  I expect better from Sun Microsystems, especially as the Ultra 27 is billed as a workstation.
Comment 14 Mark McLoughlin 2009-10-13 06:34:24 EDT
Okay, if we're saying this is just a broken BIOS, I'm removing it from F12VirtBlocker
Comment 15 David Woodhouse 2009-10-13 06:52:12 EDT
Nah, put it back. We should cope better; we should invoke swiotlb when aborting the VT-d setup, rather than falling back to nommu. And the machine shouldn't crash.

I'm not going to get to that before the kernel summit. Do you want to take a look?

I think it involves calling the swiotlb early init code (which allocates a big contiguous chunk of memory) even if we think we have a real IOMMU, then freeing it again if we actually _do_ use the real IOMMU.
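
The "allocate early, free if unused" idea can be sketched in userspace as follows. This is an illustration only: malloc and free stand in for the boot-time allocator, and bounce_pool, bounce_pool_reserve, bounce_pool_release, and dma_setup are hypothetical names rather than kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the swiotlb bounce buffer. */
struct bounce_pool {
    void *base;
    size_t size;
};

static bool bounce_pool_reserve(struct bounce_pool *p, size_t size)
{
    p->base = malloc(size);           /* stands in for the early allocation */
    p->size = p->base ? size : 0;
    return p->base != NULL;
}

static void bounce_pool_release(struct bounce_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->size = 0;
}

/*
 * Always reserve the pool first, then give it back if the hardware
 * IOMMU turned out to work. Returns true when the pool is kept.
 */
static bool dma_setup(struct bounce_pool *p, size_t size, bool hw_iommu_init_ok)
{
    if (!bounce_pool_reserve(p, size))
        return false;

    if (hw_iommu_init_ok) {
        bounce_pool_release(p);
        return false;
    }
    return true;
}
```

The ordering is the point of the design: the large contiguous reservation happens while it is still easy to satisfy, and the decision about whether it is needed is deferred until the hardware IOMMU has either come up or failed.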
Comment 16 Mark McLoughlin 2009-10-13 06:57:50 EDT
Okay, added back to F12VirtBlocker

cdub, ddd: could one of you take a look? Dave probably won't get to it for F12 GA
Comment 17 Chuck Ebbert 2009-10-13 22:07:10 EDT
*** Bug 528545 has been marked as a duplicate of this bug. ***
Comment 18 David Woodhouse 2009-10-16 16:56:02 EDT
Created attachment 365090 [details]
attempt at patch

This is a completely untested first attempt at a patch. I'm about to leave for the kernel summit, so may not be able to test this properly for a week or so.
Comment 19 Chris Wright 2009-10-16 18:14:26 EDT
I'll do some testing.  I had just finished a patch as well (had considered doing the same as you, although I tried to have an alloc/free to ensure the large order allocation would succeed...doesn't actually work).  Yours is simpler.
Comment 20 David Woodhouse 2009-10-17 07:33:47 EDT
I figured that since this only happens on boxes with >4GiB RAM, the odds are fairly good that we should be able to allocate a 64MiB chunk of memory even if we don't do it early. We're still before all the device drivers and file system stuff, anyway.

Only if that theory is disproved will I bother to try the 'allocate it early in all cases, then free it if we don't want it' complexity.

Do you agree with my choices for the 'gfp' variable in the late_init routine? I thought about passing it in from the caller, but figured it was saner to do it internally.
Comment 21 Chris Wright 2009-10-19 12:40:17 EDT
(In reply to comment #20)
> I figured that since this only happens on boxes with >4GiB RAM, the odds are
> fairly good that we should be able to allocate a 64MiB chunk of memory even if
> we don't do it early. We're still before all the device drivers and file system
> stuff, anyway.

I'm only able to get 4M (both on a 4G box as well as a 12G box).
 
> Only if that theory is disproved will I bother to try the 'allocate it early in
> all cases, then free it if we don't want it' complexity.

I have that patch, only trouble is freeing...The bootmem allocator is cleaned up and gone, so doing a free_bootmem hits a BUG().

> Do you agree with my choices for the 'gfp' variable in the late_init routine? I
> thought about passing it in from the caller, but figured it was saner to do it
> internally.  

That seemed sane to me, except I didn't fully grok the GFP_KERNEL case for !64 bit.
Comment 22 Chris Wright 2009-10-19 12:41:32 EDT
Created attachment 365248 [details]
v2, compiles

Tested this one a bit.  As expected, it's hard to get an order 14 allocation this late.  Both boxes I tested on could only come up with order 10:

Warning: only able to allocate 4 MB for software IO TLB

And was easy to trigger overflows (I cheated and forced overuse of swiotlb):

DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2

Left one box limping, and other couldn't start X.

Tested by forcing an error condition and forcing use of swiotlb with this extra patch (swiotlb=force on cmdline, so putting unusual stress on swiotlb):

arch/x86/kernel/pci-swiotlb.c
-       if (swiotlb_force)
-               swiotlb = 1;
drivers/pci/dmar.c
-       if (iommu->cap == (uint64_t)-1 && iommu->ecap == (uint64_t)-1) {
+       if (1) {
Comment 23 Chris Wright 2009-10-19 12:51:16 EDT
Created attachment 365249 [details]
setup swiotlb, free if hw iommu succeeds

This is just for sake of completeness.  This patch doesn't actually work.
It effectively always allocates the swiotlb, and only uses it when needed (no hw iommu, or hw iommu is hidden behind broken BIOS).  When not needed it frees, which is the bit that doesn't work.
Comment 24 Chris Wright 2009-10-19 13:48:42 EDT
Err, ddd just reminded me that buddy allocator has a MAX_ORDER 11 on x86, so we'll never get more than order 10 allocation.
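
The arithmetic behind the order numbers in comments 22 and 24 can be checked with a short sketch. It assumes 4KiB pages, as on x86; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL  /* assumption: 4KiB pages, as on x86 */
#define SKETCH_MAX_ORDER 11       /* the MAX_ORDER value cited in comment 24 */

/* An order-n allocation is 2^n physically contiguous pages. */
static uint64_t order_to_bytes(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/* The buddy allocator serves orders 0 .. MAX_ORDER - 1. */
static unsigned int largest_buddy_order(void)
{
    return SKETCH_MAX_ORDER - 1;
}
```

Under these assumptions order 14 is 4096 << 14 = 64MiB, the size wanted for the swiotlb, while the largest order the buddy allocator can serve is 10, which is 4096 << 10 = 4MiB and matches the "only able to allocate 4 MB" warning.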
Comment 25 Jesse Keating 2009-10-21 18:29:41 EDT
Running out of time here folks.  Is this something we'd slip the release for?
Comment 26 David Woodhouse 2009-10-21 19:13:38 EDT
I don't think it's that important to slip the release.

We're discussing the fallback mode when the BIOS is broken and we have to abort IOMMU setup late in the boot, _and_ we have >4GiB of RAM.

If the user boots with 'iommu=off' when they're on a broken machine such as this, the problem will go away.

Perhaps a short-term answer would be to panic() if this happens and there's >4GiB RAM? At least the user sees the actual problem then.
Comment 27 Chris Wright 2009-10-22 18:48:37 EDT
I agree re: slippage.  A box could have less than 4G of RAM and still hit this, if memory is pushed above 4G to make room for the PCI hole, etc.  I'll attach a patch that does the fallback in a moment.
Comment 28 Jeffrey C. Ollie 2009-10-22 19:08:06 EDT
My laptop appears to have this problem, but I'm able to boot, perhaps because I have 4G of memory and not more.  If I boot without intel_iommu=off USB doesn't work, but it does with intel_iommu=off.  There's a BIOS update but I wanted to wait until returning from the Fedora Talk FAD to upgrade the BIOS just in case I bricked the machine.  Here's the smolt profile:


http://www.smolts.org/client/show/pub_d31ee593-04e1-42b4-8e4a-79ccaef15d5c
Comment 29 Jesse Keating 2009-10-22 19:25:16 EDT
Moving to virt target then.
Comment 30 Chris Wright 2009-10-22 19:51:52 EDT
Created attachment 365798 [details]
iommu: allow fallback to swiotlb upon hw iommu initialization failure

This should allow the swiotlb to be used as a fallback.  I don't think it will work for AMD IOMMU in passthrough mode.  And the new interface for freeing bootmem pages directly to page allocator is not ideal.
Comment 31 Ville-Pekka Vainio 2009-10-23 14:23:19 EDT
I was also seeing DMAR/IOMMU errors on an Intel DP35DP motherboard, the product page is http://www.intel.com/products/desktop/motherboards/DP35DP/DP35DP-overview.htm . USB wasn't working at all when the kernel reported those errors. A BIOS update solved the problem. I only have one log of a problematic boot with the old BIOS, I'll attach it soon. Now that the system works, I obviously won't downgrade the BIOS just for testing ;)
Comment 32 Ville-Pekka Vainio 2009-10-23 14:28:28 EDT
Created attachment 365879 [details]
Boot log with an old BIOS on an Intel DP35DP
Comment 33 Scott Tsai 2009-10-26 01:46:33 EDT
Created attachment 366066 [details]
make v2 patch compile

Include <linux/swap.h> in mm/bootmem.c for totalram_pages, otherwise identical to the v2 patch.

I added this patch to the kernel-2.6.31.5-96.fc12.x86_64 SRPM and it made USB work again on my HP EliteBook 2530p.

I then hacked pci_swiotlb_init to always call swiotlb_free and not set dma_ops to exercise the code that frees the memory allocated in swiotlb_init and it seems to work as well.

My boot logs before and after the patch are here:
http://scottt.tw/bug/rhbz-524808/
Comment 34 Scott Tsai 2009-10-26 02:04:16 EDT
(In reply to comment #29)
> Moving to virt target then.  
If this bug is not fixed by F12 general availability, I recommend documenting the following in the release notes:

If you can't boot or USB doesn't work and you see "Your BIOS is broken; DMAR reported at address ..." in dmesg, try passing "intel_iommu=off" or "mem=3G" (if you have 4G of memory or more) on the kernel command line as a workaround.
Comment 35 Jeffrey C. Ollie 2009-11-02 23:25:31 EST
This seems to be "fixed" for me when I went into the BIOS and enabled virtualization.  I can remove the intel_iommu=off from my kernel command line and everything seems to be working.

Upgrading the BIOS didn't help.  Here's the smolt link to my system:

http://www.smolts.org/client/show/pub_d31ee593-04e1-42b4-8e4a-79ccaef15d5c
Comment 36 José Matos 2009-11-06 14:01:13 EST
With all the kernels after -112 (the previous I had tested was -97) the intel_iommu=off parameter does not work anymore. I tested the 112, 115, 117 and 122 versions.

This is what I get:
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:183 dmar_table_init+0x161/0x3aa() (Not tainted)
Hardware name: HP EliteBook 8530p
Your BIOS is broken; DMAR reported at address zero!
BIOS vendor: Hewlett-Packard; Ver: 68PDV Ver. F.0E; Product Version: F.0E
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-122.fc12.x86_64 #1
Call Trace:
[<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
[<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8173c893>] dmar_table_init+0x161/0x3aa
[<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
[<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
[<ffffffff8105496c>] ? do_wait+0x299/0x2d7
[<ffffffff817145bb>] kernel_init+0x84/0x273
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81714537>] ? kernel_init+0x0/0x273
[<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---

Following Jeffrey's tip I enabled virtualization in the BIOS and now it works without any parameter passed to the kernel. Thanks Jeff. :-)
Comment 37 Chuck Ebbert 2009-11-06 16:53:21 EST
Adding "iommu=soft" to the boot options should work to disable the Intel IOMMU.
Comment 38 Adam Williamson 2009-11-07 18:42:39 EST
*** Bug 522668 has been marked as a duplicate of this bug. ***
Comment 39 Adam Williamson 2009-11-07 18:44:17 EST
*** Bug 532582 has been marked as a duplicate of this bug. ***
Comment 40 Adam Williamson 2009-11-07 18:44:32 EST
*** Bug 530455 has been marked as a duplicate of this bug. ***
Comment 41 Adam Williamson 2009-11-07 19:03:28 EST
*** Bug 530340 has been marked as a duplicate of this bug. ***
Comment 42 Adam Williamson 2009-11-07 19:09:47 EST
to bring across the summary I wrote for 490477 (incorrectly):

so, here's the scoop on this.

it breaks USB functionality entirely, and in at least one case completely stops the system booting, on several motherboards. (It's actually caused by buggy BIOSes, but we can't sell that to the users). We have enough information to say definitely that 8 people have hit this bug - that is, there are reports or comments on reports from 8 people which definitely indicate they hit this precise problem.

If we ship with this, we will have people with unusable systems. The workaround
is simple: iommu=soft kernel parameter (or, disabling VT-d in the BIOS, or limit RAM to 4GB). But it relies on them finding the documentation.

If we disable it by default, the impact is that it breaks PCI passthrough for
KVMs. Kyle is almost positive it can't possibly break anything else. The
converse workaround would be possible for any virt users who need that to work:
intel_iommu=1 .

David believes the patches above would fix this, but it's quite late to be making actual code changes.

That's the state of play on this issue at the moment.


-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 43 Adam Williamson 2009-11-07 19:20:03 EST
*** Bug 522668 has been marked as a duplicate of this bug. ***
Comment 44 Adam Williamson 2009-11-07 19:27:28 EST
Dan Beard is definitely hitting this issue, so that makes 9 confirmed sufferers.

Comment 45 Adam Williamson 2009-11-07 19:30:39 EST
clarification (as others will probably make the same mistake I did): when David says this happens only with RAM 'above 4GiB', he means above 4GiB in address space. This means it will likely affect any system with 3GiB or more of actual physical RAM - certainly systems with exactly 4GiB, and down to 2.5GiB in some cases.
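
The distinction between the amount of RAM and the highest RAM address can be made concrete with a small sketch. It rests on a simplifying assumption that is not taken from this bug: that firmware relocates all RAM which would overlap the PCI hole to just above 4GiB. The hole start values used below are hypothetical and vary by board.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GIB (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical model: RAM below hole_start stays in place, and the
 * remainder is remapped to start at 4GiB. hole_start must be <= 4GiB.
 * Returns the first address past the end of RAM.
 */
static uint64_t top_of_ram(uint64_t ram_bytes, uint64_t hole_start)
{
    if (ram_bytes <= hole_start)
        return ram_bytes;
    return 4 * SKETCH_GIB + (ram_bytes - hole_start);
}

static bool has_ram_above_4g(uint64_t ram_bytes, uint64_t hole_start)
{
    return top_of_ram(ram_bytes, hole_start) > 4 * SKETCH_GIB;
}
```

In this model a machine with exactly 4GiB of RAM and a hole starting at 3GiB ends up with RAM reaching 5GiB, and a 3GiB machine with a hole starting at 2.5GiB is affected as well, while a 2GiB machine with the same hole is not.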

Comment 46 David Woodhouse 2009-11-07 19:45:31 EST
Created attachment 367983 [details]
Much simpler fix

This patch is much simpler than the ones which deal with swiotlb memory allocation.

This particular problem is actually detectable _early_ -- before we even claim to have detected an IOMMU. So check for it then (by cutting and pasting existing code), and just pretend not to have an IOMMU if this BIOS bug is detected. So swiotlb is set up as normal.
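
A rough sketch of the early-detection idea follows, assuming the BIOS bug in question is the "DMAR reported at address zero" case from comment 36. The function names and the flat array of base addresses are hypothetical simplifications of the ACPI DMAR table walk, not the contents of the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical early check: treat the DMAR table as unusable if it lists
 * no IOMMU units, or if any unit is reported at address zero.
 */
static bool dmar_tables_usable(const uint64_t *drhd_base, size_t count)
{
    size_t i;

    if (count == 0)
        return false;

    for (i = 0; i < count; i++) {
        if (drhd_base[i] == 0)
            return false;   /* BIOS reported an IOMMU at address zero */
    }
    return true;
}

/*
 * If no usable IOMMU was detected, swiotlb is set up as on any machine
 * without an IOMMU, provided there is RAM above 4GiB to bounce from.
 */
static bool should_setup_swiotlb(bool iommu_detected, bool ram_above_4g)
{
    return !iommu_detected && ram_above_4g;
}
```

A check of this kind only catches addresses that are recognisably wrong without touching the hardware; a plausible-looking but dead address still passes it.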
Comment 47 Jesse Keating 2009-11-07 20:03:55 EST
I thought this bug only happens when VT-d is /disabled/?  Therefore /enabling/ VT-d would work around the issue.  Do I remember wrong?
Comment 48 Adam Williamson 2009-11-07 20:15:32 EST
More data: we can get a handle on affected system models from kerneloops.

http://www.kerneloops.org/search.php?search=dmar_table_init&btnG=Function+Search

any system which hits an incarnation of the "DMAR reported at address zero!" traceback, on which VT-d is disabled in the BIOS and which has enough RAM, will hit this bug (confirmed by David). We suspect these systems are shipping with VT-d disabled by default in the BIOS, or else that number of people would not hit the bug (it's unlikely that many people have gone into their BIOS and manually disabled virtualization).

It's safe to assume that any time the kernel oops has been reported to kerneloops.org, the system in question would hit this bug when running F12 as long as it had enough RAM. There are hundreds of occurrences of this oops at kerneloops.org, so that is a worrying indicator. My belief is that a lot of HP and several Acer models have the broken BIOSes, and they were shipped with VT-d disabled by default in their BIOSes. I'm going to check now how likely they are to have had enough RAM installed from the factory to hit this problem, but right now I suspect there's a lot of potential sufferers of this bug out there.

Comment 49 Adam Williamson 2009-11-07 20:32:54 EST
Jesse: yes, that is correct, sorry for getting it wrong in the summary.

Comment 50 David Woodhouse 2009-11-07 20:48:02 EST
Code fix from comment #46 is in 2.6.31.5-127.fc12, building at
http://koji.fedoraproject.org/koji/taskinfo?taskID=1794534
Comment 51 jeremy boyd 2009-11-07 20:54:26 EST
hello. i can confirm that i get this dmar notification every time i boot up.
here is my initial bug report
https://bugzilla.redhat.com/show_bug.cgi?id=532582 which was merged with this
one.
i have an hp pavilion dv7 and have always kept my bios updated. last week i
tried the three most recent bioses and they all had this issue so i didn't try
any that were older than that. also i checked my bios configuration and
couldn't find an option for VT-d.

please let me know if there is anything i can do to help you guys troubleshoot
or test out fixes.
Comment 52 David Woodhouse 2009-11-07 21:51:47 EST
Jeremy, thanks. The kernel at http://koji.fedoraproject.org/koji/taskinfo?taskID=1794537 ought to fix the problem. If you could confirm that, it would be much appreciated; thanks.
Comment 53 Scott Tsai 2009-11-07 22:12:06 EST
kernel-2.6.31.5-127.fc12.x86_64 from:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1794537
does fix the "USB not working" problem for me.

I installed with "rpm -i --nodeps" to workaround the kernel-firmware version dependency.
Comment 54 David Woodhouse 2009-11-07 22:19:16 EST
Two testers on #fedora-kernel (jcollie, jcasale) have also confirmed that the -127 kernel fixes the problem.
Comment 55 Adam Williamson 2009-11-07 22:46:53 EST
they also confirmed the 126 kernel, david ;)

I have confirmed that 126 and 127 boot and work as normal on a system unaffected by the bug.

Comment 56 Adam Williamson 2009-11-07 22:52:07 EST
My vote would be to re-spin with this fix tonight for Liam to test tomorrow, if Jesse's around and able to do it this evening. Otherwise we can leave it till Monday and discuss then - because we'd either have to go ahead and ship without the fix, or slip.

from a strict qa policy point of view i'd have to say we'd prefer the 126 fix to the 127 for GA, but honestly I don't feel bad about 127 either.

Comment 57 Adam Williamson 2009-11-07 22:53:54 EST
oh, and if we do respin, I would be in favour of taking https://bugzilla.redhat.com/show_bug.cgi?id=533553 too - that is, drop PackageKit-command-not-found from any of the default install sets.

Comment 58 Jeffrey C. Ollie 2009-11-08 01:31:40 EST
Yup, both -127 and -126 fix the problem for me.
Comment 59 jeremy boyd 2009-11-08 08:43:47 EST
127 fixes it for me. i didn't try 126.

thanks.
Comment 60 Vedran Miletić 2009-11-08 09:12:10 EST
Hi guys,

sorry for going off topic, but this is a great place for this question. I believe it would be very useful to have a list of desktop motherboards which have VT-d (not VT) option in BIOS.

From the information that is available online, it's easy to find out that only Q35 and Q45 have requirement to support VT-d, while it's optional for other desktop chipsets (P35, P45 etc.), but it's very hard to find that out for a particular board since it's possible that it has been added in later BIOS revisions and isn't listed in board manual.

Please, mail me name of board manufacturer and exact model (revision, if there is more than one), and note whether your board has VT-d option in BIOS. I'm also interested in laptops if you have one.

Once I have enough information, I will create a wiki page about it.

Thank you.
Comment 61 Adam Williamson 2009-11-08 14:11:33 EST
please, please don't go off-topic, this bug is absolutely vital to F12 release and it would be very easy to lose the thread if people start posting completely unrelated information. Anyone who would like to answer Vedran's question, PLEASE do it by private email and not in this bug.

Comment 62 Rahul Sundaram 2009-11-09 16:50:23 EST
Fix is included in Fedora 12 RC4. Thanks everyone.
Comment 63 Mariusz Smykuła 2009-11-10 01:46:36 EST
127 fixes it for me too; the USB stack is working OK.
Comment 64 Adam Williamson 2009-11-18 19:04:42 EST
as per #533952, david believes the fix for this in 127 didn't actually work. hence we are confused as to how multiple testers reported 127 was OK. can you guys possibly double-check and confirm that 122 _really_ fails and 127 _really_ works on your systems? we're just confused as to how this could possibly have happened.

Comment 65 Ian Macnaughtan 2009-11-19 00:58:11 EST
Thought this was worth adding even though status is closed.

Downloaded livecd x86_64 with kernel 2.6.31.5-127 last evening (AU DS Time) and installed to hdd. Been trying to figure out usb problem all day till i found this bug and others. Both VTx and VTd are enabled in bios and have to use 'iommu=soft' to enable usb support. Startup time is much faster. Have not tested disabling VTd yet.

Regardless of above settings i get a message in ABRT each boot about a kernel crash with the dmar 'broken bios' message.

Machine is a HP xw4600 with 4G ram. Bios was just flashed to 1.17 - latest available for download from hp.
Comment 66 Adam Williamson 2009-11-19 01:57:49 EST
ian: see https://bugzilla.redhat.com/show_bug.cgi?id=533952 . it appears that the fix we put in to fix this problem for people with the affected systems and *more than* around 2.5GB of RAM causes it to break for people with the affected systems and *less than* around 2.5GB of RAM.

le sigh.

Comment 67 Adam Williamson 2009-11-19 01:58:25 EST
whoops, I really should read to the end of messages. can you check with kernel 137 from 533952 and see how that behaves, though, even so?

Comment 68 Adam Williamson 2009-11-19 11:49:09 EST
ian: also, can we get the _exact_ kernel message you see? Thanks.

Comment 69 Ian Macnaughtan 2009-11-19 18:16:40 EST
Thanks Adam. Let me know if I should add this to 533952 instead. This is the portion of dmesg that i see in abrt every time i boot:

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:668 alloc_iommu+0x11d/0x253() (Not tainted)
Hardware name: HP xw4600 Workstation
Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.17; Product Version:  
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
 [<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81222206>] alloc_iommu+0x11d/0x253
 [<ffffffff8173c8ea>] ? dmar_table_init+0x10d/0x34d
 [<ffffffff8173c960>] dmar_table_init+0x183/0x34d
 [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
 [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
 [<ffffffff817145bb>] kernel_init+0x84/0x273
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81714537>] ? kernel_init+0x0/0x273
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---

I will shortly attach 2 full dmesg files from boot with and without iommu=soft option. I have also downloaded kernel 137 and will do some testing. I'm pretty new at this but will see what i can do :)
Comment 70 Ian Macnaughtan 2009-11-19 18:18:10 EST
Created attachment 372375 [details]
full dmesg from boot with iommu=soft option
Comment 71 Adam Williamson 2009-11-19 18:22:37 EST
ian: yours will likely end up being a completely new report, since it looks somewhat different from any of the currently-known cases. David, do you agree? I think 'DMAR reported at address fed90000 returns all ones!' indicates something different, right?

Comment 72 Ian Macnaughtan 2009-11-19 18:27:51 EST
Created attachment 372378 [details]
full dmesg from boot without iommu=soft option

booting stock kernel 2.6.31.5-127.fc12.x86_64 installed to hdd from live cd
100M /boot on raid1
/ on raid5 lv
Comment 73 Peter Larsen 2009-11-19 18:36:19 EST
I have the exact same problem with F12 on a Dell Precision M6400. Including the "bad bios" warning. This worked under F11. USB is lost and non-operational.
Comment 74 Adam Williamson 2009-11-19 18:40:06 EST
Peter: please check if your bug is bug #533952.

Comment 75 Chris Wright 2009-11-19 20:25:53 EST
(In reply to comment #71)
> ian: yours will likely end up being a completely new report, since it looks
> somewhat different from any of the currently-known cases. David, do you agree?
> I think 'DMAR reported at address fed90000 returns all ones!' indicates
> something different, right?

Different yes, but variation on a theme.  The BIOS is broken.  The simpler patch that David provided will not catch this one, unfortunately.  Choices are to add more hackery, or backport the swiotlb fallback patches.
Comment 76 Chris Wright 2009-11-19 20:40:30 EST
Created attachment 372397 [details]
this should catch the all 1's problem and still allow swiotlb fallback

This adds to the current approach where we try to detect buggy BIOS during VT-d detection rather than VT-d initialization.  It's a little hacky, and I didn't remove the identical check that will happen due to interrupt remapping init.

Ian, if you're up for compiling a kernel, this should give you a functioning system (w/out needing to add iommu=soft to kernel cmdline).
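
For anyone following the logic rather than the patch itself, here is a minimal Python sketch of the decision being discussed: decide at detection time whether the BIOS-reported DMAR looks usable, and fall back to swiotlb if it does not. This is a toy model, not kernel code. The function name `pick_dma_backend` and its two parameters are hypothetical, and the real kernel has more cases (for example iommu=off, or systems that need no bounce buffering at all).

```python
def pick_dma_backend(cmdline_iommu, bios_dmar_looks_sane):
    """Toy model of the fallback discussed in this bug (names are hypothetical).

    cmdline_iommu: value of the iommu= boot option, or None if not given.
    bios_dmar_looks_sane: result of an early sanity check on the DMAR table.
    """
    # iommu=soft is the manual workaround reported in this thread:
    # it forces the software bounce buffer regardless of hardware.
    if cmdline_iommu == "soft":
        return "swiotlb"
    # Only trust the hardware IOMMU if the early check passed.
    if bios_dmar_looks_sane:
        return "intel-iommu"
    # Broken BIOS detected early enough: fall back instead of
    # leaving devices without working DMA mappings.
    return "swiotlb"
```

The point of doing the check during detection is that the fallback choice is still open at that stage; once VT-d initialization has already failed later in boot, it is too late to set up swiotlb.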
Comment 77 Adam Williamson 2009-11-19 23:50:18 EST
thanks, Chris.

is it too much to ask for you and David to get together and do a kernel build which just solves all the iommu issues so Jesse and I can put down our whisky and stop crying? :) thanks :)

Comment 78 Peter Larsen 2009-11-20 09:34:28 EST
(In reply to comment #74)
> Peter: please check if your bug is bug #533952.

I'm not sure I can see the difference between the two bugs at this point. It looks like both apply to me? I have a workaround, seen here and in the above bug: iommu=soft. This takes out the error and USB gets initialized.
Comment 79 David Woodhouse 2009-11-20 09:55:46 EST
Ian: what you report in comment #69 is a different bug. It seems that HP managed to find a new way to screw up. Yippee!

In this one, they report an IOMMU at a bogus location but a _believable_ bogus location rather than at zero. So we don't actually find out about it until we ioremap it and start trying to talk to it.

Thanks, HP. Another stunning display of quality control.

We had seen that failure mode before (Lenovo) but there it was only ever a one-off when you first enabled VT-d in the BIOS, and it all worked properly after a power cycle.

Chris's patch seems like a reasonable way to address it in the short term; thanks.
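
To make the two failure modes in this thread easier to tell apart, here is a small Python sketch (not the kernel's C code) that classifies a reported DRHD entry. The function name `classify_drhd` is hypothetical, and passing in the raw capability and extended capability register values is an assumption made for illustration; in the kernel those values are only available after the region has been mapped and read.

```python
ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF

def classify_drhd(base, cap, ecap):
    """Classify a DRHD entry by the two BIOS bugs discussed here.

    base: register base address the BIOS reported for the IOMMU.
    cap, ecap: 64-bit values read back from the mapped registers.
    """
    # First bug: the BIOS reports the IOMMU at address zero,
    # which can be rejected without touching any hardware.
    if base == 0:
        return "address-zero"
    # Second bug: the address looks believable, but nothing is there,
    # so every register read comes back as all ones.
    if cap == ALL_ONES_64 and ecap == ALL_ONES_64:
        return "all-ones"
    return "plausible"
```

With the values from this thread, `classify_drhd(0xfed90000, ALL_ONES_64, ALL_ONES_64)` corresponds to the "returns all ones!" warning, while a base of 0 corresponds to the "reported at address zero!" warning.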
Comment 80 David Woodhouse 2009-11-20 09:58:49 EST
(In reply to comment #78)
> (In reply to comment #74)
> > Peter: please check if your bug is bug #533952.
> 
> I'm not sure I'm able to see the difference between the two bug# at this point.
> It looks like both are a hit to me?  I have a workaround - seen here and in the
> above bug: iommu=soft. This takes out the error and USB gets initialized.  

Peter, please show the precise error. Is it that your BIOS is broken and it reports a DMAR at address zero, or that it reports a DMAR which returns all ones, as Ian reported?
Comment 81 Peter Larsen 2009-11-20 10:26:40 EST
(In reply to comment #80)
> (In reply to comment #78)
> > (In reply to comment #74)
> > > Peter: please check if your bug is bug #533952.
> > 
> > I'm not sure I'm able to see the difference between the two bug# at this point.
> > It looks like both are a hit to me?  I have a workaround - seen here and in the
> > above bug: iommu=soft. This takes out the error and USB gets initialized.  
> 
> Peter, please show the precise error. Is it that your BIOS is broken and it
> reports a DMAR at address zero, or that it reports a DMAR which returns all
> ones, as Ian reported?  

Ok - the oops isn't shown anymore on the boot console, but it IS happening. My USB is operational with the above work-around, but I'm still getting the same oops (and I think it is preventing nvidia etc. from loading drivers - the box dies when nouveau tries to load).

DMAR:Host address width 36
DMAR:DRHD base: 0x000000fed10000 flags: 0x0
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:668 alloc_iommu+0x11d/0x253() (Not tainted)
Hardware name: Precision M6400                 
Your BIOS is broken; DMAR reported at address fed10000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A07; Product Version: 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
 [<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81222206>] alloc_iommu+0x11d/0x253
 [<ffffffff8173c8ea>] ? dmar_table_init+0x10d/0x34d
 [<ffffffff8173c960>] dmar_table_init+0x183/0x34d
 [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
 [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
 [<ffffffff817145bb>] kernel_init+0x84/0x273
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81714537>] ? kernel_init+0x0/0x273
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---
DMAR:parse DMAR table failure.
Comment 82 Adam Williamson 2009-11-20 16:24:09 EST
peter, your problem looks rather like Ian's, then. David, looks like a Dell with the same epic fail. "Precision M6400". right?

btw, peter, it's normal that you see the 'oops' even with the workaround, since it's not really an oops but more an informational message to let you know your manufacturer's BIOS engineers are complete idiots.

general message to all: if you ever find out that there is a BIOS Engineers' Convention occurring, please try and prevent David from discovering the location by absolutely any means necessary. That could only end very, very, badly. =)

Comment 83 Peter Larsen 2009-11-20 20:00:13 EST
(In reply to comment #82)
> peter, your problem looks rather like Ian's, then. David, looks like a Dell
> with the same epic fail. "Precision M6400". right?

yeah - I checked earlier today if there were any bios updates. Strange - same "version" but with a later date than when I downloaded A07.

I missed the little detail about 1s vs 0s .... the two bugs looked very similar to me.

Thanks for pointing that out to me.
Comment 84 Peter Larsen 2009-11-20 20:12:29 EST
(In reply to comment #82)
> btw, peter, it's normal that you see the 'oops' even with the workaround, since
> it's not really an oops but more an informational message to let you know your
> manufacturer's BIOS engineers are complete idiots.

Btw. thanks for the fedoraproject link. The PROBLEM is, that I am on 64bit already. And I HAVE the virtualization option turned on; this was my setup in F11, and I did an F12 upgrade and now have the problem.

It looks like this problem is connected to my inability to load nvidia drivers (they crash or just freeze the box) so I was seeking a way to remove the issue before filing nvidia/nouveau bugs. I guess I'll just have to go in that direction anyway.
Comment 85 Chris Wright 2009-11-20 20:33:40 EST
(In reply to comment #77)
> is it too much to ask for you and David to get together and do a kernel build
> which just solves all the iommu issues so Jesse and I can put down our whisky
> and stop crying? :) thanks :)

Bit much since I'd be hard pressed to ask you and Jesse to put down your whisky ;)

Variant of fix in comment #76 is being built now:

http://koji.fedoraproject.org/koji/buildinfo?buildID=142281

Peter, Ian, can you test this kernel please?
Comment 86 Peter Larsen 2009-11-20 23:08:31 EST
(In reply to comment #85)

> Variant of fix in comment #76 is being built now:
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=142281
> 
> Peter, Ian, can you test this kernel please?  

It's been a while since I played with "custom" kernels. I get a dependency error when I try to install the above kernel:

kernel-2.6.31.6-144.fc12.x86_64 from /kernel-2.6.31.6-144.fc12.x86_64 has depsolving problems
  --> Missing Dependency: kernel-firmware >= 2.6.31.6-144.fc12 is needed by package kernel-2.6.31.6-144.fc12.x86_64 (/kernel-2.6.31.6-144.fc12.x86_64)

The site doesn't list any kernel-firmware packages?
Comment 87 Justin M. Forbes 2009-11-21 00:41:31 EST
Peter, The kernel-firmware package is noarch, but it is present:

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.31.6/144.fc12/noarch/kernel-firmware-2.6.31.6-144.fc12.noarch.rpm
Comment 88 Peter Larsen 2009-11-21 01:21:50 EST
Created attachment 372692 [details]
dmesg from M6400 with kernel-2.6.31.6-144.fc12.x86_64

Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:615 check_zero_address+0x14f/0x191() (Not tainted)
Hardware name: Precision M6400                 
Your BIOS is broken; DMAR reported at address fed10000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A07; Product Version: 
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.31.6-144.fc12.x86_64 #1
Call Trace:
 [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8173c82b>] check_zero_address+0x14f/0x191
 [<ffffffff8125f9c9>] ? acpi_tb_verify_table+0x57/0x5c
 [<ffffffff8125f027>] ? acpi_get_table_with_size+0x5a/0xb4
 [<ffffffff81420ec3>] ? _etext+0x0/0x1
 [<ffffffff8173c87f>] detect_intel_iommu+0x12/0x8c
 [<ffffffff8171bbc0>] pci_iommu_alloc+0x5e/0x6c
 [<ffffffff8172b13e>] mem_init+0x19/0x161
 [<ffffffff81714be5>] start_kernel+0x20b/0x3fa
 [<ffffffff817142a1>] x86_64_start_reservations+0xac/0xb0
 [<ffffffff8171439d>] x86_64_start_kernel+0xf8/0x107
---[ end trace a7919e7f17c0a725 ]---
Comment 89 Peter Larsen 2009-11-21 01:23:47 EST
(In reply to comment #87)
> Peter, The kernel-firmware package is noarch, but it is present:
> 
> http://kojipkgs.fedoraproject.org/packages/kernel/2.6.31.6/144.fc12/noarch/kernel-firmware-2.6.31.6-144.fc12.noarch.rpm  

Thanks Justin - I'm quite a bit rusty when it comes to the "new" way of doing kernel setups. Last I played around with them, we barely had kernel modules around.

I uploaded dmesg from my attempt - since nvidia wasn't supported by that version, I had a serious crash; I still get the same DMAR message though. Would it help to use a debug version of the kernel?
Comment 90 David Woodhouse 2009-11-21 01:40:28 EST
The DMAR message is expected -- your BIOS is still broken, and Dell hasn't yet provided you with a fixed upgrade.

The point is that now, the kernel will print the message and work properly.

You do get the message _twice_ in one boot, which is slightly suboptimal -- but that's only really cosmetic. Everything _is_ working fine.
Comment 91 Adam Williamson 2009-11-21 11:25:37 EST
peter: notice the 'in most cases' weasel-words in the common bugs. you and ian are the exception to this, since you have the slightly different bug ('all ones!'). the stuff about it only applying to 32-bit and low RAM and virt disabled and yadda yadda applies to the _other_ bug. you're speeeeeecial :)

Comment 92 Ian Macnaughtan 2009-11-22 17:03:42 EST
Just back from weekend. Tested 137 and 144 without iommu=soft. USB did not work in 137 but appears to be working in 144. I still get dmar message re broken bios but not too concerned about that at present as it does not appear to be affecting anything else. Thanks for the assistance :)
Comment 93 Fedora Update System 2009-11-23 08:03:55 EST
kernel-2.6.31.6-145.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.31.6-145.fc12
Comment 94 Adam Williamson 2009-11-23 15:27:55 EST
ian: the message still appearing is intentional; it's an informational message that your BIOS is broken. Your BIOS is still broken, so the message still appears. =)

Comment 95 Robinson Maureira 2009-11-24 19:18:30 EST
Hi all,

I'm using 2.6.31.6-148.fc12.x86_64, and still get the DMAR bug and can't even get to the GDM login screen.

I've tried various combinations with no luck so far... I have dmesg output from the following combinations:

virt-disabled on BIOS
virt-disabled on BIOS + mem=2G nomodeset
virt-disabled on BIOS + iommu=soft
virt-disabled on BIOS + mem=2G
virt-disabled on BIOS + iommu=off
virt-enabled on BIOS

Want me to post them? I'm getting two traces

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:596 check_zero_address+0x96/0x191() (Not tainted)
Hardware name: HP Compaq 6730b (FS242LA#ABM)
Your BIOS is broken; DMAR reported at address zero!
BIOS vendor: Hewlett-Packard; Ver: 68PDD Ver. F.11; Product Version: F.11
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.31.6-148.fc12.x86_64 #1
Call Trace:
 [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81420ec3>] ? _etext+0x0/0x1
 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8173c772>] check_zero_address+0x96/0x191
 [<ffffffff8125f9c9>] ? acpi_tb_verify_table+0x57/0x5c
 [<ffffffff8125f027>] ? acpi_get_table_with_size+0x5a/0xb4
 [<ffffffff81420ec3>] ? _etext+0x0/0x1
 [<ffffffff8173c87f>] detect_intel_iommu+0x12/0x8c
 [<ffffffff8171bbc0>] pci_iommu_alloc+0x5e/0x6c
 [<ffffffff8172b13e>] mem_init+0x19/0x161
 [<ffffffff81714be5>] start_kernel+0x20b/0x3fa
 [<ffffffff817142a1>] x86_64_start_reservations+0xac/0xb0
 [<ffffffff8171439d>] x86_64_start_kernel+0xf8/0x107
---[ end trace a7919e7f17c0a725 ]---

and

DMAR:Host address width 36
DMAR:DRHD base: 0x00000000000000 flags: 0x1
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:219 __ioremap_caller+0x145/0x30e() (Tainted: G        W )
Hardware name: HP Compaq 6730b (FS242LA#ABM)
Modules linked in:
Pid: 1, comm: swapper Tainted: G        W  2.6.31.6-148.fc12.x86_64 #1
Call Trace:
 [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
 [<ffffffff81051720>] warn_slowpath_null+0x14/0x16
 [<ffffffff81034cb3>] __ioremap_caller+0x145/0x30e
 [<ffffffff810e243e>] ? free_unmap_vmap_area_noflush+0x3c/0x7c
 [<ffffffff812223d7>] ? alloc_iommu+0x1de/0x253
 [<ffffffff81034f5e>] ioremap_nocache+0x17/0x19
 [<ffffffff812223d7>] alloc_iommu+0x1de/0x253
 [<ffffffff8173ca06>] ? dmar_table_init+0x10d/0x34d
 [<ffffffff8173ca7c>] dmar_table_init+0x183/0x34d
 [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
 [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
 [<ffffffff817145bb>] kernel_init+0x84/0x273
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81714537>] ? kernel_init+0x0/0x273
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a726 ]---

almost every time. The only way to have a functional system is to add nomodeset to my boot options.

BIOS is latest, had the same problems with F.0A, and of course, everything worked on F11 :-)
Comment 96 Chris Wright 2009-11-24 19:44:46 EST
(In reply to comment #95)
> Want me to post them?

Yes, please.

> ------------[ cut here ]------------
> WARNING: at drivers/pci/dmar.c:596 check_zero_address+0x96/0x191() (Not
> tainted)
> Hardware name: HP Compaq 6730b (FS242LA#ABM)
> Your BIOS is broken; DMAR reported at address zero!
> BIOS vendor: Hewlett-Packard; Ver: 68PDD Ver. F.11; Product Version: F.11

This is just a warning telling you the BIOS is broken.  You should be falling back to the swiotlb in this case.

> DMAR:Host address width 36
> DMAR:DRHD base: 0x00000000000000 flags: 0x1
> ------------[ cut here ]------------
> WARNING: at arch/x86/mm/ioremap.c:219 __ioremap_caller+0x145/0x30e() (Tainted:
> G        W )

This, however, I was worried about...ugh.

> Hardware name: HP Compaq 6730b (FS242LA#ABM)
> Modules linked in:
> Pid: 1, comm: swapper Tainted: G        W  2.6.31.6-148.fc12.x86_64 #1
> Call Trace:
>  [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
>  [<ffffffff81051720>] warn_slowpath_null+0x14/0x16
>  [<ffffffff81034cb3>] __ioremap_caller+0x145/0x30e
>  [<ffffffff810e243e>] ? free_unmap_vmap_area_noflush+0x3c/0x7c
>  [<ffffffff812223d7>] ? alloc_iommu+0x1de/0x253
>  [<ffffffff81034f5e>] ioremap_nocache+0x17/0x19
>  [<ffffffff812223d7>] alloc_iommu+0x1de/0x253

This is from the 2nd ioremap.  So, I think we first remapped pfn=0, then read garbage, and then increased the size, so the second remap is for multiple pages and triggers a page_is_ram() check.

>  [<ffffffff8173ca06>] ? dmar_table_init+0x10d/0x34d
>  [<ffffffff8173ca7c>] dmar_table_init+0x183/0x34d
>  [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
>  [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
>  [<ffffffff817145bb>] kernel_init+0x84/0x273
>  [<ffffffff81012daa>] child_rip+0xa/0x20
>  [<ffffffff81714537>] ? kernel_init+0x0/0x273
>  [<ffffffff81012da0>] ? child_rip+0x0/0x20
> ---[ end trace a7919e7f17c0a726 ]---

You should have a message after this saying:

IOMMU: can't map the region
Comment 97 Robinson Maureira 2009-11-24 20:19:14 EST
OK, a bunch of attachments follow...

About the message, nope, there's no message like that on any dmesg.
Comment 98 Robinson Maureira 2009-11-24 20:20:11 EST
Created attachment 373601 [details]
virt enabled on BIOS no special options on cmdline
Comment 99 Robinson Maureira 2009-11-24 20:20:48 EST
Created attachment 373602 [details]
virt disabled on BIOS no special options on cmdline
Comment 100 Robinson Maureira 2009-11-24 20:21:34 EST
Created attachment 373603 [details]
virt disabled on BIOS and iommu=soft
Comment 101 Robinson Maureira 2009-11-24 20:22:10 EST
Created attachment 373604 [details]
virt disabled on BIOS and iommu=off
Comment 102 Robinson Maureira 2009-11-24 20:22:43 EST
Created attachment 373605 [details]
virt disabled on BIOS and mem=2G
Comment 103 Robinson Maureira 2009-11-24 20:23:51 EST
Created attachment 373606 [details]
virt disabled on BIOS and mem=2G nomodeset

This is the only way to get X to work, using nomodeset.
Comment 104 Robinson Maureira 2009-11-24 20:26:08 EST
If you want me to test another combination of options, just let me know.

My SMOLT profile is at http://www.smolts.org/client/show/pub_484873bb-2045-4025-9f87-c221de679098
Comment 105 Robinson Maureira 2009-11-24 21:17:16 EST
Created attachment 373608 [details]
virt enabled on BIOS and iommu=pt
Comment 106 Chris Wright 2009-11-24 21:27:06 EST
Hmm, that one still looks like it has VT-d disabled in the BIOS.
According to comment #98, your laptop has 3 IOMMUs.

DMAR:Host address width 36
DMAR:DRHD base: 0x000000feb03000 flags: 0x0
IOMMU feb03000: ver 1:0 cap c9008020e30260 ecap 1000
DMAR:DRHD base: 0x000000feb01000 flags: 0x0
IOMMU feb01000: ver 1:0 cap c0000020630260 ecap 1000
DMAR:DRHD base: 0x000000feb02000 flags: 0x1
IOMMU feb02000: ver 1:0 cap c9008020630260 ecap 1000
DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8
DMAR:RMRR base: 0x000000bbc00000 end: 0x000000bfffffff

It's that 2nd to last line (RMRR w/ top bits set) that's causing trouble.

In the others (like in comment #105) it shows only a single iommu:

DMAR:Host address width 36
DMAR:DRHD base: 0x00000000000000 flags: 0x1

Of course, now that I look closer...there's only a single iommu w/ PT enabled (the first of the three).  And the way we do PT setup, we'd notice the other 2 don't support it, and disable.  So, I'd expect that you'd still hit the RMRR issue.
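
When comparing several dmesg attachments like these, it can help to pull out just the DRHD and RMRR entries. The following is a minimal Python sketch that assumes the log lines have exactly the format quoted above; the function name `summarize_dmar` and the sample constant are hypothetical helpers, not part of any existing tool.

```python
import re

# Patterns match the DMAR lines as they appear in the logs quoted in this bug.
DRHD_RE = re.compile(r"DMAR:DRHD base: (0x[0-9a-fA-F]+) flags: (0x[0-9a-fA-F]+)")
RMRR_RE = re.compile(r"DMAR:RMRR base: (0x[0-9a-fA-F]+) end: (0x[0-9a-fA-F]+)")

def summarize_dmar(log_text):
    """Return the DRHD (base, flags) and RMRR (base, end) pairs found in a log."""
    drhd = [(int(b, 16), int(f, 16)) for b, f in DRHD_RE.findall(log_text)]
    rmrr = [(int(b, 16), int(e, 16)) for b, e in RMRR_RE.findall(log_text)]
    return {"drhd": drhd, "rmrr": rmrr}

# Excerpt copied from the log quoted in this comment.
SAMPLE = """\
DMAR:Host address width 36
DMAR:DRHD base: 0x000000feb03000 flags: 0x0
IOMMU feb03000: ver 1:0 cap c9008020e30260 ecap 1000
DMAR:DRHD base: 0x000000feb01000 flags: 0x0
IOMMU feb01000: ver 1:0 cap c0000020630260 ecap 1000
DMAR:DRHD base: 0x000000feb02000 flags: 0x1
IOMMU feb02000: ver 1:0 cap c9008020630260 ecap 1000
DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8
DMAR:RMRR base: 0x000000bbc00000 end: 0x000000bfffffff
"""

summary = summarize_dmar(SAMPLE)
```

Here `len(summary["drhd"])` gives the number of IOMMUs the BIOS reported, which is the quickest way to see whether VT-d was really enabled for a given boot (three units versus the single bogus one at address zero).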
Comment 107 Robinson Maureira 2009-11-24 22:04:45 EST
The good thing is, you're right, it has VT-d disabled on that, since I didn't power cycle after enabling it on BIOS.

The bad thing is, VT-d enabled + iommu=pt results in a panic really early on the boot.

I'm hand copying the panic here:

Pid: 1, comm: swapper Tainted: G      D     2.6.31.6-148.fc12.x86_64 #1
Call Trace:
[<ffffffff8141896c>] panic+0x7a/0x12c
[<ffffffff8105b51c>] ? exit_ptrace+0x38/0x121
[<ffffffff81054bfe>] do_exit+0x7b/0x6cb
[<ffffffff8141bbc5>] oops_end+0xba/0xc2
[<ffffffff8101551c>] die+0x5a/0x63
[<ffffffff8141b5e9>] do_trap+0x115/0x124
[<ffffffff8101386d>] do_invalid_op+0x9c/0xa5
[<ffffffff81222cee>] ? dma_pte_clear_range+0x30/0xee
[<ffffffff8106b578>] ? up+0x39/0x3e
[<ffffffff81012b3b>] invalid_op+0x1b/0x2
[<ffffffff81222cee>] ? dma_pte_clear_range+0x30/0xee
[<ffffffff81225486>] iommu_domain_identity_map+0x80/0xbf
[<ffffffff81226013>] iommu_prepare_identity_map+0xfc/0x12c
[<ffffffff8173d673>] init_dmars+0x543/0x6f1
[<ffffffff8173da8c>] intel_iommu_init+0x26b/0x32a
[<ffffffff8171bb41>] ? pci_iommu_init+0x0/0x21
[<ffffffff8171bb4f>] pci_iommu_init+0xe/0x21
[<ffffffff8100a069>] do_one_initcall+0x5e/0x162
[<ffffffff81714750>] kernel_init+0x219/0x273
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81714537>] ? kernel_init+0x0/0x273
[<ffffffff81012da0>] ? child_rip+0x0/0x20
Comment 108 David Woodhouse 2009-11-25 04:39:15 EST
Argh, would have been nice if the traces were attached as text files.

In your first trace, the 'virt enabled on BIOS no special options on cmdline' one, I don't see anything obviously wrong. There are a _lot_ of complaints about BIOS brokenness with silly RMRRs, but you're using HP so you expect some brokenness. I don't know why modesetting would fail -- we're explicitly exempting graphics devices from using the IOMMU.

In the second (and probably the rest), it's the interrupt remapping initialisation which is trying to ioremap the bogus IOMMU that your broken BIOS tells us about. It doesn't check the flag that we set when we generated the warning earlier (and in fact that flag might not even be compiled in, if CONFIG_INTR_REMAP && !CONFIG_DMAR).

The interrupt remapping thing is ugly, but I don't think it causes any problems.
What you have here is an inteldrmfb problem -- it's just not working even when you have the iommu disabled and only 2GiB of RAM enabled. Please file that as a separate bug.

Regarding comment #107 -- that's probably some HP brokenness that we're not completely working around. The last line _before_ that panic is the one we're really interested in -- probably the line _before_ the kernel stupidly said "cut here". It'll say something along the lines of

IOMMU: Setting identity map for device 0000:00:1d.0 [0xf000674ef000fa11 - 0xf000e987f000fea9]

If they're really putting random garbage into the ACPI tables, then they've probably found something that will crash us. I'm not entirely sure we check for the end being _before_ the start, for example. We'll need to make that bit more robust in the face of malicious or incompetent input.

And they tell me I'm not allowed to promote an attitude of violence...
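
As an illustration of the kind of robustness check meant here, this is a minimal Python sketch (not the kernel's C code, and not the actual fix). The function name `rmrr_problems` is hypothetical, and treating the printed "Host address width" as the number of valid physical address bits is an assumption made for the example.

```python
def rmrr_problems(base, end, host_address_width):
    """Return a list of reasons an RMRR range looks bogus (empty if it looks fine).

    base, end: inclusive range reported by the BIOS for the reserved region.
    host_address_width: number of physical address bits (assumed meaning).
    """
    problems = []
    limit = 1 << host_address_width
    # A region that ends before it starts cannot be identity-mapped.
    if end < base:
        problems.append("end before start")
    # Addresses with bits set above the host address width are garbage.
    if base >= limit or end >= limit:
        problems.append("outside host address width")
    return problems
```

For the range quoted above, `rmrr_problems(0xf000674ef000fa11, 0xf000e987f000fea8, 36)` reports that the addresses fall outside a 36-bit address space, while the second RMRR from the same log (0xbbc00000 to 0xbfffffff) passes both checks.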
Comment 109 David Woodhouse 2009-11-25 05:45:42 EST
Robinson, the kernel building at http://koji.fedoraproject.org/koji/taskinfo?taskID=1829957 should fix the issues with interrupt remapping, and should hopefully fix your crash when the BIOS gives an RMRR which ends before it starts (I'm guessing that's what happens).

It should make no difference to the modesetting thing, which I believe is a different bug.
Comment 110 David Woodhouse 2009-11-25 06:42:10 EST
Robinson, I think your graphics issue is probably bug #540218. The nice UPS man brought me an HP6930p this morning, and that's what I see on it.
Comment 111 Robinson Maureira 2009-11-25 07:32:12 EST
Thanks David, I'll wait for the koji build and report, and for the other bug, I'm posting a "me too" in that bug to avoid getting off topic.
Comment 112 Robinson Maureira 2009-11-25 11:29:50 EST
OK, booted with the latest build and everything seems to be working. I'm attaching the new dmesg; the only messages now are about the broken BIOS.
Comment 113 Robinson Maureira 2009-11-25 11:31:32 EST
Created attachment 373776 [details]
virt enabled on bios + nomodeset
Comment 114 Chris Wright 2009-11-25 12:33:21 EST
Heh, there's still something odd going on w/ your BIOS (surprise).  Comparing comment #113 to comment #98:

-DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8
+DMAR:RMRR base: 0xffffffffffffffff end: 0xffefffffffffffff

So the BIOS can't decide what to put in the DMAR table when disabling VT-d and re-enabling it.
Comment 115 Robinson Maureira 2009-11-25 12:51:25 EST
Created attachment 373796 [details]
virt enabled on BIOS and nomodeset (HP Quicklook 2 disabled)

Take a look at that one: I rebooted, went into the BIOS and disabled "HP Quicklook 2", then saved and rebooted.

No powercycle.
Comment 116 Chris Wright 2009-11-25 14:20:06 EST
OK, so multiple things were in flux here.  Thanks for followup.
Comment 117 David Woodhouse 2009-11-25 14:27:39 EST
Robinson, you're still attaching your kernel logs as binary files. Please remember to mark them as text/plain when attaching them.
Comment 118 Robinson Maureira 2009-11-25 14:52:57 EST
Oops, sorry... there I've fixed the last two...
Comment 119 Robinson Maureira 2009-11-26 08:56:48 EST
Now with -149 it doesn't resume. Also, after power cycling, the X session detects 2 monitors and expands my desktop; if I go to System -> Preferences -> Display, it returns to normal just by opening it.

I'll try -151 to see if it keeps happening...
Comment 120 David Woodhouse 2009-11-26 09:43:51 EST
It should hopefully resume with -151. That's bug #536675, I believe.
Comment 121 Robinson Maureira 2009-11-27 14:23:34 EST
Yep, -151 and -152 work OK.
Comment 122 Fedora Update System 2009-11-30 23:42:55 EST
kernel-2.6.31.6-145.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.
