Description of problem:
When "VT for Direct I/O" is enabled in the BIOS, kernel-2.6.31-33.fc12.x86_64 fails to boot. However, with the intel_iommu=off boot option the kernel works fine. kernel-2.6.31-33.fc12.i686 also works fine.

Failure messages:
usb 8-2: new low speed USB device using uhci_hcd and address 7
nommu_map_single: overflow 11f554be0+8 of device mask ffffffff
nommu_map_single: overflow 11f5fe908+64 of device mask ffffffff
...
usb 8-2: device descriptor read/64, error -32
hub 8-0:1.0: unable to enumerate USB device on port 2
No root device found
Boot has failed, sleeping forever

Version-Release number of selected component (if applicable):
kernel-2.6.31-33.fc12.x86_64 rawhide

How reproducible:
100%

Steps to Reproduce:
1. Enable VT for Direct I/O in the BIOS
2. Boot F12 rawhide x86_64 with the intel_iommu=on option
3. Boot fails

Actual results:
Expected results:
Additional info:
Dell Optiplex 760 HW http://www.smolts.org/client/show/pub_f9572217-d67c-4ee8-954b-b4350273184e
Can you capture the entire log from the failed boot?
Created attachment 362450 [details] boot error log when enable VT-d and intel_iommu=on
Your log is incomplete, but I suspect you're hitting a Dell BIOS bug (see the associated kernel.org bug). The BIOS, which they obviously shipped without any form of QA whatsoever, puts entirely bogus addresses into its DMAR tables, which are supposed to tell the kernel where the IOMMU hardware can be found. The kernel now detects this error, and aborts the IOMMU setup. However, instead of falling back to using swiotlb, you end up using the nommu code. We need to fix that fallback code, and ensure that swiotlb gets properly initialised when needed. Someone was working on that, IIRC, but it all went quiet. I'll chase (and maybe just do it myself). You _might_ find that it's triggered by having VT-d disabled in the BIOS, and if you _enable_ it in the BIOS, the problem goes away? Please let me know if that's the case.
Jibing, can you answer the questions and resubmit the logs mentioned in comment #4 when you have a chance?
I have the same problem on a Sun Ultra 27. It uses an AMI bios. What logs would be helpful?
The part of dmesg where it says 'your BIOS is broken'.
Created attachment 364412 [details] Full error boot log on kernel-2.6.31.1-56.fc12.x86_64
That isn't a full boot log, and doesn't contain the bit I asked for. If you add 'mem=2M' to the kernel command line, you may find it boots all the way and then you can get the _full_ output from dmesg.
er, 'mem=2G' even. I suspect that 'mem=2M' won't work either :)
Created attachment 364541 [details] _full_ boot log with VT-d enabled in the BIOS and intel_iommu=1, on kernel-2.6.31.1-56.fc12.x86_64
Without VT-d enabled in the BIOS, the kernel works fine. The boot failure only happens when VT-d is enabled. The physical memory size is 4GB. Without the mem boot option, the system hangs and prints the messages I posted before.
Here's the relevant part of your dmesg again:

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:642 alloc_iommu+0x12c/0x286() (Not tainted)
Hardware name: OptiPlex 760
Your BIOS is broken; DMAR reported at address fedc1000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A03; Product Version:

As it clearly says, your BIOS is broken. Ask Dell for a new BIOS written by somebody sober. Maybe ask them if they can manage to do some QA on it this time. Also, check whether VT-d is enabled in the BIOS. If it's disabled, try turning it on.

We should definitely make the kernel fall back to swiotlb when this happens rather than falling back to the noiommu code. But that won't actually give you functional VT-d support on broken machines like this; that'll just give you the same behaviour you can achieve with 'iommu=off'.
Same here. I expect better from Sun Microsystems, especially as the Ultra 27 is billed as a workstation.
Okay, if we're saying this is just a broken BIOS, I'm removing it from F12VirtBlocker
Nah, put it back. We should cope better; we should invoke swiotlb when aborting the VT-d setup, rather than falling back to nommu. And the machine shouldn't crash. I'm not going to get to that before the kernel summit. Do you want to take a look? I think it involves calling the swiotlb early init code (which allocates a big contiguous chunk of memory) even if we think we have a real IOMMU, then freeing it again if we actually _do_ use the real IOMMU.
Okay, added back to F12VirtBlocker cdub, ddd: could one of you take a look? Dave probably won't get to it for F12 GA
*** Bug 528545 has been marked as a duplicate of this bug. ***
Created attachment 365090 [details] attempt at patch This is a completely untested first attempt at a patch. I'm about to leave for the kernel summit, so may not be able to test this properly for a week or so.
I'll do some testing. I had just finished a patch as well (had considered doing the same as you, although I tried to have an alloc/free to ensure the large order allocation would succeed... which doesn't actually work). Yours is simpler.
I figured that since this only happens on boxes with >4GiB RAM, the odds are fairly good that we should be able to allocate a 64MiB chunk of memory even if we don't do it early. We're still before all the device drivers and file system stuff, anyway. Only if that theory is disproved will I bother to try the 'allocate it early in all cases, then free it if we don't want it' complexity. Do you agree with my choices for the 'gfp' variable in the late_init routine? I thought about passing it in from the caller, but figured it was saner to do it internally.
(In reply to comment #20)
> I figured that since this only happens on boxes with >4GiB RAM, the odds are
> fairly good that we should be able to allocate a 64MiB chunk of memory even if
> we don't do it early. We're still before all the device drivers and file system
> stuff, anyway.

I'm only able to get 4M (both on a 4G box as well as a 12G box).

> Only if that theory is disproved will I bother to try the 'allocate it early in
> all cases, then free it if we don't want it' complexity.

I have that patch, only trouble is freeing...The bootmem allocator is cleaned up and gone, so doing a free_bootmem hits a BUG().

> Do you agree with my choices for the 'gfp' variable in the late_init routine? I
> thought about passing it in from the caller, but figured it was saner to do it
> internally.

That seemed sane to me, except I didn't fully grok the GFP_KERNEL case for !64 bit.
Created attachment 365248 [details] v2, compiles
Tested this one a bit. As expected, it's hard to get an order 14 allocation this late. Both boxes I tested on could only come up with order 10:

Warning: only able to allocate 4 MB for software IO TLB

And was easy to trigger overflows (I cheated and forced overuse of swiotlb):

DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2
DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:1f.2

Left one box limping, and other couldn't start X. Tested by forcing an error condition and forcing use of swiotlb with this extra patch (swiotlb=force on cmdline, so putting unusual stress on swiotlb):

arch/x86/kernel/pci-swiotlb.c
- if (swiotlb_force)
- swiotlb = 1;

drivers/pci/dmar.c
- if (iommu->cap == (uint64_t)-1 && iommu->ecap == (uint64_t)-1) {
+ if (1) {
Created attachment 365249 [details] setup swiotlb, free if hw iommu succeeds This is just for sake of completeness. This patch doesn't actually work. It effectively always allocates the swiotlb, and only uses it when needed (no hw iommu, or hw iommu is hidden behind broken BIOS). When not needed it frees, which is the bit that doesn't work.
Err, ddd just reminded me that the buddy allocator has a MAX_ORDER of 11 on x86, so we'll never get more than an order 10 allocation.
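The order 14 and order 10 figures in the comments above can be checked with a little arithmetic. The sketch below assumes 4 KiB base pages (the usual x86 page size) and takes MAX_ORDER = 11 from the comment above; the helper names are made up for illustration and are not kernel functions.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB base pages on x86
MAX_ORDER = 11     # value quoted in this thread; the largest usable order is MAX_ORDER - 1
MIB = 1024 * 1024


def order_to_bytes(order):
    """Size of a single buddy allocation of the given order."""
    return PAGE_SIZE << order


def order_for(size_bytes):
    """Smallest order whose allocation covers size_bytes."""
    order = 0
    while order_to_bytes(order) < size_bytes:
        order += 1
    return order


if __name__ == "__main__":
    print("order needed for 64 MiB:", order_for(64 * MIB))
    print("largest buddy allocation in MiB:", order_to_bytes(MAX_ORDER - 1) // MIB)
```

Under those assumptions a 64MiB swiotlb needs order 14, while the largest late allocation is order 10, which matches the "only able to allocate 4 MB" warning.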
Running out of time here folks. Is this something we'd slip the release for?
I don't think it's that important to slip the release. We're discussing the fallback mode when the BIOS is broken and we have to abort IOMMU setup late in the boot, _and_ we have >4GiB of RAM. If the user boots with 'iommu=off' when they're on a broken machine such as this, the problem will go away. Perhaps a short-term answer would be to panic() if this happens and there's >4GiB RAM? At least the user sees the actual problem then.
I agree re: slippage. Could have less than 4G if memory is pushed above 4G to make room for pci hole, etc. I'll attach a patch that does the fallback in a moment.
My laptop appears to have this problem, but I'm able to boot, perhaps because I have 4G of memory and not more. If I boot without intel_iommu=off USB doesn't work, but it does with intel_iommu=off. There's a BIOS update but I wanted to wait until returning from the Fedora Talk FAD to upgrade the BIOS just in case I bricked the machine. Here's the smolt profile: http://www.smolts.org/client/show/pub_d31ee593-04e1-42b4-8e4a-79ccaef15d5c
Moving to virt target then.
Created attachment 365798 [details] iommu: allow fallback to swiotlb upon hw iommu initialization failure This should allow the swiotlb to be used as a fallback. I don't think it will work for AMD IOMMU in passthrough mode. And the new interface for freeing bootmem pages directly to page allocator is not ideal.
I was also seeing DMAR/IOMMU errors on an Intel DP35DP motherboard, the product page is http://www.intel.com/products/desktop/motherboards/DP35DP/DP35DP-overview.htm . USB wasn't working at all when the kernel reported those errors. A BIOS update solved the problem. I only have one log of a problematic boot with the old BIOS, I'll attach it soon. Now that the system works, I obviously won't downgrade the BIOS just for testing ;)
Created attachment 365879 [details] Boot log with an old BIOS on an Intel DP35DP
Created attachment 366066 [details] make v2 patch compile Include <linux/swap.h> in mm/bootmem.c for totalram_pages, otherwise identical to the v2 patch. I added this patch to the kernel-2.6.31.5-96.fc12.x86_64 SRPM and it made USB work again on my HP EliteBook 2530p. I then hacked pci_swiotlb_init to always call swiotlb_free and not set dma_ops to exercise the code that frees the memory allocated in swiotlb_init and it seems to work as well. My boot logs before and after the patch are here: http://scottt.tw/bug/rhbz-524808/
(In reply to comment #29) > Moving to virt target then. If this bug is not fixed by F12 general availability, I recommend documenting the following in the release notes: If you can't boot or USB doesn't work and you see "Your BIOS is broken; DMAR reported at address ..." in dmesg, try passing "intel_iommu=off" or "mem=3G" (if you have 4G of memory or more) on the kernel command line as a workaround.
This seems to be "fixed" for me when I went into the BIOS and enabled virtualization. I can remove the intel_iommu=off from my kernel command line and everything seems to be working. Upgrading the BIOS didn't help. Here's the smolt link to my system: http://www.smolts.org/client/show/pub_d31ee593-04e1-42b4-8e4a-79ccaef15d5c
With all the kernels after -112 (the previous I had tested was -97) the intel_iommu=off parameter does not work anymore. I tested the 112, 115, 117 and 122 versions. This is what I get:

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:183 dmar_table_init+0x161/0x3aa() (Not tainted)
Hardware name: HP EliteBook 8530p
Your BIOS is broken; DMAR reported at address zero!
BIOS vendor: Hewlett-Packard; Ver: 68PDV Ver. F.0E; Product Version: F.0E
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-122.fc12.x86_64 #1
Call Trace:
[<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
[<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8173c893>] dmar_table_init+0x161/0x3aa
[<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
[<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
[<ffffffff8105496c>] ? do_wait+0x299/0x2d7
[<ffffffff817145bb>] kernel_init+0x84/0x273
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81714537>] ? kernel_init+0x0/0x273
[<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---

Following Jeffrey's tip I enabled virtualization in the BIOS and now it works without any parameter passed to the kernel. Thanks Jeff. :-)
Adding "iommu=soft" to the boot options should work to disable the Intel IOMMU.
*** Bug 522668 has been marked as a duplicate of this bug. ***
*** Bug 532582 has been marked as a duplicate of this bug. ***
*** Bug 530455 has been marked as a duplicate of this bug. ***
*** Bug 530340 has been marked as a duplicate of this bug. ***
to bring across the summary I wrote for 490477 (incorrectly):

so, here's the scoop on this. it breaks USB functionality entirely, and in one case at least completely stops the system booting, on several motherboards. (It's actually really caused by buggy BIOSes, but we can't sell that to the users). We have enough information to say definitely that 8 people have hit this bug - that is, there are reports or comments on reports from 8 people which definitely indicate they hit this precise problem. If we ship with this, we will have people with unusable systems.

The workaround is simple: iommu=soft kernel parameter (or, disabling VT-d in the BIOS, or limit RAM to 4GB). But it relies on them finding the documentation.

If we disable it by default, the impact is that it breaks PCI passthrough for KVMs. Kyle is almost positive it can't possibly break anything else. The converse workaround would be possible for any virt users who need that to work: intel_iommu=1 .

David believes the patches above would fix this, but it's quite late to be making actual code changes. That's the state of play on this issue at the moment.

-- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
Dan Beard is definitely hitting this issue, so that makes 9 confirmed sufferers.
clarification (as others will probably make the same mistake I did): when David says this happens only with RAM 'above 4GiB', he means above 4GiB in address space. This means it will likely affect any system with 3GiB or more of actual physical RAM - certainly systems with exactly 4GiB, and down to 2.5GiB in some cases.
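To see why address space rather than installed RAM is what matters, here is a simplified model of the check behind the "nommu_map_single: overflow" lines in the original report. It is an illustration only, not the kernel's code; `fits_dma_mask` is a hypothetical name, and the two sample addresses are the failing one from the report and a made-up one below 4GiB.

```python
def fits_dma_mask(bus_addr, size, dma_mask):
    """True if the whole buffer [bus_addr, bus_addr + size) is reachable
    by a device that can only address up to dma_mask."""
    return bus_addr + size - 1 <= dma_mask


DMA_MASK_32BIT = 0xFFFFFFFF  # device mask shown in the report

if __name__ == "__main__":
    # Buffer from the report: it lives above 4GiB in address space.
    print(fits_dma_mask(0x11F554BE0, 8, DMA_MASK_32BIT))
    # Hypothetical buffer below 4GiB: reachable without an IOMMU or bounce buffer.
    print(fits_dma_mask(0x7F554BE0, 8, DMA_MASK_32BIT))
```

Without an IOMMU or swiotlb bounce buffers, a 32-bit-only device has no way to reach the first buffer, which is why the USB controller fails once memory is mapped above the 4GiB boundary.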
Created attachment 367983 [details] Much simpler fix This patch is much simpler than the ones which deal swiotlb memory allocation. This particular problem is actually detectable _early_ -- before we even claim to have detected an IOMMU. So check for it then (by cutting and pasting existing code), and just pretend not to have an IOMMU if this BIOS bug is detected. So swiotlb is set up as normal.
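A rough model of the decision this patch description implies, for readers who do not want to open the attachment. Everything here is an assumption made for illustration: the function name, the string labels, and the rule that swiotlb is only set up when there is RAM above 4GiB in address space. It is not the patch itself.

```python
def choose_dma_backend(drhd_bases, ram_above_4g):
    """Pick a DMA backend from the DMAR unit addresses the BIOS reports.

    drhd_bases   -- list of IOMMU register base addresses from the DMAR table
    ram_above_4g -- True if any RAM is mapped above 4GiB in address space
    """
    # Early sanity check: a unit reported at address zero means the table is bogus,
    # so behave as if no IOMMU had been detected at all.
    iommu_detected = bool(drhd_bases) and all(base != 0 for base in drhd_bases)
    if iommu_detected:
        return "intel-iommu"
    # With no IOMMU claimed, bounce buffers are set up as on any other machine.
    return "swiotlb" if ram_above_4g else "nommu"


if __name__ == "__main__":
    print(choose_dma_backend([0x0], ram_above_4g=True))
    print(choose_dma_backend([0xFED90000], ram_above_4g=True))
    print(choose_dma_backend([], ram_above_4g=False))
```

Note that in this model a non-zero but still bogus base address passes the early check, so it only covers the "address zero" form of the BIOS bug.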
I thought this bug only happens when VT-d is /disabled/? Therefore /enabling/ VT-d would work around the issue. Do I remember wrong?
More data: we can get a handle on affected system models from kerneloops.

http://www.kerneloops.org/search.php?search=dmar_table_init&btnG=Function+Search

any system which hits an incarnation of the 'DMAR reported at address zero!' traceback, on which VT-d is disabled in the BIOS and which has enough RAM, will hit this bug (confirmed by David). We suspect these systems are shipping with VT-d disabled by default in the BIOS, or else that number of people would not hit the bug (it's unlikely that many people have gone into their BIOS and manually disabled virtualization).

It's safe to assume that any time the kernel oops has been reported to kerneloops.org, the system in question would hit this bug when running F12 as long as it had enough RAM. There are hundreds of occurrences of this oops at kerneloops.org, so that is a worrying indicator.

My belief is that a lot of HP and several Acer models have the broken BIOSes, and they were shipped with VT-d disabled by default in their BIOSes. I'm going to check now how likely they are to have had enough RAM installed from the factory to hit this problem, but right now I suspect there's a lot of potential sufferers of this bug out there.
Jesse: yes, that is correct, sorry for getting it wrong in the summary.
Code fix from comment #46 is in 2.6.31.5-127.fc12, building at http://koji.fedoraproject.org/koji/taskinfo?taskID=1794534
hello. i can confirm that i get this dmar notification every time i boot up. here is my initial bug report https://bugzilla.redhat.com/show_bug.cgi?id=532582 which was merged with this one. i have an hp pavilion dv7 and have always kept my bios updated. last week i tried the three most recent bioses and they all had this issue so i didn't try any that were older than that. also i checked my bios configuration and couldn't find an option for VT-d. please let me know if there is anything i can do to help you guys troubleshoot or test out fixes.
Jeremy, thanks. The kernel at http://koji.fedoraproject.org/koji/taskinfo?taskID=1794537 ought to fix the problem. If you could confirm that, it would be much appreciated; thanks.
kernel-2.6.31.5-127.fc12.x86_64 from: http://koji.fedoraproject.org/koji/taskinfo?taskID=1794537 does fix the "USB not working" problem for me. I installed with "rpm -i --nodeps" to work around the kernel-firmware version dependency.
Two testers on #fedora-kernel (jcollie, jcasale) have also confirmed that the -127 kernel fixes the problem.
they also confirmed the 126 kernel, david ;) I have confirmed that 126 and 127 boot and work as normal on a system unaffected by the bug.
My vote would be to re-spin with this fix tonight for Liam to test tomorrow, if Jesse's around and able to do it this evening. Otherwise we can leave it till Monday and discuss then - because we'd either have to go ahead and ship without the fix, or slip. from a strict qa policy point of view i'd have to say we'd prefer the 126 fix to the 127 for GA, but honestly I don't feel bad about 127 either.
oh, and if we do respin, I would be in favour of taking https://bugzilla.redhat.com/show_bug.cgi?id=533553 too - that is, drop PackageKit-command-not-found from any of the default install sets.
Yup, both -127 and -126 fix the problem for me.
127 fixes it for me. i didn't try 126. thanks.
Hi guys, sorry for going off topic, but this is a great place for this question. I believe it would be very useful to have a list of desktop motherboards which have VT-d (not VT) option in BIOS. From the information that is available online, it's easy to find out that only Q35 and Q45 have requirement to support VT-d, while it's optional for other desktop chipsets (P35, P45 etc.), but it's very hard to find that out for a particular board since it's possible that it has been added in later BIOS revisions and isn't listed in board manual. Please, mail me name of board manufacturer and exact model (revision, if there is more than one), and note whether your board has VT-d option in BIOS. I'm also interested in laptops if you have one. Once I have enough information, I will create a wiki page about it. Thank you.
please, please don't go off-topic, this bug is absolutely vital to F12 release and it would be very easy to lose the thread if people start posting completely unrelated information. Anyone who would like to answer Vedran's question, PLEASE do it by private email and not in this bug.
Fix is included in Fedora 12 RC4. Thanks everyone.
127 fixes it for me too, the USB stack is working OK.
as per #533952, david believes the fix for this in 127 didn't actually work. hence we are confused as to how multiple testers reported 127 was OK. can you guys possibly double-check and confirm that 122 _really_ fails and 127 _really_ works on your systems? we're just confused as to how this could possibly have happened.
Thought this was worth adding even though status is closed. Downloaded livecd x86_64 with kernel 2.6.31.5-127 last evening (AU DS Time) and installed to hdd. Been trying to figure out usb problem all day till i found this bug and others. Both VTx and VTd are enabled in bios and have to use 'iommu=soft' to enable usb support. Startup time is much faster. Have not tested disabling VTd yet. Regardless of above settings i get a message in ABRT each boot about a kernel crash with the dmar 'broken bios' message. Machine is a HP xw4600 with 4G ram. Bios was just flashed to 1.17 - latest available for download from hp.
ian: see https://bugzilla.redhat.com/show_bug.cgi?id=533952 . it appears that the fix we put in to fix this problem for people with the affected systems and *more than* around 2.5GB of RAM causes it to break for people with the affected systems and *less than* around 2.5GB of RAM. le sigh.
whoops, I really should read to the end of messages. can you check with kernel 137 from 533952 and see how that behaves, though, even so?
ian: also, can we get the _exact_ kernel message you see? Thanks.
Thanks Adam. Let me know if should add to 533952 instead. This is the portion of dmesg that i see in abrt every time i boot:

------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:668 alloc_iommu+0x11d/0x253() (Not tainted)
Hardware name: HP xw4600 Workstation
Your BIOS is broken; DMAR reported at address fed90000 returns all ones!
BIOS vendor: Hewlett-Packard; Ver: 786F3 v01.17; Product Version:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
[<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
[<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81222206>] alloc_iommu+0x11d/0x253
[<ffffffff8173c8ea>] ? dmar_table_init+0x10d/0x34d
[<ffffffff8173c960>] dmar_table_init+0x183/0x34d
[<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
[<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
[<ffffffff817145bb>] kernel_init+0x84/0x273
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81714537>] ? kernel_init+0x0/0x273
[<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---

I will shortly attach 2 full dmesg files from boot with and without iommu=soft option. I have also downloaded kernel 137 and will do some testing. I'm pretty new at this but will see what i can do :)
Created attachment 372375 [details] full dmesg from boot with iommu=soft option
ian: yours will likely end up being a completely new report, since it looks somewhat different from any of the currently-known cases. David, do you agree? I think 'DMAR reported at address fed90000 returns all ones!' indicates something different, right?
Created attachment 372378 [details] full dmesg from boot without iommu=soft option booting stock kernel 2.6.31.5-127.fc12.x86_64 installed to hdd from live cd 100M /boot on raid1 / on raid5 lv
I have the exact same problem with F12 on a Dell Precision M6400. Including the "bad bios" warning. This worked under F11. USB is lost and non-operational.
Peter: please check if your bug is bug #533952.
(In reply to comment #71)
> ian: yours will likely end up being a completely new report, since it looks
> somewhat different from any of the currently-known cases. David, do you agree?
> I think 'DMAR reported at address fed90000 returns all ones!' indicates
> something different, right?

Different yes, but variation on a theme. The BIOS is broken. The simpler patch that David provided will not catch this one, unfortunately. Choices are to add more hackery, or backport the swiotlb fallback patches.
Created attachment 372397 [details] this should catch the all 1's problem and still allow swiotlb fallback This adds to the current approach where we try to detect buggy BIOS during VT-d detection rather than VT-d initialization. It's a little hacky, and I didn't remove the identical check that will happen due to interrupt remapping init. Ian, if you're up for compiling a kernel, this should give you a functioning system (w/out needing to add iommu=soft to kernel cmdline).
thanks, Chris. is it too much to ask for you and David to get together and do a kernel build which just solves all the iommu issues so Jesse and I can put down our whisky and stop crying? :) thanks :)
(In reply to comment #74)
> Peter: please check if your bug is bug #533952.

I'm not sure I'm able to see the difference between the two bug# at this point. It looks like both are a hit to me? I have a workaround - seen here and in the above bug: iommu=soft. This takes out the error and USB gets initialized.
Ian: what you report in comment #69 is a different bug. It seems that HP managed to find a new way to screw up. Yippee! In this one, they report an IOMMU at a bogus location but a _believable_ bogus location rather than at zero. So we don't actually find out about it until we ioremap it and start trying to talk to it. Thanks, HP. Another stunning display of quality control. We had seen that failure mode before (Lenovo) but there it was only ever a one-off when you first enabled VT-d in the BIOS, and it all worked properly after a power cycle. Chris's patch seems like a reasonable way to address it in the short term; thanks.
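Since the two failure modes differ only in the wording of one dmesg line, a small script can tell them apart. This is a hedged sketch: the regular expression is written against the warning text quoted in this thread, and the function name and the returned labels are invented for illustration.

```python
import re

# Matches the two warning variants quoted in this bug:
#   "... DMAR reported at address zero!"
#   "... DMAR reported at address fed90000 returns all ones!"
DMAR_WARNING = re.compile(
    r"Your BIOS is broken; DMAR reported at address "
    r"(?:(?P<zero>zero)|(?P<addr>[0-9a-fA-F]+) returns all ones)!"
)


def classify_dmar_warnings(dmesg_text):
    """Return one label per DMAR 'broken BIOS' warning found in dmesg_text."""
    labels = []
    for match in DMAR_WARNING.finditer(dmesg_text):
        if match.group("zero"):
            labels.append("address-zero")
        else:
            labels.append("all-ones@" + match.group("addr").lower())
    return labels


if __name__ == "__main__":
    sample = (
        "Your BIOS is broken; DMAR reported at address zero!\n"
        "Your BIOS is broken; DMAR reported at address fed90000 returns all ones!\n"
    )
    print(classify_dmar_warnings(sample))
```

Feeding it the output of `dmesg` saved to a file is enough to tell which of the two reports a given machine belongs under.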
(In reply to comment #78)
> (In reply to comment #74)
> > Peter: please check if your bug is bug #533952.
>
> I'm not sure I'm able to see the difference between the two bug# at this point.
> It looks like both are a hit to me? I have a workaround - seen here and in the
> above bug: iommu=soft. This takes out the error and USB gets initialized.

Peter, please show the precise error. Is it that your BIOS is broken and it reports a DMAR at address zero, or that it reports a DMAR which returns all ones, as Ian reported?
(In reply to comment #80)
> (In reply to comment #78)
> > (In reply to comment #74)
> > > Peter: please check if your bug is bug #533952.
> >
> > I'm not sure I'm able to see the difference between the two bug# at this point.
> > It looks like both are a hit to me? I have a workaround - seen here and in the
> > above bug: iommu=soft. This takes out the error and USB gets initialized.
>
> Peter, please show the precise error. Is it that your BIOS is broken and it
> reports a DMAR at address zero, or that it reports a DMAR which returns all
> ones, as Ian reported?

Ok - the oops isn't shown anymore on the boot console, but it IS happening. My USB is operational with the above work-around, but I'm still getting the same oops (and I think it is preventing nvidia etc. from loading drivers - the box dies when nouveau tries to load).

DMAR:Host address width 36
DMAR:DRHD base: 0x000000fed10000 flags: 0x0
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:668 alloc_iommu+0x11d/0x253() (Not tainted)
Hardware name: Precision M6400
Your BIOS is broken; DMAR reported at address fed10000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A07; Product Version:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.31.5-127.fc12.x86_64 #1
Call Trace:
[<ffffffff81051694>] warn_slowpath_common+0x84/0x9c
[<ffffffff81051703>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81222206>] alloc_iommu+0x11d/0x253
[<ffffffff8173c8ea>] ? dmar_table_init+0x10d/0x34d
[<ffffffff8173c960>] dmar_table_init+0x183/0x34d
[<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce
[<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361
[<ffffffff817145bb>] kernel_init+0x84/0x273
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81714537>] ? kernel_init+0x0/0x273
[<ffffffff81012da0>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a725 ]---
DMAR:parse DMAR table failure.
peter, your problem looks rather like Ian's, then. David, looks like a Dell with the same epic fial. "Precision M6400". right? btw, peter, it's normal that you see the 'oops' even with the workaround, since it's not really an oops but more an informational message to let you know your manufacturer's BIOS engineers are complete idiots. general message to all: if you ever find out that there is a BIOS Engineers' Convention occurring, please try and prevent David from discovering the location by absolutely any means necessary. That could only end very, very, badly. =)
(In reply to comment #82)
> peter, your problem looks rather like Ian's, then. David, looks like a Dell
> with the same epic fial. "Precision M6400". right?

yeah - I checked earlier today if there were any bios updates. Strange - same "version" but with a later date than when I downloaded A07. I missed the little detail about 1s vs 0s .... the two bugs looked very similar to me. Thanks for pointing that out to me.
(In reply to comment #82)
> btw, peter, it's normal that you see the 'oops' even with the workaround, since
> it's not really an oops but more an informational message to let you know your
> manufacturer's BIOS engineers are complete idiots.

Btw. thanks for the fedoraproject link. The PROBLEM is, that I am on 64bit already. And I HAVE the virtualization option turned on; this was my setup in F11, and I did an F12 upgrade and now have the problem. It looks like this problem is connected to my inability to load nvidia drivers (they crash or just freeze the box) so I was seeking a way to remove the issue before filing nvidia/nouveau bugs. I guess I'll just have to go in that direction anyway.
(In reply to comment #77)
> is it too much to ask for you and David to get together and do a kernel build
> which just solves all the iommu issues so Jesse and I can put down our whisky
> and stop crying? :) thanks :)

Bit much since I'd be hard pressed to ask you and Jesse to put down your whisky ;)

Variant of fix in comment #76 is being built now:

http://koji.fedoraproject.org/koji/buildinfo?buildID=142281

Peter, Ian, can you test this kernel please?
(In reply to comment #85)
> Variant of fix in comment #76 is being built now:
>
> http://koji.fedoraproject.org/koji/buildinfo?buildID=142281
>
> Peter, Ian, can you test this kernel please?

It's been a while since I played with "custom" kernels. I get a dependency error when I try to install the above kernel:

kernel-2.6.31.6-144.fc12.x86_64 from /kernel-2.6.31.6-144.fc12.x86_64 has depsolving problems
--> Missing Dependency: kernel-firmware >= 2.6.31.6-144.fc12 is needed by package kernel-2.6.31.6-144.fc12.x86_64 (/kernel-2.6.31.6-144.fc12.x86_64)

The site doesn't list any kernel-firmware packages?
Peter, The kernel-firmware package is noarch, but it is present: http://kojipkgs.fedoraproject.org/packages/kernel/2.6.31.6/144.fc12/noarch/kernel-firmware-2.6.31.6-144.fc12.noarch.rpm
Created attachment 372692 [details] dmesg from M6400 with kernel-2.6.31.6-144.fc12.x86_64

Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:615 check_zero_address+0x14f/0x191() (Not tainted)
Hardware name: Precision M6400
Your BIOS is broken; DMAR reported at address fed10000 returns all ones!
BIOS vendor: Dell Inc.; Ver: A07; Product Version:
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.31.6-144.fc12.x86_64 #1
Call Trace:
[<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c
[<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8173c82b>] check_zero_address+0x14f/0x191
[<ffffffff8125f9c9>] ? acpi_tb_verify_table+0x57/0x5c
[<ffffffff8125f027>] ? acpi_get_table_with_size+0x5a/0xb4
[<ffffffff81420ec3>] ? _etext+0x0/0x1
[<ffffffff8173c87f>] detect_intel_iommu+0x12/0x8c
[<ffffffff8171bbc0>] pci_iommu_alloc+0x5e/0x6c
[<ffffffff8172b13e>] mem_init+0x19/0x161
[<ffffffff81714be5>] start_kernel+0x20b/0x3fa
[<ffffffff817142a1>] x86_64_start_reservations+0xac/0xb0
[<ffffffff8171439d>] x86_64_start_kernel+0xf8/0x107
---[ end trace a7919e7f17c0a725 ]---
(In reply to comment #87) > Peter, The kernel-firmware package is noarch, but it is present: > > http://kojipkgs.fedoraproject.org/packages/kernel/2.6.31.6/144.fc12/noarch/kernel-firmware-2.6.31.6-144.fc12.noarch.rpm Thanks Justin - I'm quite a bit rusty when it comes to the "new" way of doing kernel setups. Last I played around with them, we barely had kernel modules around. I uploaded dmesg from my attempt - since nvidia wasn't supported by that version, I had a serious crash; I still get the same DMAR message though. Would it help to use a debug version of the kernel?
The DMAR message is expected -- your BIOS is still broken, and Dell haven't yet provided you with a fixed upgrade. The point is that now, the kernel will print the message and work properly. You do get the message _twice_ in one boot, which is slightly suboptimal -- but that's only really cosmetic. Everything _is_ working fine.
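For readers following along, the two "Your BIOS is broken" variants seen in this bug ("reported at address zero" and "returns all ones") can be told apart with a check along these lines. This is only an illustrative Python sketch, not the kernel's code: the function name classify_drhd and its return strings are made up here, and it assumes the capability and extended capability registers are read as 64-bit values from the address the DMAR table reports.

```python
ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def classify_drhd(base, cap, ecap):
    """Classify one DRHD entry from the DMAR table (hypothetical helper).

    base -- register base address the BIOS reported for the IOMMU
    cap  -- 64-bit capability register value read at that address
    ecap -- 64-bit extended capability register value read at that address
    """
    if base == 0:
        # Nothing can live at address zero, so the registers are never read.
        return "address zero"
    if cap == ALL_ONES_64 and ecap == ALL_ONES_64:
        # Reads of all ones suggest no IOMMU responds at that address.
        return "all ones"
    return "ok"


# Values taken from the logs quoted in this bug.
print(classify_drhd(0x0, 0, 0))
print(classify_drhd(0xFED10000, ALL_ONES_64, ALL_ONES_64))
print(classify_drhd(0xFEB03000, 0xC9008020E30260, 0x1000))
```

Either bad verdict only says that the table entry cannot be trusted; what the kernel does next (warn and fall back to swiotlb) is as described in the comments above.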
peter: notice the 'in most cases' weasel-words in the common bugs. you and ian are the exception to this, since you have the slightly different bug ('all ones!'). the stuff about it only applying to 32-bit and low RAM and virt disabled and yadda yadda applies to the _other_ bug. you're speeeeeecial :) -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
Just back from weekend. Tested 137 and 144 without iommu=soft. USB did not work in 137 but appears to be working in 144. I still get dmar message re broken bios but not too concerned about that at present as it does not appear to be affecting anything else. Thanks for the assistance :)
kernel-2.6.31.6-145.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/kernel-2.6.31.6-145.fc12
ian: the message still appearing is intentional; it's an informational message that your BIOS is broken. Your BIOS is still broken, so the message still appears. =)
Hi all, I'm using 2.6.31.6-148.fc12.x86_64, and still get the DMAR bug and can't even get to the GDM login screen. I've tried various combinations with no luck so far... I have dmesg from the following combinations: virt-disabled on BIOS; virt-disabled on BIOS + mem=2G nomodeset; virt-disabled on BIOS + iommu=soft; virt-disabled on BIOS + mem=2G; virt-disabled on BIOS + iommu=off; virt-enabled on BIOS. Want me to post them? I'm getting two traces ------------[ cut here ]------------ WARNING: at drivers/pci/dmar.c:596 check_zero_address+0x96/0x191() (Not tainted) Hardware name: HP Compaq 6730b (FS242LA#ABM) Your BIOS is broken; DMAR reported at address zero! BIOS vendor: Hewlett-Packard; Ver: 68PDD Ver. F.11; Product Version: F.11 Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.31.6-148.fc12.x86_64 #1 Call Trace: [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c [<ffffffff81420ec3>] ? _etext+0x0/0x1 [<ffffffff81051763>] warn_slowpath_fmt+0x41/0x43 [<ffffffff8173c772>] check_zero_address+0x96/0x191 [<ffffffff8125f9c9>] ? acpi_tb_verify_table+0x57/0x5c [<ffffffff8125f027>] ? acpi_get_table_with_size+0x5a/0xb4 [<ffffffff81420ec3>] ? _etext+0x0/0x1 [<ffffffff8173c87f>] detect_intel_iommu+0x12/0x8c [<ffffffff8171bbc0>] pci_iommu_alloc+0x5e/0x6c [<ffffffff8172b13e>] mem_init+0x19/0x161 [<ffffffff81714be5>] start_kernel+0x20b/0x3fa [<ffffffff817142a1>] x86_64_start_reservations+0xac/0xb0 [<ffffffff8171439d>] x86_64_start_kernel+0xf8/0x107 ---[ end trace a7919e7f17c0a725 ]--- and DMAR:Host address width 36 DMAR:DRHD base: 0x00000000000000 flags: 0x1 ------------[ cut here ]------------ WARNING: at arch/x86/mm/ioremap.c:219 __ioremap_caller+0x145/0x30e() (Tainted: G W ) Hardware name: HP Compaq 6730b (FS242LA#ABM) Modules linked in: Pid: 1, comm: swapper Tainted: G W 2.6.31.6-148.fc12.x86_64 #1 Call Trace: [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c [<ffffffff81051720>] warn_slowpath_null+0x14/0x16 [<ffffffff81034cb3>] __ioremap_caller+0x145/0x30e [<ffffffff810e243e>] ? free_unmap_vmap_area_noflush+0x3c/0x7c [<ffffffff812223d7>] ? alloc_iommu+0x1de/0x253 [<ffffffff81034f5e>] ioremap_nocache+0x17/0x19 [<ffffffff812223d7>] alloc_iommu+0x1de/0x253 [<ffffffff8173ca06>] ? dmar_table_init+0x10d/0x34d [<ffffffff8173ca7c>] dmar_table_init+0x183/0x34d [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361 [<ffffffff817145bb>] kernel_init+0x84/0x273 [<ffffffff81012daa>] child_rip+0xa/0x20 [<ffffffff81714537>] ? kernel_init+0x0/0x273 [<ffffffff81012da0>] ? child_rip+0x0/0x20 ---[ end trace a7919e7f17c0a726 ]--- almost every time. The only way to have a functional system is to add nomodeset to my boot options. BIOS is latest, had the same problems with F.0A, and of course, everything worked on F11 :-)
(In reply to comment #95) > Want me to post them? Yes, please. > ------------[ cut here ]------------ > WARNING: at drivers/pci/dmar.c:596 check_zero_address+0x96/0x191() (Not > tainted) > Hardware name: HP Compaq 6730b (FS242LA#ABM) > Your BIOS is broken; DMAR reported at address zero! > BIOS vendor: Hewlett-Packard; Ver: 68PDD Ver. F.11; Product Version: F.11 This is just a warning telling you the BIOS is broken. You should be falling back to the swiotlb in this case. > DMAR:Host address width 36 > DMAR:DRHD base: 0x00000000000000 flags: 0x1 > ------------[ cut here ]------------ > WARNING: at arch/x86/mm/ioremap.c:219 __ioremap_caller+0x145/0x30e() (Tainted: > G W ) This, however, I was worried about...ugh. > Hardware name: HP Compaq 6730b (FS242LA#ABM) > Modules linked in: > Pid: 1, comm: swapper Tainted: G W 2.6.31.6-148.fc12.x86_64 #1 > Call Trace: > [<ffffffff810516f4>] warn_slowpath_common+0x84/0x9c > [<ffffffff81051720>] warn_slowpath_null+0x14/0x16 > [<ffffffff81034cb3>] __ioremap_caller+0x145/0x30e > [<ffffffff810e243e>] ? free_unmap_vmap_area_noflush+0x3c/0x7c > [<ffffffff812223d7>] ? alloc_iommu+0x1de/0x253 > [<ffffffff81034f5e>] ioremap_nocache+0x17/0x19 > [<ffffffff812223d7>] alloc_iommu+0x1de/0x253 This is from the 2nd ioremap. So, I think we first remapped pfn=0, read garbage from it, and then increased the size, so the second remap covers multiple pages and triggers a page_is_ram() check. > [<ffffffff8173ca06>] ? dmar_table_init+0x10d/0x34d > [<ffffffff8173ca7c>] dmar_table_init+0x183/0x34d > [<ffffffff81722ff6>] enable_IR_x2apic+0x12/0x1ce > [<ffffffff81720a2f>] native_smp_prepare_cpus+0x12d/0x361 > [<ffffffff817145bb>] kernel_init+0x84/0x273 > [<ffffffff81012daa>] child_rip+0xa/0x20 > [<ffffffff81714537>] ? kernel_init+0x0/0x273 > [<ffffffff81012da0>] ? child_rip+0x0/0x20 > ---[ end trace a7919e7f17c0a726 ]--- You should have a message after this saying: IOMMU: can't map the region
OK, a bunch of attachments follow. About the message: no, there's no message like that in any of the dmesg logs.
Created attachment 373601 [details] virt enabled on BIOS no special options on cmdline
Created attachment 373602 [details] virt disabled on BIOS no special options on cmdline
Created attachment 373603 [details] virt disabled on BIOS and iommu=soft
Created attachment 373604 [details] virt disabled on BIOS and iommu=off
Created attachment 373605 [details] virt disabled on BIOS and mem=2G
Created attachment 373606 [details] virt disabled on BIOS and mem=2G nomodeset This is the only way to get X to work, using nomodeset.
If you want me to test another combination of options, just let me know. My SMOLT profile is at http://www.smolts.org/client/show/pub_484873bb-2045-4025-9f87-c221de679098
Created attachment 373608 [details] virt enabled on BIOS and iommu=pt
Hmm, that one still looks like it has VT-d disabled in the BIOS. According to comment #98, your laptop has 3 iommu's. DMAR:Host address width 36 DMAR:DRHD base: 0x000000feb03000 flags: 0x0 IOMMU feb03000: ver 1:0 cap c9008020e30260 ecap 1000 DMAR:DRHD base: 0x000000feb01000 flags: 0x0 IOMMU feb01000: ver 1:0 cap c0000020630260 ecap 1000 DMAR:DRHD base: 0x000000feb02000 flags: 0x1 IOMMU feb02000: ver 1:0 cap c9008020630260 ecap 1000 DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8 DMAR:RMRR base: 0x000000bbc00000 end: 0x000000bfffffff It's that 2nd to last line (RMRR w/ top bits set) that's causing trouble. In the others (like in comment #105) it shows only a single iommu: DMAR:Host address width 36 DMAR:DRHD base: 0x00000000000000 flags: 0x1 Of course, now that I look closer...there's only a single iommu w/ PT enabled (the first of the three). And the way we do PT setup, we'd notice the other 2 don't support it, and disable. So, I'd expect that you'd still hit the RMRR issue.
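To make the "RMRR w/ top bits set" observation easy to repeat on other logs, here is a rough Python sketch that scans dmesg text for RMRR lines and flags any whose addresses do not fit in the reported host address width. The helper name out_of_range_rmrrs is invented for this example, and it assumes the log uses exactly the "DMAR:Host address width N" and "DMAR:RMRR base: ... end: ..." formats quoted above.

```python
import re

# Patterns match the dmesg line formats quoted in this bug.
WIDTH_RE = re.compile(r"DMAR:Host address width (\d+)")
RMRR_RE = re.compile(r"DMAR:RMRR base: (0x[0-9a-fA-F]+) end: (0x[0-9a-fA-F]+)")


def out_of_range_rmrrs(log):
    """Return (base, end) pairs that do not fit in the host address width."""
    width_match = WIDTH_RE.search(log)
    if width_match is None:
        return []  # no width reported, nothing to compare against
    limit = 1 << int(width_match.group(1))
    bad = []
    for base_text, end_text in RMRR_RE.findall(log):
        base, end = int(base_text, 16), int(end_text, 16)
        if base >= limit or end >= limit:
            bad.append((base, end))
    return bad


SAMPLE = """DMAR:Host address width 36
DMAR:DRHD base: 0x000000feb03000 flags: 0x0
DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8
DMAR:RMRR base: 0x000000bbc00000 end: 0x000000bfffffff
"""

for base, end in out_of_range_rmrrs(SAMPLE):
    print(f"suspicious RMRR: {base:#x} - {end:#x}")
```

On the sample, only the first RMRR is flagged; the 0xbbc00000 region fits comfortably in 36 bits.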
The good thing is, you're right, it has VT-d disabled on that, since I didn't power cycle after enabling it in the BIOS. The bad thing is, VT-d enabled + iommu=pt results in a panic very early in the boot. I'm hand copying the panic here: Pid: 1, comm: swapper Tainted: G D 2.6.31.6-148.fc12.x86_64 #1 Call Trace: [<ffffffff8141896c>] panic+0x7a/0x12c [<ffffffff8105b51c>] ? exit_ptrace+0x38/0x121 [<ffffffff81054bfe>] do_exit+0x7b/0x6cb [<ffffffff8141bbc5>] oops_end+0xba/0xc2 [<ffffffff8101551c>] die+0x5a/0x63 [<ffffffff8141b5e9>] do_trap+0x115/0x124 [<ffffffff8101386d>] do_invalid_op+0x9c/0xa5 [<ffffffff81222cee>] ? dma_pte_clear_range+0x30/0xee [<ffffffff8106b578>] ? up+0x39/0x3e [<ffffffff81012b3b>] invalid_op+0x1b/0x2 [<ffffffff81222cee>] ? dma_pte_clear_range+0x30/0xee [<ffffffff81225486>] iommu_domain_identity_map+0x80/0xbf [<ffffffff81226013>] iommu_prepare_identity_map+0xfc/0x12c [<ffffffff8173d673>] init_dmars+0x543/0x6f1 [<ffffffff8173da8c>] intel_iommu_init+0x26b/0x32a [<ffffffff8171bb41>] ? pci_iommu_init+0x0/0x21 [<ffffffff8171bb4f>] pci_iommu_init+0xe/0x21 [<ffffffff8100a069>] do_one_initcall+0x5e/0x162 [<ffffffff81714750>] kernel_init+0x219/0x273 [<ffffffff81012daa>] child_rip+0xa/0x20 [<ffffffff81714537>] ? kernel_init+0x0/0x273 [<ffffffff81012da0>] ? child_rip+0x0/0x20
Argh, would have been nice if the traces were attached as text files. In your first trace, the 'virt enabled on BIOS no special options on cmdline' one, I don't see anything obviously wrong. There are a _lot_ of complaints about BIOS brokenness with silly RMRRs, but you're using HP so you expect some brokenness. I don't know why modesetting would fail -- we're explicitly exempting graphics devices from using the IOMMU. In the second (and probably the rest), it's the interrupt remapping initialisation which is trying to ioremap the bogus IOMMU that your broken BIOS tells us about. It doesn't check the flag that we set when we generated the warning earlier (and in fact that flag might not even be compiled in, if CONFIG_INTR_REMAP && !CONFIG_DMAR). The interrupt remapping thing is ugly, but I don't think it causes any problems. What you have here is an inteldrmfb problem -- it's just not working even when you have the iommu disabled and only 2GiB of RAM enabled. Please file that as a separate bug. Regarding comment #107 -- that's probably some HP brokenness that we're not completely working around. The last line _before_ that panic is the one we're really interested in -- probably the line _before_ the kernel stupidly said "cut here". It'll say something along the lines of IOMMU: Setting identity map for device 0000:00:1d.0 [0xf000674ef000fa11 - 0xf000e987f000fea9] If they're really putting random garbage into the ACPI tables, then they've probably found something that will crash us. I'm not entirely sure we check for the end being _before_ the start, for example. We'll need to make that bit more robust in the face of malicious or incompetent input. And they tell me I'm not allowed to promote an attitude of violence...
Robinson, the kernel building at http://koji.fedoraproject.org/koji/taskinfo?taskID=1829957 should fix the issues with interrupt remapping, and should hopefully fix your crash when the BIOS gives an RMRR which ends before it starts (I'm guessing that's what happens). It should make no difference to the modesetting thing, which I believe is a different bug.
Robinson, I think your graphics issue is probably bug #540218. The nice UPS man brought me an HP6930p this morning, and that's what I see on it.
Thanks David, I'll wait for the koji build and report, and for the other bug, I'm posting a "me too" in that bug to avoid getting off topic.
OK, I booted with the latest build and everything seems to be working. I'm attaching the new dmesg; the only remaining messages are about the broken BIOS.
Created attachment 373776 [details] virt enabled on bios + nomodeset
Heh, there's still something odd going on w/ your BIOS (surprise). Comparing comment #113 to comment #98: -DMAR:RMRR base: 0xf000674ef000fa11 end: 0xf000e987f000fea8 +DMAR:RMRR base: 0xffffffffffffffff end: 0xffefffffffffffff So the BIOS can't decide what to put in the DMAR table when disabling VT-d and re-enabling it.
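As a rough illustration of the kind of RMRR sanity check discussed in this bug (an end before the start, or otherwise nonsensical ranges), here is a small Python sketch run against the values quoted above. It is not the kernel's implementation: the helper name rmrr_problems is hypothetical, and the 4 KiB alignment test on the base address is an assumption added for the example.

```python
PAGE_SIZE = 4096  # assumption: RMRR base addresses are expected to be 4 KiB aligned


def rmrr_problems(base, end):
    """Return a list of reasons an RMRR range looks bogus (empty if none)."""
    problems = []
    if end <= base:
        # A region that ends at or before its start cannot be identity-mapped.
        problems.append("end is not after base")
    if base % PAGE_SIZE != 0:
        # Labeled assumption, see PAGE_SIZE above.
        problems.append("base is not 4 KiB aligned")
    return problems


# Ranges quoted in this bug.
for base, end in [
    (0xF000674EF000FA11, 0xF000E987F000FEA8),
    (0xFFFFFFFFFFFFFFFF, 0xFFEFFFFFFFFFFFFF),
    (0x000000BBC00000, 0x000000BFFFFFFF),
]:
    print(f"{base:#x} - {end:#x}: {rmrr_problems(base, end) or 'looks sane'}")
```

Of the quoted ranges, only the all-ones variant actually ends before it starts; the earlier garbage range is ordered correctly but would still be caught by the alignment assumption.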
Created attachment 373796 [details] virt enabled on BIOS and nomodeset (HP Quicklook 2 disabled) Take a look at that one: I rebooted, went into the BIOS and disabled "HP Quicklook 2", then saved and rebooted. No power cycle.
OK, so multiple things were in flux here. Thanks for followup.
Robinson, you're still attaching your kernel logs as binary files. Please remember to mark them as text/plain when attaching them.
Oops, sorry... there I've fixed the last two...
Now with -149 it doesn't resume. Also, after power cycling, the X session detects 2 monitors and expands my desktop; if I go to System -> Preferences -> Display, it returns to normal just by opening it. I'll try -151 to see if it keeps happening...
It should hopefully resume with -151. That's bug #536675, I believe.
Yep, -151 and -152 work OK.
kernel-2.6.31.6-145.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.