Bug 1008958 - Enabling VT-D in BIOS and intel_iommu=on floods dmesg logs with DMAR faults [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-09-17 08:03 EDT by Daniel Berrange
Modified: 2014-03-10 10:45 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2014-03-10 10:45:22 EDT
Type: Bug
jforbes: needinfo?


Attachments
dmesg from successful boot with firewire device disabled in bios and intel_iommu=on set (77.54 KB, text/plain)
2013-09-17 08:04 EDT, Daniel Berrange
lspci -vv when firewire is enabled in BIOS, which results in unbootable machine with intel_iommu=on (11.56 KB, text/plain)
2013-09-17 08:05 EDT, Daniel Berrange
lspci -vv after disabling firewire in BIOS which allows intel_iommu=on to boot successfully (11.02 KB, text/plain)
2013-09-17 08:06 EDT, Daniel Berrange

Description Daniel Berrange 2013-09-17 08:03:35 EDT
Description of problem:
I have a Lenovo ThinkPad T530 laptop which supports VT-D. I enabled VT-D in the BIOS and hard power-cycled the machine. I then attempted to boot the kernel with intel_iommu=on on the command line.

The kernel would not even get as far as handing off to the init process. The console logs were flooded with thousands upon thousands of these 3 messages:

[    0.025170] dmar: DRHD: handling fault status reg 3
[    0.025221] dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr fffff000 
DMAR:[fault reason 02] Present bit in context entry is clear
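Because the flood scrolls past far faster than it can be read, it can help to tally a captured console log (for example, one captured over a serial console) and confirm that every fault comes from the same requester. The Python sketch below is a hypothetical helper, not part of any kernel or Fedora tool; it assumes the three-line message format shown above, with each "fault reason" line following its "Request device" line.

```python
import re
from collections import Counter

# Hypothetical helper; assumes the DMAR message format quoted in this report.
# Matches e.g. "DMAR:[DMA Read] Request device [02:00.0] fault addr fffff000"
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?:Read|Write)\] Request device "
    r"\[(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\]"
)
# Matches e.g. "DMAR:[fault reason 02] Present bit in context entry is clear"
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>[0-9a-fA-F]{2})\] (?P<text>.+)")


def summarize_dmar_faults(log_text):
    """Count DMAR faults per (requester bus:dev.fn, reason code, reason text)."""
    counts = Counter()
    pending = None  # requester from the last request line, awaiting its reason
    for line in log_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            pending = f"{req['bus']}:{req['dev']}.{req['fn']}"
            continue
        reason = REASON_RE.search(line)
        if reason and pending is not None:
            key = (pending, int(reason["code"], 16), reason["text"].strip())
            counts[key] += 1
            pending = None
    return counts
```

Applied to the excerpt above, every fault would be attributed to 02:00.0 with fault reason 2, which matches the Ricoh controller identified below.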


The PCI device 02:00.0 is a "System peripheral: Ricoh Co Ltd PCIe SDXC/MMC Host Controller (rev 08)". It has a child device "FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 04)" (see the attachments for full PCI information).


Based on a comment in this ancient Fedora 13 bug report https://bugzilla.redhat.com/show_bug.cgi?id=605888, I disabled the firewire device in the BIOS. With this change I was able to successfully boot with intel_iommu=on enabled.

IMHO I should not be required to disable the firewire device in the BIOS. If the PCI device hierarchy has problems, the kernel should take care of resolving it for me.

I'm unclear what fix or patch was incorporated in the kernel for bug 605888, but if this is indeed the same error scenario, then it seems like it might have regressed in F19 kernels.

Version-Release number of selected component (if applicable):
3.10.9-200.fc19.x86_64

How reproducible:
Always, when intel_iommu=on and VT-D is enabled in BIOS and all PCI devices are enabled in BIOS

Steps to Reproduce:
1. Get a T530 laptop
2. Enable VT-D in the BIOS (if it is already enabled then disable it, power cycle, enable it again, power cycle)
3. Boot with intel_iommu=on

Actual results:
Boot never completes; the console is flooded with:

[    0.025170] dmar: DRHD: handling fault status reg 3
[    0.025221] dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr fffff000 
DMAR:[fault reason 02] Present bit in context entry is clear


Expected results:
Boot succeeds normally with VT-D / IOMMU enabled

Additional info:
Comment 1 Daniel Berrange 2013-09-17 08:04:22 EDT
Created attachment 798771 [details]
dmesg from successful boot with firewire device disabled in bios and intel_iommu=on set
Comment 2 Daniel Berrange 2013-09-17 08:05:24 EDT
Created attachment 798778 [details]
lspci -vv when firewire is enabled in BIOS, which results in unbootable machine with intel_iommu=on
Comment 3 Daniel Berrange 2013-09-17 08:06:04 EDT
Created attachment 798783 [details]
lspci -vv after disabling firewire in BIOS which allows intel_iommu=on to boot successfully
Comment 4 Josh Boyer 2013-09-18 16:14:58 EDT
We used to carry a patch called dmar-disable-when-ricoh-multifunction.patch for this issue.  According to bug 880051 it should have no longer been needed and was dropped some time ago.

Alex, do you know if there are quirks upstream that should handle the issue that Daniel has reported?
Comment 5 Josh Boyer 2013-09-18 16:48:05 EDT
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
Comment 6 Alex Williamson 2013-09-19 00:30:59 EDT
(In reply to Josh Boyer from comment #4)
> We used to carry a patch called dmar-disable-when-ricoh-multifunction.patch
> for this issue.  According to bug 880051 it should have no longer been
> needed and was dropped some time ago.
> 
> Alex, do you know if there are quirks upstream that should handle the issue
> that Daniel has reported?

No, the patch mentioned in bug 880051 comment 3 makes the IOMMU groups for broken Ricoh devices work.  That solves the problem for things like VFIO which makes use of IOMMU groups.  DMA ops does not yet do that.  The long term solution for DMA ops is something like the requester ID interface that I've proposed, but hasn't made it past the RFC stage.
Comment 7 Justin M. Forbes 2014-01-03 17:07:04 EST
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.
Comment 8 Justin M. Forbes 2014-03-10 10:45:22 EDT
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.
