Bug 757166 - Fedora 16 can't find sata III drives in a marvell sata III controller.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-25 16:11 UTC by fernandosj2k4
Modified: 2013-01-21 13:53 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-14 14:58:14 UTC
Type: ---
Embargoed:


Attachments
storage log. (139.48 KB, text/plain)
2011-11-28 15:13 UTC, fernandosj2k4
find attached syslog. (133.38 KB, text/plain)
2011-11-28 15:25 UTC, fernandosj2k4
Anaconda log. (9.91 KB, text/plain)
2011-11-28 15:27 UTC, fernandosj2k4

Description fernandosj2k4 2011-11-25 16:11:45 UTC
I'm trying to install Fedora 15 and 16 on a computer with a SATA III Marvell 88SEXXX controller; however, neither Fedora 15 nor 16 locates the SATA III drives.

The installer returns the message: There is no disk to install.
And the installation fails.

The unbelievable thing is that Scientific Linux (based on Fedora) works with the SATA III Marvell controller.

Thanks.

Fernando.

Comment 1 Chris Lumens 2011-11-27 15:45:18 UTC
Please attach /tmp/syslog and /tmp/storage.log from the failed installation to this bug report.  Thanks.

Comment 2 fernandosj2k4 2011-11-28 15:13:46 UTC
Created attachment 537504 [details]
storage log.

please, find attached the file storage.log

Comment 3 fernandosj2k4 2011-11-28 15:25:19 UTC
Created attachment 537508 [details]
find attached syslog.

Please pay attention to line 989:

The error below appears after I enable SATA III in the motherboard BIOS.

11:55:38,401 ERR kernel:[    2.305657] DRHD: handling fault status reg 2
11:55:38,401 ERR kernel:[    2.305721] DMAR:[DMA Write] Request device [02:00.1] fault addr fff00000 
11:55:38,401 ERR kernel:[    2.305723] DMAR:[fault reason 02] Present bit in context entry is clear

Comment 4 fernandosj2k4 2011-11-28 15:27:10 UTC
Created attachment 537509 [details]
Anaconda log.

Find attached log from anaconda.

Comment 5 fernandosj2k4 2011-11-28 15:38:08 UTC
My system has two Samsung HD204UI 2.0 TB hard drives on a SATA III Marvell 88SE91XX adapter.

My motherboard is an Intel DX580G Extreme Series.
Thanks.

Comment 6 Chuck Ebbert 2011-11-29 12:45:34 UTC
(In reply to comment #3)
> 
> 11:55:38,401 ERR kernel:[    2.305657] DRHD: handling fault status reg 2
> 11:55:38,401 ERR kernel:[    2.305721] DMAR:[DMA Write] Request device
> [02:00.1] fault addr fff00000 
> 11:55:38,401 ERR kernel:[    2.305723] DMAR:[fault reason 02] Present bit in
> context entry is clear

This is a problem in the Intel DMAR driver. Try adding:

  intel_iommu=off

to the kernel boot options. (Or disable VT-d in the machine's BIOS settings.)
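
To confirm that the option actually reached the running kernel, the boot command line (available in /proc/cmdline on Linux) can be checked for the token. The following is a minimal sketch, not part of any Fedora tool: the helper name `cmdline_has_option` is hypothetical, and reading /proc/cmdline into a string is left to the caller.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `opt` appears in `cmdline` as a whole
 * whitespace-delimited token, 0 otherwise. A plain substring search is not
 * enough, because "intel_iommu=off" must not match "intel_iommu=offx".
 */
int cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return 0;

    while ((p = strstr(p, opt)) != NULL) {
        /* The match must start at the beginning or after whitespace. */
        int starts = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before whitespace. */
        char end = p[len];
        int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts && ends)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

Calling it with the contents of /proc/cmdline and "intel_iommu=off" returns 1 only when that exact option was passed at boot.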

Comment 7 fernandosj2k4 2011-11-29 13:42:23 UTC
(In reply to comment #6)
 
> This is a problem in the Intel DMAR driver. Try adding:
> 
>   intel_iommu=off
> 
> to the kernel boot options. (Or disable VT-d in the machine's BIOS settings.)

SOLVED: 

After I disabled VT-d in the machine's BIOS, Fedora 15 and 16 work with the SATA III drives.

Thanks.

Comment 8 David Woodhouse 2011-11-29 14:58:40 UTC
(In reply to comment #6)
> This is a problem in the Intel DMAR driver. Try adding:
> 
>   intel_iommu=off

Can you explain why you believe that to be the case?

The message 'Present bit in context entry is clear' seems to indicate that *no* DMA mappings have been set up for this device (02:00.1) at all.

In fact, the AHCI device is 02:00.0, isn't it? Not 02:00.1? So the problem here seems to be that the Marvell hardware is buggy, and does its DMA with the wrong source-address? We probably need a quirk to make the code handle that, like we {have,need} for the broken Ricoh devices that do something similar.

Chuck, is that why you said the problem is in the Intel DMAR driver? Had you already worked that out and sent a new quirk upstream?

Or was your comment just the same as when you get a SEGV in a userspace program, and say "The problem is in the MMU. Run on ucLinux and your program won't segfault"? :)
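
The mismatch described in this comment (the fault names 02:00.1 while the AHCI function is 02:00.0) can be checked mechanically from the log line. The sketch below is illustrative only: the helper names are hypothetical, the macros merely mirror the kernel's bus/slot/function encoding (5 bits of slot, 3 bits of function), and the log format is assumed to be the one quoted in comment 3.

```c
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define FUNC(devfn)       ((devfn) & 0x07)

/*
 * Hypothetical helper: extract the requester from a line such as
 * "DMAR:[DMA Write] Request device [02:00.1] fault addr fff00000".
 * Returns 0 on success, -1 if the line does not contain a requester.
 */
int parse_dmar_requester(const char *line, unsigned *bus, unsigned *devfn)
{
    const char *p = strstr(line, "Request device [");
    unsigned b, s, f;

    if (p == NULL)
        return -1;
    /* Bus, slot and function are printed in hexadecimal. */
    if (sscanf(p, "Request device [%x:%x.%x]", &b, &s, &f) != 3)
        return -1;

    *bus = b;
    *devfn = DEVFN(s, f);
    return 0;
}

/*
 * Returns 1 when the faulting requester sits on the same bus and slot as
 * the expected function but uses a different function number, which is
 * the pattern suspected here for the Marvell controller.
 */
int is_function_alias(unsigned bus, unsigned devfn,
                      unsigned exp_bus, unsigned exp_devfn)
{
    return bus == exp_bus &&
           SLOT(devfn) == SLOT(exp_devfn) &&
           FUNC(devfn) != FUNC(exp_devfn);
}
```

For the log line from comment 3 and an expected AHCI function of 02:00.0, `is_function_alias` returns 1, matching the observation that the DMA arrives with the wrong source function.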

Comment 9 Dave Jones 2012-03-22 17:05:16 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 10 Dave Jones 2012-03-22 17:08:25 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 11 Dave Jones 2012-03-22 17:19:07 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 12 Andrew Cooks 2012-04-15 01:45:42 UTC
I still have this problem on 3.3.1 from kernel.org. If there is a separate patch that's been applied to the Fedora kernel I'd be happy to test it (the patch). I haven't seen any patch on lkml yet, but there's a short description of how a quirk might be implemented at https://lkml.org/lkml/2011/12/1/461

Comment 13 Andrew Cooks 2012-05-13 01:25:55 UTC
I think this is the same bug as 42679 on bugzilla.kernel.org. (https://bugzilla.kernel.org/show_bug.cgi?id=42679)

Comment 14 Dave Jones 2012-10-23 15:30:41 UTC
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Comment 15 Justin M. Forbes 2012-11-14 14:58:14 UTC
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.

Comment 16 Ying Chu 2012-11-26 08:38:39 UTC
I found a way to work around the issue, but I still have no idea what triggers the DMA read/write to a nonexistent function (e.g. 02:00.1, or any other under _ANY_ Marvell Magni series). For more detail: I captured a PCIe trace and there were no TLP transactions at all related to 02:00.1.

The workaround idea is simple: since the bridge refused the DMA access to/from 02:00.1, I set 02:00.1 as present in the context entry and pointed it at the same domain as 02:00.0. Then it works well. It's tricky, and I'm not sure whether it _MAY_ corrupt memory, as I still cannot explain what triggers the nonexistent DMA access.

The patch follows; it is against the 3.7 kernel.

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d4a4cd4..188390f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1542,7 +1542,7 @@ static void domain_exit(struct dmar_domain *domain)
        free_domain_mem(domain);
 }

-static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
+static int domain_context_mapping_one(struct dmar_domain *domain, struct pci_dev *pdev, int segment,
                                 u8 bus, u8 devfn, int translation)
 {
        struct context_entry *context;
@@ -1555,6 +1555,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
        int agaw;
        struct device_domain_info *info = NULL;

+set:
        pr_debug("Set context mapping for %02x:%02x.%d\n",
                bus, PCI_SLOT(devfn), PCI_FUNC(devfn));

@@ -1669,6 +1670,13 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
                domain_update_iommu_cap(domain);
        }
        spin_unlock_irqrestore(&domain->iommu_lock, flags);
+
+       if (pdev->vendor == 0x1b4b &&
+               (pdev->device == 0x9125 || pdev->device == 0x9123) &&
+                       devfn == 0) {
+               devfn = 1;
+               goto set;
+       }
        return 0;
 }

@@ -1679,7 +1687,7 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
        int ret;
        struct pci_dev *tmp, *parent;

-       ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
+       ret = domain_context_mapping_one(domain, pdev, pci_domain_nr(pdev->bus),
                                         pdev->bus->number, pdev->devfn,
                                         translation);
        if (ret)
@@ -1692,7 +1700,7 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
        /* Secondary interface's bus number and devfn 0 */
        parent = pdev->bus->self;
        while (parent != tmp) {
-               ret = domain_context_mapping_one(domain,
+               ret = domain_context_mapping_one(domain, parent,
                                                 pci_domain_nr(parent->bus),
                                                 parent->bus->number,
                                                 parent->devfn, translation);
@@ -1701,12 +1709,12 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
                parent = parent->bus->self;
        }
        if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
-               return domain_context_mapping_one(domain,
+               return domain_context_mapping_one(domain, tmp,
                                        pci_domain_nr(tmp->subordinate),
                                        tmp->subordinate->number, 0,
                                        translation);
        else /* this is a legacy PCI bridge */
-               return domain_context_mapping_one(domain,
+               return domain_context_mapping_one(domain, tmp,
                                                  pci_domain_nr(tmp->bus),
                                                  tmp->bus->number,
                                                  tmp->devfn,
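
The idea behind the patch above can be illustrated outside the kernel with a toy model. This is a sketch under stated assumptions, not the kernel's data structures: `struct toy_context`, `struct toy_bus` and the `toy_*` functions are all hypothetical, and only the vendor/device IDs and the "devfn 0 also maps devfn 1" rule are taken from the patch.

```c
#define DEVFN_COUNT 256 /* 32 slots x 8 functions per bus */

/* Toy stand-in for a context entry; NOT the kernel's struct context_entry. */
struct toy_context {
    int present;   /* models the "present" bit */
    int domain_id; /* translation domain the function is attached to */
};

/* Toy stand-in for one bus's context table, indexed by devfn. */
struct toy_bus {
    struct toy_context ctx[DEVFN_COUNT];
};

/* Attach one function to a domain and mark its entry present. */
void toy_map(struct toy_bus *bus, unsigned devfn, int domain_id)
{
    bus->ctx[devfn & 0xff].present = 1;
    bus->ctx[devfn & 0xff].domain_id = domain_id;
}

/*
 * Attach a function and, for the Marvell IDs listed in the patch, also
 * attach function 1 to the same domain whenever function 0 is mapped.
 */
void toy_map_with_quirk(struct toy_bus *bus, unsigned vendor, unsigned device,
                        unsigned devfn, int domain_id)
{
    toy_map(bus, devfn, domain_id);
    if (vendor == 0x1b4b &&
        (device == 0x9125 || device == 0x9123) &&
        devfn == 0)
        toy_map(bus, 1, domain_id);
}

/*
 * Return the domain for a DMA request from `devfn`, or -1 to model the
 * "Present bit in context entry is clear" fault.
 */
int toy_translate(const struct toy_bus *bus, unsigned devfn)
{
    const struct toy_context *c = &bus->ctx[devfn & 0xff];

    return c->present ? c->domain_id : -1;
}
```

In this model a request from function 1 faults when only function 0 was mapped, and succeeds in the same domain once the quirk is applied. The model says nothing about whether sharing the domain is safe on real hardware, which is the open question raised in this comment.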

Comment 17 Dietrich 2013-01-20 15:45:45 UTC
Unfortunately this bug is still there with kernel 3.7.2 (strangely enough, everything works for me with the 3.6 kernel).

I have the same symptoms:
Messages on boot:
DRHD: handling fault status reg 2
DMAR:[DMA Write] Request device [02:00.1] fault addr fff00000 
DMAR:[fault reason 02] Present bit in context entry is clear

While running:
SATA3 devices not recognized.

Everything works after adding this to the kernel command line:
intel_iommu=off

The only new information, I guess, is that it worked with all the other kernels; only the 3.7 line has problems.

What would be the next step in this procedure?

Comment 18 Josh Boyer 2013-01-21 13:53:25 UTC
(In reply to comment #17)
> Unfortunately this Bug is still there with Kernel 3.7.2 (strange enough
> everything works for me with the 3.6 kernel)
> 
> I have the same Symptoms:
> Messages on boot:
> DRHD: handling fault status reg 2
> DMAR:[DMA Write] Request device [02:00.1] fault addr fff00000 
> DMAR:[fault reason 02] Present bit in context entry is clear
> 
> While running:
> SATA3 devices not recognized.
> 
> everything works after adding to kernel commandline:
> intel_iommu=off
> 
> The only thing that is new I guess is that it worked with all the other
> kernels just the 3.7 line has some problems.
> 
> What would be the next step in this procedure?

Your specific issue is caused by the Intel IOMMU being enabled by default in the -201 kernel.  The -204 kernel has this disabled again and should work fine for you.  We'll leave this disabled going forward.

