Bug 1113399 - [PATCH] ACS override patch required to assign PCIe devices to VMs using VFIO-PCI
Summary: [PATCH] ACS override patch required to assign PCIe devices to VMs using VFIO-PCI
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-06-26 06:19 UTC by oakwhiz
Modified: 2017-11-15 08:15 UTC
CC List: 11 users

Fixed In Version:
Clone Of:
Cloned To: 1141399
Environment:
Last Closed: 2015-06-30 01:19:42 UTC
Type: Bug
Embargoed:
nupur.priya: needinfo?
nupur.priya: needinfo+


Attachments
ACS override patch for 3.14.8-200.fc20.x86_64 (4.26 KB, patch)
2014-06-28 16:08 UTC, oakwhiz
no flags
Output from 'lspci -nnvvv' (52.64 KB, text/plain)
2015-01-08 19:54 UTC, Alex
no flags

Description oakwhiz 2014-06-26 06:19:13 UTC
When assigning a PCIe device to a VM with KVM and vfio-pci, some PCIe devices will cause the following error message:

Error starting domain: internal error: early end of file from monitor: possible problem:
2014-06-26T05:58:47.482875Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.0,addr=0x8: vfio: error, group 13 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2014-06-26T05:58:47.483075Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.0,addr=0x8: vfio: failed to get group 13
2014-06-26T05:58:47.483102Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.0,addr=0x8: Device initialization failed.
2014-06-26T05:58:47.483128Z qemu-system-x86_64: -device vfio-pci,host=07:00.0,id=hostdev0,bus=pci.0,addr=0x8: Device 'vfio-pci' could not be initialized


As you can see below, other PCIe devices are present in the same iommu_group. I don't want to pass through these other devices into the VM, but since they are in the same iommu_group, they are interfering with the operation of vfio-pci.

$ ls /sys/kernel/iommu_groups/13/devices/
0000:00:15.0  0000:00:15.2  0000:00:15.3  0000:06:00.0  0000:07:00.0  0000:08:00.0
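
For reference, the group number for a given device can be read back through sysfs; a minimal sketch for the 07:00.0 device above:

$ # Resolve the device's iommu_group symlink to get its group number
$ basename "$(readlink /sys/bus/pci/devices/0000:07:00.0/iommu_group)"
13
$ # The group's devices/ directory (listed above) then shows every member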


It seems that the following patch is required to fix this: https://lkml.org/lkml/2013/5/30/513
The patch adds a kernel argument that tells the kernel to assume PCIe devices can be isolated from each other, so that each ends up in its own iommu group. Without this patch, vfio-pci is essentially broken for certain PCIe devices that do not correctly implement ACS.
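
If a patched kernel is used, the override is enabled with a boot parameter. A minimal sketch, assuming the patch applies as posted and adds the pcie_acs_override= option described in that thread:

$ # Assumption: the running kernel was rebuilt with the ACS override patch.
$ # Option values below are taken from the patch posting:
$ #   downstream    - treat downstream ports as if they provided isolation
$ #   multifunction - treat multifunction endpoint functions as isolated
$ #   id:vvvv:dddd  - override only the device with this vendor:device ID
$ sudo grubby --update-kernel=ALL --args="pcie_acs_override=downstream,multifunction"
$ sudo reboot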

Is it possible to have this patch added to the Fedora kernel?

Comment 1 Josh Boyer 2014-06-26 12:35:04 UTC
(In reply to oakwhiz from comment #0)
> Is it possible to have this patch added to the Fedora kernel?

At the moment, no.  Mostly because it's still in the middle of being discussed and isn't in the upstream tree yet.  We'll keep an eye on it and see if it's applicable for backport once it hits the mainline kernel tree.

Comment 2 Alex Williamson 2014-06-26 14:12:11 UTC
The patch is not expected to be accepted upstream but downstreams can obviously choose to carry it.  The argument upstream is that issues that arise from a user overriding known, hardware advertised device isolation can be subtle and incredibly difficult to debug.  The path forward to allowing configurations that are currently prevented is to work with the hardware vendors to determine whether devices are isolated and encourage future products to support PCI ACS so that the hardware advertises this isolation automatically.

Comment 3 oakwhiz 2014-06-28 16:08:10 UTC
Created attachment 913028 [details]
ACS override patch for 3.14.8-200.fc20.x86_64

ACS override patch for 3.14.8-200.fc20.x86_64
This patch seems to work on the latest Fedora kernel, as far as I can tell.

Comment 4 Justin M. Forbes 2014-11-13 15:58:37 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 5 Alex 2015-01-08 13:11:03 UTC
I am observing the same vfio error message after upgrading a system from Fedora 20 to Fedora 21.

My scenario is the passthrough of an Intel RMS25KB080 controller, installed in an Intel S1200V3RPL mainboard (in the PCIe x8 Gen 2.x slot).

With Fedora 20 I was able to pass the controller through without getting this error, and it was working fine attached to a guest.

However, after I upgraded the host system to Fedora 21, I am now seeing this error when I attempt to start the guest with 'sudo virsh start', e.g. 

"group <n> is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver"

I would prefer not to have to build and maintain a patched kernel :( What further information should I provide, and should I create a new bug for this specific hardware scenario?

Thanks,

Alex

Comment 6 Alex Williamson 2015-01-08 13:59:29 UTC
(In reply to Alex from comment #5)
> What further information should I provide, and should I create a new bug 
> for this specific hardware scenario?

lspci -nnvvv and processor model

Comment 7 Alex 2015-01-08 19:54:35 UTC
Created attachment 977932 [details]
Output from 'lspci -nnvvv'

Attached 'lspci_output.txt', contains the output from 'lspci -nnvvv' on a system with Intel RMS25KB080 controller, installed in an Intel S1200V3RPL mainboard. The device address of the controller being passed through is '02:00.0'.

Comment 8 Alex 2015-01-08 19:57:00 UTC
(In reply to Alex Williamson from comment #6)
> (In reply to Alex from comment #5)
> > What further information should I provide, and should I create a new bug 
> > for this specific hardware scenario?
> 
> lspci -nnvvv and processor model

CPU processor model: Intel Xeon CPU E3-1220v3 @ 3.10GHz

Comment 9 Alex Williamson 2015-01-08 20:38:56 UTC
(In reply to Alex from comment #7)
> Created attachment 977932 [details]
> Output from 'lspci -nnvvv'
> 
> Attached 'lspci_output.txt', contains the output from 'lspci -nnvvv' on a
> system with Intel RMS25KB080 controller, installed in an Intel S1200V3RPL
> mainboard. The device address of the controller being passed through is
> '02:00.0'.

Sorry to say, but Xeon E3-1200 series are the only Xeons (afaik) that do not (and apparently will not) support isolation on the processor root ports:

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf

---
HSW2. Intel® Virtualization Technology (Intel® VT) Clarification

Section 3.1 will be modified to include the following paragraph:

It is recommended to avoid device direct assignment to Virtual Machines in virtualized environments with this processor due to the lack of Access Control Services (ACS) support in PCI-Express root ports. Some Operating Systems may check for ACS support and potentially disable direct device assignment (that is, affects SR-IOV setup/configuration within the server) as well.
---

My assumption is that this applies to v1 and v2 E3-1200 processors as well.  Unless you can get Intel to change their mind, you're going to need to install the device elsewhere or patch your kernel and risk unintended peer-to-peer between devices.  We have quirks for most of the PCH root ports, but Intel seems to have no intention of supporting isolation on client processor (Core-i5/7) or Xeon E3-1200 series processor root ports.
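
For anyone checking their own hardware: whether a port advertises isolation shows up as an ACS capability in its extended config space. A quick sketch (the 00:01.0 root-port address is only an example; reading capabilities needs root):

$ sudo lspci -s 00:01.0 -vvv | grep -A2 'Access Control Services'
$ # No output means the port exposes no ACS capability, so the IOMMU code
$ # must assume peer-to-peer is possible and group the devices behind it.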

Comment 10 Alex 2015-01-08 22:31:11 UTC
(In reply to Alex Williamson from comment #9)
> (In reply to Alex from comment #7)
> > Created attachment 977932 [details]
> > Output from 'lspci -nnvvv'
> > 
> > Attached 'lspci_output.txt', contains the output from 'lspci -nnvvv' on a
> > system with Intel RMS25KB080 controller, installed in an Intel S1200V3RPL
> > mainboard. The device address of the controller being passed through is
> > '02:00.0'.
> 
> Sorry to say, but Xeon E3-1200 series are the only Xeons (afaik) that do not
> (and apparently will not) support isolation on the processor root ports:
> 
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/
> specification-updates/xeon-e3-1200v3-spec-update.pdf
> 
> My assumption is that this applies to v1 and v2 E3-1200 processors as well. 
> Unless you can get Intel to change their mind, you're going to need to
> install the device elsewhere or patch your kernel and risk unintended
> peer-to-peer between devices.  We have quirks for most of the PCH root
> ports, but Intel seems to have no intention of supporting isolation on
> client processor (Core-i5/7) or Xeon E3-1200 series processor root ports.

I'm confused now.. this was working fine in Fedora 20 for several months (no error on virsh start), how could this be the case if it's a hardware issue? In fact, my plan was to go back to Fedora 20 to get it working again.

Thanks

Alex.

Comment 11 Alex Williamson 2015-01-08 22:46:31 UTC
(In reply to Alex from comment #10)
> 
> I'm confused now.. this was working fine in Fedora 20 for several months (no
> error on virsh start), how could this be the case if it's a hardware issue?
> In fact, my plan was to go back to Fedora 20 to get it working again.

Perhaps you were using legacy KVM device assignment on FC20 (pci-assign).  That driver leaves ACS requirements to userspace, which mostly gets it wrong and allows it to be easily disabled anyway.  The new driver, vfio-pci, has kernel-enforced isolation requirements because the kernel must protect itself.  You can switch back to legacy assignment by specifying the driver option in your XML, but you can expect that at some point legacy KVM device assignment will go away.
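
A minimal sketch of what that driver option looks like in the domain XML, per the libvirt hostdev documentation (the guest name and PCI address are placeholders for the device being assigned):

$ sudo virsh edit <guest>
  ...
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <driver name='kvm'/>    <!-- force legacy assignment instead of the vfio default -->
    <source>
      <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
  ...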

Comment 12 Alex 2015-01-09 00:35:35 UTC
(In reply to Alex Williamson from comment #11)
> (In reply to Alex from comment #10)
> > 
> > I'm confused now.. this was working fine in Fedora 20 for several months (no
> > error on virsh start), how could this be the case if it's a hardware issue?
> > In fact, my plan was to go back to Fedora 20 to get it working again.
> 
> Perhaps you were using legacy KVM device assignment on FC20 (pci-assign). 
> That driver leaves ACS requirements to userspace, which mostly gets it wrong
> and allows it to be easily disabled anyway.  The new driver, vfio-pci has
> kernel enforced isolation requirements because the kernel must protect
> itself.  You can switch back to legacy assignment by specifying the driver
> option in your xml, but you can expect that at some point legacy KVM device
> assignment will go away.

Aha.. based on your advice I did a quick web search and came across this doc for libvirt: https://libvirt.org/formatdomain.html#elementsHostDev

In this doc, regarding the 'name' attribute of the 'driver' sub-element of 'hostdev', it states: "When not specified, the default is 'vfio' on systems where the VFIO driver is available and loaded, and 'kvm' on older systems, or those where the VFIO driver hasn't been loaded [Since 1.1.3] (prior to that the default was always 'kvm')."

So I guess this is the change from F20 to F21 that is causing the error. I hadn't touched the xml definition for the guest, but the default value has changed in libvirt; and with my system not supporting the vfio driver properly, I hit the error.

Anyway I just tried a workaround by adding "<driver name='kvm'/>" to the 'hostdev' in the XML definition, and "virsh start" is now working :)
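
One way to confirm which mechanism a running guest actually ended up with is to look at the QEMU command line; a rough sketch (the process may be qemu-system-x86_64 or qemu-kvm depending on packaging):

$ # Legacy assignment appears as a pci-assign device, VFIO as vfio-pci
$ ps -ef | grep -E 'pci-assign|vfio-pci' | grep -v grep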

So to summarise..

- If I want to stay on F21, my options are:

1) Use the "<driver name='kvm'/>" workaround (aka legacy device assignment)
2) Try the disk controller in a PCH-connected PCIe slot, instead of a CPU-connected PCIe slot? I think this would mean an x4 connection instead of an x8 connection, for the mainboard I have.
3) Use the ACS override patch
4) Upgrade the system to use hardware with ACS support; presumably this would mean a Xeon E5 CPU/mainboard instead of an E3, or something like that...?

- The disadvantages of legacy device assignment, compared to ACS/vfio are:

1) Legacy method doesn't support EFI systems & secure boot etc
2) Legacy method gives lower system performance/less stability?
3) Legacy method will be removed at some point

Thanks

Alex.

Comment 13 Alex 2015-01-20 12:55:42 UTC
I just wanted to give a quick update following my previous comment, with some results after trying options 1 and 2:

- Option 1 worked - with this option, I didn't need to relocate the disk controller to a different (PCH) slot; I used 'virsh edit' to modify the guest definition and insert the "<driver name='kvm'/>" workaround;

- Option 2 worked - with this option, I didn't need the "<driver name='kvm'/>" workaround; I relocated the disk controller from a CPU-connected PCIe slot to a PCH-connected PCIe slot.

Thanks for your help,

Alex.

Comment 14 Fedora End Of Life 2015-05-29 12:13:36 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Fedora End Of Life 2015-06-30 01:19:42 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 16 nupur priya 2017-11-15 08:12:36 UTC
Hi, 
After updating to kernel 3.10.0-514.21.1.el7.x86_64, I see the following error when trying to deploy a VM.

Libvirt cannot start the VM because of the following errors:
2017-11-01T14:07:44.930993Z qemu-kvm: -device vfio-pci,host=04:10.6,id=hostdev0,bus=pci.0,addr=0x4: vfio: error, group 2 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.

With the old kernel, 3.10.0-514.10.2.el7.x86_64, there is no such issue.


I suspect the ACS override patch for AMD is missing from the current kernel package. Here is the list of iommu groups for the working kernel:

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.3
/sys/kernel/iommu_groups/3/devices/0000:00:02.4
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/5/devices/0000:00:03.1
/sys/kernel/iommu_groups/6/devices/0000:00:08.0
/sys/kernel/iommu_groups/7/devices/0000:00:09.0
/sys/kernel/iommu_groups/7/devices/0000:00:09.2
/sys/kernel/iommu_groups/8/devices/0000:00:10.0
/sys/kernel/iommu_groups/9/devices/0000:00:11.0
/sys/kernel/iommu_groups/10/devices/0000:00:12.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.1
/sys/kernel/iommu_groups/12/devices/0000:00:18.2
/sys/kernel/iommu_groups/12/devices/0000:00:18.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.4
/sys/kernel/iommu_groups/12/devices/0000:00:18.5
/sys/kernel/iommu_groups/13/devices/0000:01:00.0
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
/sys/kernel/iommu_groups/15/devices/0000:03:00.1
/sys/kernel/iommu_groups/16/devices/0000:03:00.2
/sys/kernel/iommu_groups/17/devices/0000:03:00.3
/sys/kernel/iommu_groups/18/devices/0000:04:10.2
/sys/kernel/iommu_groups/19/devices/0000:04:10.6
/sys/kernel/iommu_groups/20/devices/0000:04:10.3
/sys/kernel/iommu_groups/21/devices/0000:04:10.7
/sys/kernel/iommu_groups/22/devices/0000:04:10.0
/sys/kernel/iommu_groups/23/devices/0000:04:10.4
/sys/kernel/iommu_groups/24/devices/0000:04:10.1
/sys/kernel/iommu_groups/25/devices/0000:04:10.5
[root@nfvis ~]# 

Here is the list for the kernel with the issue:

/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:03:00.0
/sys/kernel/iommu_groups/2/devices/0000:03:00.1
/sys/kernel/iommu_groups/2/devices/0000:03:00.2
/sys/kernel/iommu_groups/2/devices/0000:03:00.3
/sys/kernel/iommu_groups/2/devices/0000:04:10.0
/sys/kernel/iommu_groups/2/devices/0000:04:10.1
/sys/kernel/iommu_groups/2/devices/0000:04:10.2
/sys/kernel/iommu_groups/2/devices/0000:04:10.3
/sys/kernel/iommu_groups/2/devices/0000:04:10.4
/sys/kernel/iommu_groups/2/devices/0000:04:10.5
/sys/kernel/iommu_groups/2/devices/0000:04:10.6
/sys/kernel/iommu_groups/2/devices/0000:04:10.7

0000:03:xx.x are the physical NICs
0000:04:xx.x are the virtual NICs
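
For what it's worth, a one-liner like the following prints each group together with its device names, which makes the two kernels easier to compare (a sketch using the sysfs layout shown above):

$ for d in /sys/kernel/iommu_groups/*/devices/*; do
      g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}
      echo "group ${g}: $(lspci -nns ${d##*/})"
  done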

Any suggestions?

Comment 17 nupur priya 2017-11-15 08:15:36 UTC
Is any other information required?

