Bug 1406837 - Regression using vfio with mount namespaces enabled
Summary: Regression using vfio with mount namespaces enabled
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-21 15:29 UTC by sL1pKn07
Modified: 2017-01-06 08:45 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-04 14:43:13 UTC
Embargoed:



Description sL1pKn07 2016-12-21 15:29:49 UTC
Description of problem:

I have had problems launching my VM through libvirt built from Git since this commit: https://libvirt.org/git/?p=libvirt.git;a=commit;h=f444faa94a0e30f7dfdd47dce18b526abb0aaa9f


Version-Release number of selected component (if applicable):

3.0.0 from GIT commit f444faa94a0e30f7dfdd47dce18b526abb0aaa9f

How reproducible:

Build libvirt from Git.
Merge the changes into /etc/libvirt/qemu.conf (in my case, the namespaces section).
Launch the libvirtd service (by hand or through systemd).
Start the VM as a normal user with: 'virsh -c qemu:///system start VM'

Actual results:

└───╼  LC_ALL=C virsh -c qemu:///system start windoze
error: Failed to start domain windoze
error: internal error: Process exited prior to exec: libvirt: QEMU Driver error : Failed to make device /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists

───╼  LC_ALL=C sudo libvirtd
[sudo] password for sl1pkn07: 
2016-12-21 14:40:45.628+0000: 24681: info : libvirt version: 3.0.0
2016-12-21 14:40:45.628+0000: 24681: info : hostname: sL1pKn07
2016-12-21 14:40:45.628+0000: 24681: warning : qemuDomainObjTaint:4035 : Domain id=1 name='windoze' uuid=167cfa49-c88f-46df-a6bf-3127d5bf4d38 is tainted: custom-argv
2016-12-21 14:40:45.628+0000: 24681: warning : qemuDomainObjTaint:4035 : Domain id=1 name='windoze' uuid=167cfa49-c88f-46df-a6bf-3127d5bf4d38 is tainted: host-cpu
2016-12-21 14:40:45.648+0000: 24681: error : virCommandHandshakeWait:2689 : Child quit during startup handshake: Input/output error
2016-12-21 14:40:45.648+0000: 24681: error : qemuProcessReportLogError:1792 : internal error: Process exited prior to exec: libvirt: QEMU Driver error : Failed to make device /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists

Expected results:

The VM starts without problems.

Additional info:

Arch Linux
Linux sL1pKn07 4.8.11-1-ARCH #1 SMP PREEMPT Sun Nov 27 09:26:14 CET 2016 x86_64 GNU/Linux (distribution stock kernel)

QEMU version: 2.7.94 (v2.8.0-rc4-dirty) (built from Git, commit 6a928d25b6)

VM with VFIO (VGA) passthrough

The VM XML: https://sl1pkn07.wtf/paste/view/bafa1f26
The /etc/libvirt/qemu.conf: https://sl1pkn07.wtf/paste/view/198eb5cf

└───╼  cat /etc/modprobe.d/vfio.conf 
options vfio_pci ids=10de:17c2,10de:0fb0
options vfio_iommu_type1 allow_unsafe_interrupts=1

└───╼  cat /etc/modprobe.d/kvm.conf 
options kvm ignore_msrs=1

└───╼  cat /etc/mkinitcpio.conf 
*snip
MODULES="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
*snip

The IOMMU groups: https://sl1pkn07.wtf/paste/view/2a86bff4


After discussing this on IRC, @laine and @mprivozn found a workaround:

Edit /etc/libvirt/qemu.conf and, in the namespaces section, add or change the setting to 'namespaces = []'.

Merely leaving the default line commented out ('#namespaces = [ "mount" ]') still produces the failure.

The other workaround is to revert commit f444faa94a0e30f7dfdd47dce18b526abb0aaa9f.

greetings

Comment 1 Michal Privoznik 2017-01-04 12:58:19 UTC
Ah, found the root cause. You are trying to assign two PCI devices which fall into the same IOMMU group. The /dev/vfio/X entry is created for the first one, but creating it for the second device fails because the path already exists.
What's worse is that this can happen with other combinations of devices, e.g. RNG/chardev with /dev/null backend, as /dev/null is created regardless of domain configuration. For instance the following XML fails too:

    <rng model='virtio'>
      <backend model='random'>/dev/null</backend>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </rng>

Weirdly, if this were hotplug you wouldn't see any error as EEXIST is correctly handled there.
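To make the root cause concrete: each VFIO-assigned PCI device needs the character device of its IOMMU group, /dev/vfio/N, and N is shared by every device in that group. On Linux the group is exposed through the sysfs symlink /sys/bus/pci/devices/<address>/iommu_group, whose target ends in .../iommu_groups/N. The minimal C sketch below uses a hypothetical helper, vfio_group_dev_path (not libvirt code), to turn such a link target into the /dev/vfio path. It shows why two devices from one group resolve to the identical node, which the namespace setup then tries to create twice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libvirt API): given the target of the sysfs
 * symlink /sys/bus/pci/devices/<addr>/iommu_group, for example
 * "../../../kernel/iommu_groups/22", build the VFIO group device path
 * "/dev/vfio/22". Returns 0 on success, -1 if the last path component is
 * not a group number or the result does not fit in buf.
 */
static int vfio_group_dev_path(const char *link_target, char *buf, size_t size)
{
    const char *group;
    int n;

    assert(link_target != NULL && buf != NULL);

    /* The group number is the last component of the link target. */
    group = strrchr(link_target, '/');
    group = group ? group + 1 : link_target;
    if (*group == '\0' || strspn(group, "0123456789") != strlen(group))
        return -1;

    n = snprintf(buf, size, "/dev/vfio/%s", group);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

For example, both functions of a passed-through card that sit in the same group (group 22, presumably, given the error message in this report) yield the same string "/dev/vfio/22", so only the first mknod() of that path can succeed unless EEXIST is tolerated.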

Comment 2 Michal Privoznik 2017-01-04 14:14:41 UTC
Patches posted on the upstream list:

https://www.redhat.com/archives/libvir-list/2017-January/msg00073.html

Comment 3 Michal Privoznik 2017-01-04 14:43:13 UTC
I've just pushed the patch upstream:

commit 3aae99fe71ccee523bafeb54ebd0338eeed66868
Author:     Michal Privoznik <mprivozn>
AuthorDate: Wed Jan 4 13:57:06 2017 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Jan 4 15:36:42 2017 +0100

    qemu: Handle EEXIST gracefully in qemuDomainCreateDevice
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1406837
    
    Imagine you have a domain configured in such way that you are
    assigning two PCI devices that fall into the same IOMMU group.
    With mount namespace enabled what happens is that for the first
    PCI device corresponding /dev/vfio/X entry is created and when
    the code tries to do the same for the second mknod() fails as
    /dev/vfio/X already exists:
    
    2016-12-21 14:40:45.648+0000: 24681: error :
    qemuProcessReportLogError:1792 : internal error: Process exited
    prior to exec: libvirt: QEMU Driver error : Failed to make device
    /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists
    
    Worse, by default there are some devices that are created in the
    namespace regardless of domain configuration (e.g. /dev/null,
    /dev/urandom, etc.). If one of them is set as backend for some
    guest device (e.g. rng, chardev, etc.) it's the same story as
    described above.
    
    Weirdly, in attach code this is already handled.
    
    Signed-off-by: Michal Privoznik <mprivozn>


v2.5.0-291-g3aae99fe71
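The idea in the commit title can be sketched in C as follows. This is a simplified illustration, not the actual qemuDomainCreateDevice code: the helper name create_node_tolerant is hypothetical, and unlike a careful implementation it does not verify that an existing node has the expected type and device number. It treats EEXIST from mknod(2) as "already created by an earlier device" and fails on any other error.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create a node with mknod(), treating EEXIST as
 * success because an earlier device (another member of the same IOMMU
 * group, or a default node such as /dev/null) may already have created
 * the same path. Returns 0 on success or if the path already exists,
 * -1 on any other error.
 */
static int create_node_tolerant(const char *path, mode_t mode, dev_t dev)
{
    assert(path != NULL);

    if (mknod(path, mode, dev) == 0)
        return 0;

    if (errno == EEXIST)
        return 0;   /* already present: not an error in this sketch */

    fprintf(stderr, "Failed to make device %s: %s\n", path, strerror(errno));
    return -1;
}
```

For example, calling create_node_tolerant(path, S_IFREG | 0600, 0) twice on the same path returns 0 both times (mknod() with S_IFREG creates an empty regular file and needs no privileges). A real device node would instead pass S_IFCHR and a makedev() number, which requires CAP_MKNOD.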

Comment 4 sL1pKn07 2017-01-05 15:59:14 UTC
It seems to be working now (I first edited qemu.conf to revert my changes to the namespaces section).

But the directory /var/run/libvirt/qemu/foo.dev/ and the others are empty when the VM is launched.

Is that normal?

greetings

Comment 5 Michal Privoznik 2017-01-06 08:45:46 UTC
(In reply to sL1pKn07 from comment #4)
> seems working now (before edit the file qemu.conf for revert the changes in
> the namespaces part)

Cool.

> 
> but the directoty /var/run/libvirt/qemu/foo.dev/ and others is empty when
> the VM is launched
> 
> is normal?

Yes, that is normal. Those directories serve as a temporary location to which the real /dev/* mount points are moved. The idea is that we want the /dev/* mount points to be shared with the parent namespace, so that this namespace is transparent to other applications. For instance, QEMU creates /dev/pts/NNN for guest consoles; if /dev/pts/ were not preserved, applications would have no way of attaching to the console. The same applies to the other mount points there. So just before the new /dev is built, all /dev/* mount points are moved to the /var/run/libvirt/qemu/foo.* locations, and they are moved back right after the /dev build is completed.
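The preserve-and-restore step described above can be sketched with the Linux mount(2) MS_MOVE flag, which moves an existing mount to a new location. This is an illustration under assumptions, not libvirt's code: the helper names preserved_path and relocate_mount are hypothetical, as is the naming scheme that maps /dev/pts to <state dir>/<domain>.pts (it only loosely follows the /var/run/libvirt/qemu/foo.* pattern mentioned above). relocate_mount must run with CAP_SYS_ADMIN inside the domain's private mount namespace, and its target directory must already exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical naming scheme: "/dev" -> "<state_dir>/<domain>.dev",
 * "/dev/pts" -> "<state_dir>/<domain>.pts", "/dev/a/b" -> "<domain>.a.b".
 * Returns 0 on success, -1 if mount_point is not /dev or below it, or if
 * the result does not fit in buf.
 */
static int preserved_path(char *buf, size_t size, const char *state_dir,
                          const char *domain, const char *mount_point)
{
    const char *suffix;
    size_t off;
    int n;

    assert(buf && state_dir && domain && mount_point);

    if (strcmp(mount_point, "/dev") == 0)
        suffix = "dev";
    else if (strncmp(mount_point, "/dev/", 5) == 0)
        suffix = mount_point + 5;
    else
        return -1;

    n = snprintf(buf, size, "%s/%s.", state_dir, domain);
    if (n < 0 || (size_t)n >= size)
        return -1;
    off = (size_t)n;

    n = snprintf(buf + off, size - off, "%s", suffix);
    if (n < 0 || (size_t)n >= size - off)
        return -1;

    /* Flatten the suffix so every preserved path sits directly in state_dir. */
    for (char *p = buf + off; *p; p++)
        if (*p == '/')
            *p = '.';
    return 0;
}

/* Move an existing mount from one directory to another with MS_MOVE. */
static int relocate_mount(const char *from, const char *to)
{
    if (mount(from, to, NULL, MS_MOVE, NULL) < 0) {
        perror("mount(MS_MOVE)");
        return -1;
    }
    return 0;
}
```

In this sketch a caller would compute preserved_path() for each /dev/* mount, call relocate_mount(mount_point, preserved) before building the new /dev, and relocate_mount(preserved, mount_point) afterwards. Once the mounts are moved back, the preserved directories are left empty, which matches what was observed in comment #4.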

Now that I am writing this, I *think* those temporary directories can be safely removed once the /dev build is completed. That's out of scope for this bug, but I'll look into it.

