Bug 1884659 - RFE: vhost-user-blk-pci device support [TechPreview]
Summary: RFE: vhost-user-blk-pci device support [TechPreview]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Pavel Hrdina
QA Contact: Han Han
URL:
Whiteboard:
Depends On: 1901323 1930033
Blocks:
 
Reported: 2020-10-02 15:25 UTC by Stefan Hajnoczi
Modified: 2021-05-25 06:44 UTC
CC List: 15 users

Fixed In Version: libvirt-7.0.0-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:43:36 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:



Description Stefan Hajnoczi 2020-10-02 15:25:10 UTC
Please add domain XML for vhost-user-blk-pci devices.

QEMU's --device vhost-user-blk-pci connects to a vhost-user device backend running in a separate process. The guest sees a virtio-blk PCI device. This allows the virtio-blk virtqueues to be emulated outside QEMU.

The syntax is --device vhost-user-blk-pci,chardev=CHARDEV,num-queues=N,queue-size=M,config-wce=WCE.

The chardev is a UNIX domain socket. The vhost-user-blk device backend typically acts as the server and QEMU is the client. In theory the reverse direction is possible too but it's not used much in practice.


There are several implementations of vhost-user device backends, including QEMU's new vhost-user-blk server, QEMU contrib/vhost-user-blk/vhost-user-blk, and SPDK. For testing you can use:

  $ qemu-storage-daemon --blockdev file,node-name=drive0,filename=test.img \
                        --export vhost-user-blk,node-name=drive0,addr.type=unix,addr.path=/tmp/vhost-user-blk.sock

As with all vhost-user devices, QEMU's guest RAM must be mapped MAP_SHARED, for example via a memfd memory backend with share=on:

  $ qemu-system-x86_64 -M accel=kvm,memory-backend=mem \
                       --object memory-backend-memfd,id=mem,size=1G,share=on \
                       --chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
                       --device vhost-user-blk-pci,chardev=char0
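
At the libvirt domain XML level, something along these lines could work (just a sketch of one possible shape, assuming the shared guest RAM requirement is expressed via <memoryBacking>; not a final syntax proposal):

  <memoryBacking>
    <source type='memfd'/>
    <access mode='shared'/>
  </memoryBacking>
  ...
  <disk type='vhostuser' device='disk'>
    <driver name='qemu' type='raw'/>
    <source type='unix' path='/tmp/vhost-user-blk.sock'/>
    <target dev='vdb' bus='virtio'/>
  </disk>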

Comment 11 Pavel Hrdina 2021-02-04 17:52:16 UTC
Upstream patches:

d763466edc qemu: implement vhost-user-blk support
c8b0d5b0ad qemu_capabilities: introduce vhost-user-blk capability
f00fe96eb0 conf: implement support for vhostuser disk
e88bdaf789 docs: introduces new vhostuser disk type
592fb164e9 qemu_validate: move and refactor qemuValidateDomainDefVirtioFSSharedMemory
6799cc3ada qemu_alias: introduce qemuDomainGetVhostUserAlias helper

Comment 12 Pavel Hrdina 2021-02-04 17:54:07 UTC
To backport this we would need these cleanup patches as well:

836e0a960b storage_source: use virStorageSource prefix for all functions
5ac39c4ab0 util: move virStorageEncryption code into conf
3e54766414 util: move virStorageSource code into conf
2cdd833eae util: move virStorageFileProbe code into storage_file
65abeb058f util: move virStorageFileBackend code into storage_file
01f7ade912 util: extract virStorageFile code into storage_source
296032bfb2 util: extract storage file probe code into virtstoragefileprobe.c
eaa0b3288e util: move virStorageSourceFindByNodeName into qemu_domain
90caf9d763 storage: move storage file sources to separate directory
3e210d204c virstoragefile: change virStorageSource->drv to void pointer
7b4e3bab5b virstoragefile: properly include virstoragefile.h header
23a68a0ed9 src: add missing virstoragefile.h includes

Comment 14 Han Han 2021-02-05 11:05:09 UTC
Hi Pavel,
I found a description of the vhostuser disk at https://libvirt.org/formatdomain.html :
"Note that the vhost server replaces both the disk frontend and backend thus almost all of the disk properties can't be configured via the <disk> XML for this disk type"


Here I think we should list which disk properties are supported and which are not.


And here are the properties of vhost-user-blk-pci:
➜  ~ qemu-kvm -device vhost-user-blk-pci,\?                                                                                                                    
vhost-user-blk-pci options:
  addr=<int32>           - Slot and optional function number, example: 06.0 or 06 (default: (null))
  aer=<bool>             - on/off (default: (null))
  any_layout=<bool>      - on/off (default: (null))
  ats=<bool>             - on/off (default: (null))
  bootindex=<int32>
  chardev=<str>          - ID of a chardev to use as a backend
  class=<uint32>         -  (default: (null))
  config-wce=<bool>      - on/off (default: (null))
  disable-legacy=<OnOffAuto> - on/off/auto (default: (null))
  disable-modern=<bool>  -  (default: (null))
  event_idx=<bool>       - on/off (default: (null))
  failover_pair_id=<str>
  indirect_desc=<bool>   - on/off (default: (null))
  iommu_platform=<bool>  - on/off (default: (null))
  migrate-extra=<bool>   - on/off (default: (null))
  modern-pio-notify=<bool> - on/off (default: (null))
  multifunction=<bool>   - on/off (default: (null))
  notify_on_empty=<bool> - on/off (default: (null))
  num-queues=<uint16>    -  (default: (null))
  packed=<bool>          - on/off (default: (null))
  page-per-vq=<bool>     - on/off (default: (null))
  queue-size=<uint32>    -  (default: (null))
  rombar=<uint32>        -  (default: (null))
  romfile=<str>
  use-disabled-flag=<bool> -  (default: (null))
  use-started=<bool>     -  (default: (null))
  vectors=<uint32>       -  (default: (null))
  virtio-backend=<child<vhost-user-blk>>
  virtio-pci-bus-master-bug-migration=<bool> - on/off (default: (null))
  x-disable-legacy-check=<bool> -  (default: (null))
  x-disable-pcie=<bool>  - on/off (default: (null))
  x-ignore-backend-features=<bool> -  (default: (null))
  x-pcie-deverr-init=<bool> - on/off (default: (null))
  x-pcie-extcap-init=<bool> - on/off (default: (null))
  x-pcie-flr-init=<bool> - on/off (default: (null))
  x-pcie-lnkctl-init=<bool> - on/off (default: (null))
  x-pcie-lnksta-dllla=<bool> - on/off (default: (null))
  x-pcie-pm-init=<bool>  - on/off (default: (null))

Of the properties above, ats, event_idx, iommu_platform, packed and num-queues are supported by qemu's vhost-user-blk-pci device,
while in libvirt, packed, ats and iommu are reported as unsupported for the vhostuser disk:
➜  ~ virsh edit test   
error: unsupported configuration: packed is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]: 
error: unsupported configuration: ats is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]: 
error: unsupported configuration: iommu is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]: 

Would you like to fix that?

Comment 15 Han Han 2021-02-05 11:06:14 UTC
The results in comment 14 are from qemu-5.2.0-5.fc34.1.x86_64 and libvirt v7.0.0-331-gc0ae2ca081.

Comment 16 Pavel Hrdina 2021-02-05 12:03:13 UTC
Hi Han,

thanks for the quick testing. You are correct; it looks like all of the virtio-specific options "iommu", "ats" and "packed" are indeed supported. I'll post a patch to fix that.

For the documentation update, I agree that it would be better to have an explicit list of all the valid XML combinations, but that should really be done for the whole libvirt domain XML documentation. Sometimes we have explicit limitations described in the documentation, but it's far from ideal, so I would probably leave it as it is for now.
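
For reference, once that patch lands, a vhostuser disk carrying these virtio attributes should look roughly like this (a sketch only; the attribute spellings follow the generic virtio <driver> options in formatdomain.html, and queues='4' is just an illustrative value):

    <disk type='vhostuser' device='disk'>
      <driver name='qemu' type='raw' queues='4' iommu='on' ats='on' packed='on'/>
      <source type='unix' path='/var/lib/libvirt/qemu/vhost.sock'/>
      <target dev='vdb' bus='virtio'/>
    </disk>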

Comment 17 Han Han 2021-02-07 02:17:43 UTC
(In reply to Pavel Hrdina from comment #16)
> Hi Han,
> 
> thanks for the quick testing. You are correct; it looks like all of
> the virtio-specific options "iommu", "ats" and "packed" are indeed
> supported. I'll post a patch to fix that.
> 
OK, thank you.
> For the documentation update, I agree that it would be better to have an
> explicit list of all the valid XML combinations, but that should really be
> done for the whole libvirt domain XML documentation. Sometimes we have
> explicit limitations described in the documentation, but it's far from
> ideal, so I would probably leave it as it is for now.

For the implicit documentation issue, I have filed a bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1925848

Comment 18 Han Han 2021-02-07 05:47:40 UTC
Test results for the virtio fix: conf: allow virtio driver attributes for vhostuser disk
https://listman.redhat.com/archives/libvir-list/2021-February/msg00510.html

Comment 19 Pavel Hrdina 2021-02-08 11:28:07 UTC
One more patch needs to be backported:

d3f4f01fa7 conf: allow virtio driver attributes for vhostuser disk

Comment 25 Han Han 2021-03-08 08:17:05 UTC
Tested on libvirt-7.0.0-8.module+el8.4.0+10233+8b7fd9eb.x86_64 qemu-kvm-5.2.0-10.module+el8.4.0+10217+cbdd2152.x86_64:
Setup:
Prepare qemu-storage-daemon:
➜  ~ qemu-storage-daemon --blockdev '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' --blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=/var/lib/libvirt/qemu/vhost.sock --chardev stdio,mux=on,id=char0

Change the owner of the vhost socket to qemu:
➜  ~ chown qemu:qemu /var/lib/libvirt/qemu/vhost.sock

As a workaround for the sVirt denial, I set SELinux to Permissive mode:
➜  ~ getenforce
Permissive

SC1: Start a VM with a vhostuser disk:
Domain xml:
...
  <memoryBacking>
    <source type='memfd'/>
    <access mode='shared'/>
  </memoryBacking>
...
    <disk type='vhostuser' device='disk' snapshot='no'>
      <driver name='qemu' type='raw'/>
      <source type='unix' path='/var/lib/libvirt/qemu/vhost.sock'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
...

VM started.

SC2: Live attach/detach the vhostuser disk
➜  ~ virsh attach-device rhel /tmp/vhos1t.xml
Device attached successfully

➜  ~ virsh detach-device rhel /tmp/vhos1t.xml
Device detached successfully


I have 3 questions here:
1. Is there any way to avoid the permission denial while SELinux is enforcing? I have tried changing the SELinux label with `chcon -u system_u -r object_r -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock` but the permission denial happens as well.
2. Will libvirt implement the DAC and SELinux label updates for the vhost socket file, just like it does for image files?
3. With the disk XML above, the vhost disk is actually read-only in the guest, which is not expected:
[root@localhost ~]# lsblk /dev/vdb
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb  252:16   0  100M  1 disk 
[root@localhost ~]# dd if=/dev/zero of=/dev/vdb
dd: writing to '/dev/vdb': Operation not permitted
1+0 records in
0+0 records out
0 bytes copied, 0.0249928 s, 0.0 kB/s

Is it a kernel or qemu issue?

Comment 26 Han Han 2021-03-08 08:31:59 UTC
Question 3 is resolved: writable=on was missing from the --export option of qemu-storage-daemon.
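
For reference, the corrected invocation should be along these lines (the command from comment 25 with writable=on appended to the export):

➜  ~ qemu-storage-daemon --blockdev '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' --blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=/var/lib/libvirt/qemu/vhost.sock,writable=on --chardev stdio,mux=on,id=char0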

Comment 27 Pavel Hrdina 2021-03-15 12:55:40 UTC
(In reply to Han Han from comment #25)
> Tested on libvirt-7.0.0-8.module+el8.4.0+10233+8b7fd9eb.x86_64
> qemu-kvm-5.2.0-10.module+el8.4.0+10217+cbdd2152.x86_64:
> Setup:
> Prepare qemu-storage-daemon:
> ➜  ~ qemu-storage-daemon --blockdev
> '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage",
> "auto-read-only":true,"discard":"unmap"}' --blockdev
> '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":
> "libvirt-1-storage"}' --export
> vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,
> addr.path=/var/lib/libvirt/qemu/vhost.sock --chardev stdio,mux=on,id=char0
> 
> Change the owner of the vhost socket to qemu:
> ➜  ~ chown qemu:qemu /var/lib/libvirt/qemu/vhost.sock
> 
> As a workaround for the sVirt denial, I set SELinux to Permissive mode:
> ➜  ~ getenforce
> Permissive
> 
> SC1: Start a VM with a vhostuser disk:
> Domain xml:
> ...
>   <memoryBacking>
>     <source type='memfd'/>
>     <access mode='shared'/>
>   </memoryBacking>
> ...
>     <disk type='vhostuser' device='disk' snapshot='no'>
>       <driver name='qemu' type='raw'/>
>       <source type='unix' path='/var/lib/libvirt/qemu/vhost.sock'/>
>       <target dev='vdb' bus='virtio'/>
>       <alias name='virtio-disk1'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x09'
> function='0x0'/>
>     </disk>
> ...
> 
> VM started.
> 
> SC2: Live attach/detach the vhostuser disk
> ➜  ~ virsh attach-device rhel /tmp/vhos1t.xml
> Device attached successfully
> 
> ➜  ~ virsh detach-device rhel /tmp/vhos1t.xml
> Device detached successfully
> 
> 
> I have 3 questions here:
> 1. Is there any way to avoid the permission denial while SELinux is
> enforcing? I have tried changing the SELinux label with `chcon -u system_u
> -r object_r -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock` but the
> permission denial happens as well.

The issue here is not the socket itself but the qemu-storage-daemon process, which is
running with an incorrect SELinux label.

These steps will allow you to test it with enforcing:

1. set virtd_exec_t on qemu-storage-daemon:

    chcon -t virtd_exec_t /usr/bin/qemu-storage-daemon

2. run qemu-storage-daemon using systemd-run:

    systemd-run --uid qemu --gid qemu /usr/bin/qemu-storage-daemon ...

3. relabel the created socket

    chcon -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock

4. start the VM


> 2. Will libvirt implement the DAC and SELinux label updates for the vhost
> socket file, just like it does for image files?

No, libvirt cannot simply change the SELinux label of that socket automatically because
libvirt doesn't manage the qemu-storage-daemon process. Changing the SELinux label could
deny qemu-storage-daemon access to the socket.

Comment 28 Han Han 2021-03-18 10:23:13 UTC
Verified on libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64 qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64

Comment 29 Han Han 2021-03-19 01:32:55 UTC
One more thing to mention: the SELinux boolean domain_can_mmap_files should be on. Otherwise, there will be SELinux permission denial errors on the memfd.
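
The standard way to flip it persistently (a plain setsebool call, nothing libvirt-specific):

➜  ~ setsebool -P domain_can_mmap_files on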

Comment 31 errata-xmlrpc 2021-05-25 06:43:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

