Please add domain XML for vhost-user-blk-pci devices.

QEMU's --device vhost-user-blk-pci connects to a vhost-user device backend running in a separate process. The guest sees a virtio-blk PCI device. This allows the virtio-blk virtqueues to be emulated outside QEMU.

The syntax is:

  --device vhost-user-blk-pci,chardev=CHARDEV,num-queues=N,queue-size=M,config-wce=WCE

The chardev is a UNIX domain socket. The vhost-user-blk device backend typically acts as the server and QEMU as the client. In theory the reverse direction is possible too, but it is rarely used in practice.

There are several implementations of vhost-user device backends, including QEMU's new vhost-user-blk server, QEMU's contrib/vhost-user-blk/vhost-user-blk, and SPDK.

For testing you can use:

$ qemu-storage-daemon \
    --blockdev file,node-name=drive0,filename=test.img \
    --export vhost-user-blk,node-name=drive0,addr.type=unix,addr.path=/tmp/vhost-user-blk.sock

As with all vhost-user devices it is necessary to enable MAP_SHARED for QEMU's guest RAM:

$ qemu-system-x86_64 -M accel=kvm,memory-backend=mem \
    --object memory-backend-memfd,id=mem,size=1G,share=on \
    --chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
    --device vhost-user-blk-pci,chardev=char0
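For reference, a minimal sketch of what the requested domain XML ended up looking like. This is not part of the original request: the <disk type='vhostuser'> element names are taken from the implementation tested in the later comments of this bug, and the socket path matches the example above.

$ cat > /tmp/vhostuser-disk.xml <<'EOF'
<disk type='vhostuser' device='disk'>
  <!-- sketch based on the vhostuser disk type tested in later comments -->
  <driver name='qemu' type='raw'/>
  <source type='unix' path='/tmp/vhost-user-blk.sock'/>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF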
Upstream patches:

d763466edc qemu: implement vhost-user-blk support
c8b0d5b0ad qemu_capabilities: introduce vhost-user-blk capability
f00fe96eb0 conf: implement support for vhostuser disk
e88bdaf789 docs: introduces new vhostuser disk type
592fb164e9 qemu_validate: move and refactor qemuValidateDomainDefVirtioFSSharedMemory
6799cc3ada qemu_alias: introduce qemuDomainGetVhostUserAlias helper
To backport we would need these cleanup patches as well:

836e0a960b storage_source: use virStorageSource prefix for all functions
5ac39c4ab0 util: move virStorageEncryption code into conf
3e54766414 util: move virStorageSource code into conf
2cdd833eae util: move virStorageFileProbe code into storage_file
65abeb058f util: move virStorageFileBackend code into storage_file
01f7ade912 util: extract virStorageFile code into storage_source
296032bfb2 util: extract storage file probe code into virtstoragefileprobe.c
eaa0b3288e util: move virStorageSourceFindByNodeName into qemu_domain
90caf9d763 storage: move storage file sources to separate directory
3e210d204c virstoragefile: change virStorageSource->drv to void pointer
7b4e3bab5b virstoragefile: properly include virstoragefile.h header
23a68a0ed9 src: add missing virstoragefile.h includes
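A possible backport sequence, sketched with git cherry-pick. This assumes the hash lists above are in newest-first (git log) order, so they are applied in reverse here; verify the order against the upstream history before running:

$ # cleanup patches first, oldest to newest (assumed order)
$ git cherry-pick 23a68a0ed9 7b4e3bab5b 3e210d204c 90caf9d763 eaa0b3288e \
    296032bfb2 01f7ade912 65abeb058f 2cdd833eae 3e54766414 5ac39c4ab0 836e0a960b
$ # then the feature patches (assumed order)
$ git cherry-pick 6799cc3ada 592fb164e9 e88bdaf789 f00fe96eb0 c8b0d5b0ad d763466edc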
Hi Pavel,

I found a description of the vhostuser disk at https://libvirt.org/formatdomain.html :

    Note that the vhost server replaces both the disk frontend and backend
    thus almost all of the disk properties can't be configured via the
    <disk> XML for this disk type

I think we should list here which disk properties are supported and which are not. These are the properties of vhost-user-blk-pci:

➜ ~ qemu-kvm -device vhost-user-blk-pci,\?
vhost-user-blk-pci options:
  addr=<int32>           - Slot and optional function number, example: 06.0 or 06 (default: (null))
  aer=<bool>             - on/off (default: (null))
  any_layout=<bool>      - on/off (default: (null))
  ats=<bool>             - on/off (default: (null))
  bootindex=<int32>
  chardev=<str>          - ID of a chardev to use as a backend
  class=<uint32>         - (default: (null))
  config-wce=<bool>      - on/off (default: (null))
  disable-legacy=<OnOffAuto> - on/off/auto (default: (null))
  disable-modern=<bool>  - (default: (null))
  event_idx=<bool>       - on/off (default: (null))
  failover_pair_id=<str>
  indirect_desc=<bool>   - on/off (default: (null))
  iommu_platform=<bool>  - on/off (default: (null))
  migrate-extra=<bool>   - on/off (default: (null))
  modern-pio-notify=<bool> - on/off (default: (null))
  multifunction=<bool>   - on/off (default: (null))
  notify_on_empty=<bool> - on/off (default: (null))
  num-queues=<uint16>    - (default: (null))
  packed=<bool>          - on/off (default: (null))
  page-per-vq=<bool>     - on/off (default: (null))
  queue-size=<uint32>    - (default: (null))
  rombar=<uint32>        - (default: (null))
  romfile=<str>
  use-disabled-flag=<bool> - (default: (null))
  use-started=<bool>     - (default: (null))
  vectors=<uint32>       - (default: (null))
  virtio-backend=<child<vhost-user-blk>>
  virtio-pci-bus-master-bug-migration=<bool> - on/off (default: (null))
  x-disable-legacy-check=<bool> - (default: (null))
  x-disable-pcie=<bool>  - on/off (default: (null))
  x-ignore-backend-features=<bool> - (default: (null))
  x-pcie-deverr-init=<bool> - on/off (default: (null))
  x-pcie-extcap-init=<bool> - on/off (default: (null))
  x-pcie-flr-init=<bool> - on/off (default: (null))
  x-pcie-lnkctl-init=<bool> - on/off (default: (null))
  x-pcie-lnksta-dllla=<bool> - on/off (default: (null))
  x-pcie-pm-init=<bool>  - on/off (default: (null))

Among the properties above, ats, event_idx, iommu_platform, packed and num-queues are supported by the vhost-user-blk-pci device in qemu, yet in libvirt, packed, ats and iommu are reported as unsupported for the vhostuser disk:

➜ ~ virsh edit test
error: unsupported configuration: packed is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]:
error: unsupported configuration: ats is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]:
error: unsupported configuration: iommu is not supported with vhostuser disk
Failed. Try again? [y,n,i,f,?]:

Would you like to fix that?
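For reference, this is how the virtio options under discussion appear in the disk <driver> element once allowed — a sketch using libvirt's standard virtio attribute names (queues, iommu, ats, packed); the values are illustrative:

➜ ~ cat > /tmp/vhostuser-virtio.xml <<'EOF'
<disk type='vhostuser' device='disk'>
  <!-- virtio options under discussion; illustrative values -->
  <driver name='qemu' type='raw' queues='4' iommu='on' ats='on' packed='on'/>
  <source type='unix' path='/var/lib/libvirt/qemu/vhost.sock'/>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF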
The results in comment 14 are from qemu-5.2.0-5.fc34.1.x86_64 and libvirt v7.0.0-331-gc0ae2ca081.
Hi Han,

thanks for the quick testing. You are correct, and it looks like all of the virtio-specific options "iommu", "ats" and "packed" are indeed supported. I'll post a patch to fix that.

For the documentation update I agree that it would be better to have an explicit list of all the valid XML combinations, but that would have to be done for basically the whole libvirt domain XML documentation. We do describe some explicit limitations in the documentation, but it's far from ideal, so I would probably leave it as it is for now.
(In reply to Pavel Hrdina from comment #16)
> Hi Han,
>
> thanks for the quick testing. You are correct, and it looks like all of
> the virtio-specific options "iommu", "ats" and "packed" are indeed
> supported. I'll post a patch to fix that.

OK, thank you.

> For the documentation update I agree that it would be better to have an
> explicit list of all the valid XML combinations, but that would have to be
> done for basically the whole libvirt domain XML documentation. We do
> describe some explicit limitations in the documentation, but it's far from
> ideal, so I would probably leave it as it is for now.

For the documentation issue, I filed a bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1925848
Test results for the virtio fix: conf: allow virtio driver attributes for vhostuser disk https://listman.redhat.com/archives/libvir-list/2021-February/msg00510.html
One more patch needs to be backported: d3f4f01fa7 conf: allow virtio driver attributes for vhostuser disk
Tested on libvirt-7.0.0-8.module+el8.4.0+10233+8b7fd9eb.x86_64 and qemu-kvm-5.2.0-10.module+el8.4.0+10217+cbdd2152.x86_64:

Setup:
Prepare qemu-storage-daemon:
➜ ~ qemu-storage-daemon \
    --blockdev '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
    --blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
    --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=/var/lib/libvirt/qemu/vhost.sock \
    --chardev stdio,mux=on,id=char0

Change the owner of the vhost socket to qemu:
➜ ~ chown qemu:qemu /var/lib/libvirt/qemu/vhost.sock

As a workaround for the sVirt denial, I set SELinux to Permissive mode:
➜ ~ getenforce
Permissive

SC1: Start with the vhostuser disk.
Domain XML:
...
<memoryBacking>
  <source type='memfd'/>
  <access mode='shared'/>
</memoryBacking>
...
<disk type='vhostuser' device='disk' snapshot='no'>
  <driver name='qemu' type='raw'/>
  <source type='unix' path='/var/lib/libvirt/qemu/vhost.sock'/>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</disk>
...

VM started.

SC2: Live attach/detach the vhostuser disk.
➜ ~ virsh attach-device rhel /tmp/vhos1t.xml
Device attached successfully

➜ ~ virsh detach-device rhel /tmp/vhos1t.xml
Device detached successfully

I have 3 questions here:
1. Is there any way to avoid the permission denial while SELinux is enforcing? I have tried changing the SELinux label with `chcon -u system_u -r object_r -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock` but the permission denial happens as well.
2. Will libvirt implement the DAC and SELinux label updates of the vhost socket file, just like it does for image files?
3. With the disk XML above, the vhost disk in the guest is actually read-only, which is not expected:

[root@localhost ~]# lsblk /dev/vdb
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb  252:16   0  100M  1 disk
[root@localhost ~]# dd if=/dev/zero of=/dev/vdb
dd: writing to '/dev/vdb': Operation not permitted
1+0 records in
0+0 records out
0 bytes copied, 0.0249928 s, 0.0 kB/s

Is it a kernel or qemu issue?
Question 3 is resolved: writable=on was missing from the --export option of qemu-storage-daemon.
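For the record, a sketch of the corrected export line, reusing the blockdev chain from comment 25 with writable=on added to the --export option:

➜ ~ qemu-storage-daemon \
    --blockdev '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
    --blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
    --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=/var/lib/libvirt/qemu/vhost.sock,writable=on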
(In reply to Han Han from comment #25)
> [...]
> I have 3 questions here:
> 1. Is there any way to avoid the permission denial while SELinux is
> enforcing? I have tried changing the SELinux label with `chcon -u system_u
> -r object_r -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock` but the
> permission denial happens as well.

The issue here is not the socket itself but the qemu-storage-daemon process, which is running with an incorrect SELinux label. These steps will allow you to test it with SELinux enforcing:

1. Set virtd_exec_t on qemu-storage-daemon:
   chcon -t virtd_exec_t /usr/bin/qemu-storage-daemon

2. Run qemu-storage-daemon using systemd-run:
   systemd-run --uid qemu --gid qemu /usr/bin/qemu-storage-daemon ...

3. Relabel the created socket:
   chcon -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock

4. Start the VM.

> 2. Will libvirt implement the DAC and SELinux label updates of the vhost
> socket file, just like it does for image files?

No, libvirt cannot simply change the SELinux label of that socket automatically, because libvirt doesn't manage the qemu-storage-daemon process. Changing the SELinux label could prevent qemu-storage-daemon from accessing the socket.
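Putting those steps together as one runnable sequence — a sketch reusing the qemu-storage-daemon arguments (including writable=on) and the VM name "rhel" from comment 25:

➜ ~ chcon -t virtd_exec_t /usr/bin/qemu-storage-daemon
➜ ~ systemd-run --uid qemu --gid qemu /usr/bin/qemu-storage-daemon \
    --blockdev '{"driver":"file","filename":"/tmp/new","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
    --blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
    --export vhost-user-blk,id=vhost-user-blk0,node-name=libvirt-1-format,addr.type=unix,addr.path=/var/lib/libvirt/qemu/vhost.sock,writable=on
➜ ~ chcon -t svirt_image_t /var/lib/libvirt/qemu/vhost.sock
➜ ~ virsh start rhel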
Verified on libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64 and qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64.
One more thing to mention: the SELinux boolean domain_can_mmap_files should be on. Otherwise, there will be SELinux permission denial errors on the memfd.
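For example, with setsebool (-P makes the change persistent across reboots):

➜ ~ setsebool -P domain_can_mmap_files 1
➜ ~ getsebool domain_can_mmap_files
domain_can_mmap_files --> on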
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098