Bug 1886123

Summary: [RFE] Supporting vDPA block in QEMU
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: virtio-blk,scsi)
Reporter: Stefano Garzarella <sgarzare>
Assignee: Stefano Garzarella <sgarzare>
QA Contact: qing.wang <qinwang>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: coli, jinzhao, jjongsma, juzhang, kwolf, mrezanin, qinwang, vgoyal, virt-maint
Version: unspecified
Keywords: FutureFeature, Reopened, Triaged
Target Milestone: rc
Fixed In Version: qemu-kvm-8.0.0-1.el9
Doc Type: Enhancement
Type: Feature Request
Last Closed: 2023-11-07 08:26:38 UTC
Bug Depends On: 2166106, 2213317
Bug Blocks: 1900770

Description Stefano Garzarella 2020-10-07 17:04:51 UTC
We will add support for vDPA block devices in QEMU.

It will support both hardware and software vDPA implementations, exposing a virtio block device to the guest, which can use the standard virtio-blk device driver to access either of them.
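
For reference, the intended mapping is a host /dev/vhost-vdpa-N character device exposed to the guest as a virtio-blk disk. A minimal sketch of the eventual command line (the driver and option names are taken from the verification steps later in this bug; the node name is illustrative):

  /usr/libexec/qemu-kvm \
    ...
    -blockdev node-name=vdpa_disk0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
    -device virtio-blk-pci,drive=vdpa_disk0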

Comment 9 John Ferlan 2021-09-09 11:43:15 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 11 RHEL Program Management 2022-04-07 07:27:26 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 23 Jonathon Jongsma 2023-07-11 15:30:20 UTC
Testing this on CentOS Stream, it seems that the vhost-vdpa driver was not actually enabled:

$ rpm -q qemu-kvm
qemu-kvm-8.0.0-6.el9.x86_64

$ rpm -q qemu-kvm-block-blkio
qemu-kvm-block-blkio-8.0.0-6.el9.x86_64

$ /usr/libexec/qemu-kvm -device virtio-blk-pci,id=src1,drive=drive_src1 -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on
qemu-kvm: -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on: Unknown driver 'virtio-blk-vhost-vdpa'
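
A first-pass way to check whether the blkio block drivers are even shipped in the installed module (presence of the .so alone does not prove the driver made it into the rw whitelist):

$ rpm -ql qemu-kvm-block-blkio | grep '\.so'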

Comment 24 Jonathon Jongsma 2023-07-11 15:49:59 UTC
This may be unrelated (since I get the same 'Unknown driver' error for e.g. virtio-blk-vfio-pci), but I notice that in the downstream qemu patch[0] that enables the libblkio drivers, the driver whitelist in the spec file refers to a driver named 'virtio-blk-vdpa-blk'. As far as I can tell, this should actually be 'virtio-blk-vhost-vdpa'?

[0] https://gitlab.com/redhat/centos-stream/src/qemu-kvm/-/commit/fd437e474f50e49d76ca9c092577c9ef0eb37eb7
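
A hedged sketch of the suspected one-word fix in the spec file (the whitelist context is abbreviated with "..."; the exact configure invocation in the downstream spec may differ):

  # before: refers to a driver name libblkio never registers
  --block-drv-rw-whitelist=...,virtio-blk-vdpa-blk,... \
  # after: matches the driver name QEMU actually exposes
  --block-drv-rw-whitelist=...,virtio-blk-vhost-vdpa,... \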

Comment 25 Stefano Garzarella 2023-07-11 15:55:09 UTC
Right, we need to enable the blkio driver in QEMU: BZ2213317
I just set the dependency.

There was an issue with QEMU's blkio module, but Stefan just sent a downstream MR to fix it: https://gitlab.com/redhat/centos-stream/src/qemu-kvm/-/merge_requests/181

Comment 26 Stefano Garzarella 2023-07-20 09:47:52 UTC
As we discussed in https://bugzilla.redhat.com/show_bug.cgi?id=2213317#c29, we need to use hugepages, since vhost-vdpa devices still require pinning all of the guest's pages.
With 4k pages, pinning can take a lot of time.

This requirement may disappear if we backport commit 4bb94d2de2fa ("vdpa_sim: add support for user VA") for the vDPA simulators, but in general HW devices don't support this yet, so for now my advice is to use hugepages with vhost-vdpa.

Something like this:

  # allocate 3100 hugepages (2M each): 6G / 2M = 3072, plus some extra pages
  echo 3100 > /proc/sys/vm/nr_hugepages

  /usr/libexec/qemu-kvm \
  ...
  -machine q35,memory-backend=mem \
  -object memory-backend-file,share=on,id=mem,size=6G,mem-path=/dev/hugepages/libvirt/qemu \

Or, as Qing suggested, a memory-backend-memfd, since Linux should support transparent hugepages for shmem/memfd memory:

  /usr/libexec/qemu-kvm \
  ...
  -machine q35,memory-backend=mem \
  -object memory-backend-memfd,id=mem,size=6G,share=on \


We also need to increase the memlock limits in /etc/security/limits.conf:

* soft memlock unlimited
* hard memlock unlimited
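
After logging in again, the new limit can be confirmed; if the lines above took effect, the expected output is:

  $ ulimit -l
  unlimited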

Comment 27 qing.wang 2023-07-24 10:49:27 UTC
The main functionality passed on:
Red Hat Enterprise Linux release 9.3 Beta (Plow)
5.14.0-340.el9.x86_64
qemu-kvm-8.0.0-8.el9.x86_64
seabios-bin-1.16.1-1.el9.noarch
edk2-ovmf-20230524-2.el9.noarch
libvirt-9.3.0-2.el9.x86_64

Test steps:
1. Prepare vhost-vdpa disks on the host

  modprobe vhost-vdpa
  modprobe vdpa-sim-blk
  vdpa dev add mgmtdev vdpasim_blk name blk0
  vdpa dev add mgmtdev vdpasim_blk name blk1
  vdpa dev list -jp
  ls /dev/vhost-vdpa*
  [ $? -ne 0 ] && echo "failed to create vdpa devices"
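
The vdpa_sim_blk simulator exposes fixed-size 128 MiB devices; a quick sanity check (assuming the two devices above were created) might look like:

  vdpa dev list | grep -c blk   # expect 2
  ls /dev/vhost-vdpa-0 /dev/vhost-vdpa-1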

2. Boot the VM with the virtio-blk-vhost-vdpa driver
/usr/libexec/qemu-kvm \
  -name testvm \
  -machine q35,memory-backend=mem \
  -object memory-backend-memfd,id=mem,size=6G,share=on \
  -m  6G \
  -smp 2 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
   \
   \
  -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x3,chassis=1 \
  -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x3.0x1,bus=pcie.0,chassis=2 \
  -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x3.0x2,bus=pcie.0,chassis=3 \
  -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x3.0x3,bus=pcie.0,chassis=4 \
  -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x3.0x4,bus=pcie.0,chassis=5 \
  -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x3.0x5,bus=pcie.0,chassis=6 \
  -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x3.0x6,bus=pcie.0,chassis=7 \
  -device pcie-root-port,id=pcie-root-port-7,port=0x7,addr=0x3.0x7,bus=pcie.0,chassis=8 \
  -device pcie-root-port,id=pcie_extra_root_port_0,bus=pcie.0,addr=0x4  \
  -object iothread,id=iothread0 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-0,iothread=iothread0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel930-64-virtio-scsi.qcow2,node-name=drive_image1,file.aio=threads   \
  -device scsi-hd,id=os,drive=drive_image1,bus=scsi0.0,bootindex=0,serial=OS_DISK   \
  \
  -blockdev node-name=prot_stg0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
  -blockdev node-name=fmt_stg0,driver=raw,file=prot_stg0 \
  -device virtio-blk-pci,iothread=iothread0,bus=pcie-root-port-4,addr=0,id=stg0,drive=fmt_stg0,bootindex=1 \
  \
  -blockdev node-name=prot_stg1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on \
  -blockdev node-name=fmt_stg1,driver=raw,file=prot_stg1 \
  -device scsi-hd,id=stg1,drive=fmt_stg1,bootindex=2 \
  -vnc :5 \
  -monitor stdio \
  -qmp tcp:0:5955,server=on,wait=off \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b7,id=nic1,netdev=nicpci,bus=pcie-root-port-7 \
  -netdev tap,id=nicpci \
  -boot menu=on,reboot-timeout=1000,strict=off \
  \
  -chardev socket,id=socket-serial,path=/var/tmp/socket-serial,logfile=/var/tmp/file-serial.log,mux=on,server=on,wait=off \
  -serial chardev:socket-serial \
  -chardev file,path=/var/tmp/file-bios.log,id=file-bios \
  -device isa-debugcon,chardev=file-bios,iobase=0x402 \
  \
  -chardev socket,id=socket-qmp,path=/var/tmp/socket-qmp,logfile=/var/tmp/file-qmp.log,mux=on,server=on,wait=off \
  -mon chardev=socket-qmp,mode=control \
  -chardev socket,id=socket-hmp,path=/var/tmp/socket-hmp,logfile=/var/tmp/file-hmp.log,mux=on,server=on,wait=off \
  -mon chardev=socket-hmp,mode=readline
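
Inside the guest, the two vdpa disks should appear as ordinary 128M block devices (one behind virtio-blk-pci, one behind scsi-hd); that size is what the script in step 3 matches on:

  # in the guest; device names are illustrative
  lsblk -nd | grep 128M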

3. Log in to the guest and execute simple I/O
cat guest_io.sh 

function tests_failed() {
        exit_code="$?"
        echo "Test failed: $1"
        exit "${exit_code}"
}
# the vdpa_sim_blk devices are the only 128M disks here, so match on size
vdpa_devs=$(lsblk -nd | grep 128M | awk '{print $1}')
echo ${vdpa_devs}

for dev in ${vdpa_devs};do
  echo "$dev"
  mkfs.xfs -f /dev/${dev} || tests_failed "format"
  mkdir -p /home/${dev}
  mount /dev/${dev} /home/${dev} || tests_failed "mount"
  dd if=/dev/zero of=/home/${dev}/test.img count=100 bs=1M oflag=direct || tests_failed "IO"
  umount -fl /home/${dev}
done
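
Run inside the guest, e.g. with bash guest_io.sh; it formats, mounts, and writes 100 MiB (direct I/O) to each detected vdpa disk, failing fast on the first error.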

4. Unplug the disks

{"execute": "device_del", "arguments": {"id": "stg1"}}

{"execute": "blockdev-del","arguments": {"node-name": "fmt_stg1"}}
{"execute": "blockdev-del","arguments": {"node-name":"prot_stg1"}}

{"execute": "device_del", "arguments": {"id": "stg0"}}

{"execute": "blockdev-del","arguments": {"node-name": "fmt_stg0"}}
{"execute": "blockdev-del","arguments": {"node-name":"prot_stg0"}}

5. Check the disks in the guest; they should disappear

6. Plug the disks back in

{"execute": "blockdev-add", "arguments": {"node-name": "prot_stg0", "driver": "virtio-blk-vhost-vdpa",  "path": "/dev/vhost-vdpa-0","cache": {"direct": true, "no-flush": false}}}
{"execute": "blockdev-add", "arguments": {"node-name": "fmt_stg0", "driver": "raw",   "file": "prot_stg0"}}
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg0", "drive": "fmt_stg0","bus":"pcie-root-port-4"}}

{"execute": "blockdev-add", "arguments": {"node-name": "prot_stg1", "driver": "virtio-blk-vhost-vdpa",  "path": "/dev/vhost-vdpa-1","cache": {"direct": true, "no-flush": false}}}
{"execute": "blockdev-add", "arguments": {"node-name": "fmt_stg1", "driver": "raw",   "file": "prot_stg1"}}
{"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg1", "drive": "fmt_stg1"}}

7. Check the disks in the guest; they should be present again

8. Run I/O as in step 3

Comment 28 qing.wang 2023-07-27 14:43:25 UTC
Also tested blockdev options with the virtio-blk-vhost-vdpa driver:

/usr/libexec/qemu-kvm \
  -name testvm \
  -machine q35,memory-backend=mem \
  -object memory-backend-memfd,id=mem,size=6G,share=on \
  -m  6G \
  -smp 2 \
  -cpu host,+kvm_pv_unhalt \
  -device ich9-usb-ehci1,id=usb1 \
  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
   \
   \
  -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x3,chassis=1 \
  -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x3.0x1,bus=pcie.0,chassis=2 \
  -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x3.0x2,bus=pcie.0,chassis=3 \
  -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x3.0x3,bus=pcie.0,chassis=4 \
  -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x3.0x4,bus=pcie.0,chassis=5 \
  -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x3.0x5,bus=pcie.0,chassis=6 \
  -device pcie-root-port,id=pcie-root-port-6,port=0x6,addr=0x3.0x6,bus=pcie.0,chassis=7 \
  -device pcie-root-port,id=pcie-root-port-7,port=0x7,addr=0x3.0x7,bus=pcie.0,chassis=8 \
  -device pcie-root-port,id=pcie_extra_root_port_0,bus=pcie.0,addr=0x4  \
  -object iothread,id=iothread0 \
  -device virtio-scsi-pci,id=scsi0,bus=pcie-root-port-0,iothread=iothread0 \
  -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel930-64-virtio-scsi.qcow2,node-name=drive_image1,file.aio=threads   \
  -device scsi-hd,id=os,drive=drive_image1,bus=scsi0.0,bootindex=0,serial=OS_DISK   \
  \
  -blockdev node-name=prot_stg0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on,discard=unmap,detect-zeroes=on \
  -blockdev node-name=fmt_stg0,driver=raw,file=prot_stg0 \
  -device virtio-blk-pci,iothread=iothread0,share-rw=on,serial=data0,bus=pcie-root-port-4,addr=0,id=stg0,drive=fmt_stg0,bootindex=1 \
  \
  -blockdev node-name=prot_stg1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on,auto-read-only=on,read-only=off,force-share=off \
  -blockdev node-name=fmt_stg1,driver=raw,file=prot_stg1 \
  -device scsi-hd,id=stg1,drive=fmt_stg1,share-rw=on,serial=data1,bootindex=2 \
  -vnc :5 \
  -monitor stdio \
  -qmp tcp:0:5955,server=on,wait=off \
  -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b7,id=nic1,netdev=nicpci,bus=pcie-root-port-7 \
  -netdev tap,id=nicpci \
  -boot menu=on,reboot-timeout=1000,strict=off \
  \
  -chardev socket,id=socket-serial,path=/var/tmp/socket-serial,logfile=/var/tmp/file-serial.log,mux=on,server=on,wait=off \
  -serial chardev:socket-serial \
  -chardev file,path=/var/tmp/file-bios.log,id=file-bios \
  -device isa-debugcon,chardev=file-bios,iobase=0x402 \
  \
  -chardev socket,id=socket-qmp,path=/var/tmp/socket-qmp,logfile=/var/tmp/file-qmp.log,mux=on,server=on,wait=off \
  -mon chardev=socket-qmp,mode=control \
  -chardev socket,id=socket-hmp,path=/var/tmp/socket-hmp,logfile=/var/tmp/file-hmp.log,mux=on,server=on,wait=off \
  -mon chardev=socket-hmp,mode=readline
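
A quick way to confirm the serial= values (and thus that the options landed on the intended disks) from inside the guest:

  # in the guest
  lsblk -o NAME,SERIAL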

Comment 30 errata-xmlrpc 2023-11-07 08:26:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368