Bug 1829865

Summary: [RFE] Manage shared NVMe PCI adapters using qemu-storage-daemon
Product: Red Hat Enterprise Linux 9
Reporter: Stefan Hajnoczi <stefanha>
Component: libvirt
Assignee: Virtualization Maintenance <virt-maint>
libvirt sub component: Storage
QA Contact: Han Han <hhan>
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
CC: dyuan, jsuchane, kwolf, lmen, pkrempa, virt-maint, xuzhang
Version: 9.0
Keywords: FutureFeature, Triaged
Target Milestone: rc   
Last Closed: 2022-04-30 07:27:19 UTC
Type: Feature Request

Description Stefan Hajnoczi 2020-04-30 13:48:08 UTC
(This is preliminary information because this feature is still being designed and developed.)

qemu-storage-daemon is a new QEMU program that can be used to work with disk images outside a specific guest QEMU process.  It has a QMP monitor and can run as a long-lived process that provides storage functionality.  Note that file locking still applies, so only disk images that are not in use by a guest may be opened.
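
For example, the daemon's QMP monitor could be exposed over a Unix socket so that management software can drive it (the socket path below is arbitrary):

  qemu-storage-daemon \
  --chardev socket,id=qmp0,path=/var/run/qsd-qmp.sock,server=on,wait=off \
  --monitor chardev=qmp0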

The QEMU nvme:// block driver is being extended to support multiple blockdevs on a single NVMe PCI adapter (bz1827750).  This only works if the blockdevs are created in the same process, so a single NVMe PCI adapter cannot be shared by blockdevs in separate QEMU processes.  qemu-storage-daemon is a natural fit for this use case:

1. A single qemu-storage-daemon process owns the NVMe PCI adapter and the blockdevs are created in it.  The recommended way of splitting up storage areas on NVMe drives is using NVMe Namespaces created with the nvme-cli "nvme create-ns" subcommand (see the sketch after the listing below).  Each blockdev has access to one Namespace:

  qemu-storage-daemon ... \
  --blockdev driver=nvme,node-name=nvme1,device=0000:01:00.0,namespace=1 \
  --blockdev driver=nvme,node-name=nvme2,device=0000:01:00.0,namespace=2 \
  --blockdev driver=raw,node-name=raw-nvme1,file=nvme1 \
  --blockdev driver=raw,node-name=raw-nvme2,file=nvme2
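
The Namespaces themselves would be created ahead of time with nvme-cli, roughly along these lines.  Sizes are given in logical blocks and the controller ID depends on the drive, so the values below are only illustrative:

  nvme create-ns /dev/nvme0 --nsze=2621440 --ncap=2621440 --flbas=0
  nvme create-ns /dev/nvme0 --nsze=2621440 --ncap=2621440 --flbas=0
  nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0
  nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0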

Some NVMe drives do not support the Namespace feature; in that case the raw block driver's offset= and size= parameters can be used to split up the storage:

  qemu-storage-daemon ... \
  --blockdev driver=nvme,node-name=nvme,device=0000:01:00.0 \
  --blockdev driver=raw,node-name=raw-nvme1,file=nvme,offset=0,size=10G \
  --blockdev driver=raw,node-name=raw-nvme2,file=nvme,offset=10G,size=8G

2. The blockdevs are exported as vhost-user-blk devices.  This allows QEMU to connect to the qemu-storage-daemon.  The runtime vhost-user-blk server feature is currently being developed upstream by a community contributor.

One IOThread should be defined and all vhost-user-blk exports will be associated with it (see the sketch below).
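
Since the vhost-user-blk export interface was still under development when this was filed, the exact syntax is subject to change; a sketch based on the --export interface that later landed upstream (socket paths and IDs here are made up):

  qemu-storage-daemon ... \
  --object iothread,id=iothread0 \
  --export type=vhost-user-blk,id=export1,node-name=raw-nvme1,iothread=iothread0,writable=on,addr.type=unix,addr.path=/var/run/qsd-nvme1.sock \
  --export type=vhost-user-blk,id=export2,node-name=raw-nvme2,iothread=iothread0,writable=on,addr.type=unix,addr.path=/var/run/qsd-nvme2.sock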

3. QEMU guests are launched with vhost-user-blk devices:

  qemu-system-x86_64 --name guest1 ... \
  --device vhost-user-blk-pci,...

  qemu-system-x86_64 --name guest2 ... \
  --device vhost-user-blk-pci,...
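
The elided guest options would at least include a socket chardev pointing at the daemon's vhost-user socket and guest RAM backed by shared memory, which vhost-user requires.  A hypothetical sketch for guest1 (paths and sizes are placeholders):

  qemu-system-x86_64 --name guest1 -m 4G \
  --object memory-backend-memfd,id=mem0,size=4G,share=on \
  --numa node,memdev=mem0 \
  --chardev socket,id=char0,path=/var/run/qsd-nvme1.sock \
  --device vhost-user-blk-pci,chardev=char0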

Comment 2 John Ferlan 2021-09-08 13:29:41 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 7 RHEL Program Management 2022-04-30 07:27:19 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.