Bug 2040272 - [RFE] Allow passing file descriptors to qemu for disks on startup and for hotplug
Summary: [RFE] Allow passing file descriptors to qemu for disks on startup and for hotplug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 9.1
Assignee: Peter Krempa
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks: 2040235 2040625
 
Reported: 2022-01-13 10:47 UTC by Roman Mohr
Modified: 2023-05-29 07:34 UTC
CC List: 13 users

Fixed In Version: libvirt-9.0.0-3.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:26:11 UTC
Type: Feature Request
Target Upstream Version: 9.0.0
Embargoed:


Attachments
the log, xml and script for hot-plug (35.32 KB, application/gzip)
2023-01-30 03:02 UTC, Han Han
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker LIBVIRTAT-14182 0 None None None 2023-05-04 02:56:17 UTC
Red Hat Issue Tracker LIBVIRTAT-14183 0 None None None 2023-05-04 02:56:17 UTC
Red Hat Issue Tracker LIBVIRTAT-14184 0 None None None 2023-05-04 02:56:17 UTC
Red Hat Issue Tracker RHELPLAN-107807 0 None None None 2022-01-13 10:51:09 UTC
Red Hat Product Errata RHBA-2023:2171 0 None None None 2023-05-09 07:26:51 UTC

Description Roman Mohr 2022-01-13 10:47:19 UTC
Description of problem:

Due to some limitations in Kubernetes, CNV does some mounting tricks to make disks available for containerdisks and hotplug. The details are explained in https://bugzilla.redhat.com/show_bug.cgi?id=2040235#c0.

As a consequence, our node daemon has to unmount all of these disks so that the kubelet can shut pods down cleanly. We have measures in place to ensure that this works properly, but there are race conditions which delay the cleanup and, in scenarios where our node daemon is unavailable, can even leave stale mounts behind.

To improve this situation, we thought about the possibility of handing file descriptors to libvirt via socket-based ownership transfer. This would meet our two requirements from a management perspective:

 * For as long as the VM is alive, it holds open file descriptors, which block any cleanup/unmount attempts by the kubelet
 * As soon as the file descriptors are closed (i.e. when the VM is down), all unmounts can continue without our node daemon having to interfere.


After some initial discussions with Peter Krempa, Kevin Wolf, et al., passing file descriptors could be made possible for disks in the following scenarios:
 * VM startup
 * hotplug
 * migrations

One requirement for us would be the ability to keep using P2P migrations (here is an overview of the flags we currently use: https://github.com/kubevirt/kubevirt/blob/1840e40b550d6b1e85ff96721b69233b7df5a964/pkg/virt-launcher/virtwrap/live-migration-source.go#L83).


KubeVirt can assist on both the migration source and target through its virt-launcher, which at some point owns all the FDs on each side and can then provide FD mapping information, as well as sockets for the actual FD ownership transfer, in various ways.
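
To illustrate the ownership-transfer idea, here is a minimal sketch that hands an open disk FD across a UNIX socket with Python's socket.send_fds()/recv_fds() (Python >= 3.9). The path and group name are made up; this only shows the general mechanism, not the actual CNV/libvirt protocol:

#!/usr/bin/python3
# Sketch only: hand ownership of an open disk FD over a UNIX socket.
# The real virt-launcher <-> libvirt exchange is more involved; the disk
# path and group name below are made up.
import os
import socket

DISK = '/tmp/vdb'  # illustrative disk image path

sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "Node daemon" side: open the disk and pass the descriptor away.
fd = os.open(DISK, os.O_RDWR | os.O_CREAT)
socket.send_fds(sender, [b'disk-group-1'], [fd])
os.close(fd)  # our copy is gone; the receiver now holds the only reference

# "virt-launcher/libvirt" side: receive the FD. As long as it stays open,
# unmounting the backing mount point is blocked; once it is closed (VM
# gone), unmounts can proceed without the node daemon interfering.
msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
print('received group %s with fds %s' % (msg.decode(), fds))

for received_fd in fds:
    os.close(received_fd)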

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Klaus Heinrich Kiwi 2022-01-27 16:58:20 UTC
@rmohr , what is the requested target for this feature to be available for CNV? In other words, by what RHEL 9.x release do you expect to be able to make use of it? Thanks

Comment 2 Roman Mohr 2022-01-28 09:14:23 UTC
(In reply to Klaus Heinrich Kiwi from comment #1)
> @rmohr , what is the requested target for this feature to be available for
> CNV? In other words, by what RHEL 9.x release do you expect to be able to
> make use of it? Thanks

From my perspective: The sooner we have it the faster we can start the integration work, since we can only start changing CNV once we have this. This is also not a trivial effort.

Regarding the exact target release: Stu, can you provide a target?

Comment 3 Klaus Heinrich Kiwi 2022-02-21 13:38:37 UTC
Peter, what are the prospects of having this in the next libvirt release and included in RHEL 9.1?

Comment 4 Peter Krempa 2022-02-21 15:10:30 UTC
The next libvirt upstream release (8.1.0) is going into freeze this week, so that's not possible. I expect either the release after that or the one following, but both are in scope for RHEL 9.1.

Comment 9 Peter Krempa 2023-01-10 15:09:44 UTC
The feature was added upstream by the following commits:

d7e9093502 qemu: Fix handling of passed FDs in remoteDispatchDomainFdAssociate
fe6077585e qemuxml2*test: Enable testing of disks with 'fdgroup'
894fe89484 qemu: Enable support for FD passed disk sources
a575aa280d qemu: cgroup: Don't setup cgroups for FD-passed images
dc20b1d774 qemu: driver: Don't allow certain operations with FD-passed disks
7ce63d5a07 qemu: Prepare storage backing chain traversal code for FD passed images
6f3d13bfbd security: selinux: Handle security labelling of FD-passed images
7fceb5e168 secuirity: DAC: Don't relabel FD-passed virStorageSource images
74f3f4b93c qemu: block: Add support for passing FDs of disk images
81cbfc2fc3 qemu: Prepare data for FD-passed disk image sources
47b922f3f8 conf: storage_source: Introduce virStorageSourceIsFD
4c9ce062d3 qemu: domain: Introduce qemuDomainStartupCleanup
98bd201678 conf: Add 'fdgroup' attribute for 'file' disks
0fcdb512d4 qemuxml2argvtest: Add support for populating 'fds' in private data
f762f87534 qemu: Implement qemuDomainFDAssociate
e2670a63d2 conf: storage_source: Introduce type for storing FDs associated for storage
3ea4170551 virsh: Introduce 'dom-fd-associate' for invoking virDomainFDAssociate()
abd9025c2f lib: Introduce virDomainFDAssociate API
608c4b249e qemuxml2xmltest: Remove 'disk-backing-chain' case and output files
e2b36febdf qemuxml2argvtest: Add seclabels in <backingStore> to disk-backing-chains-(no)index
75a7a3b597 virStorageSourceIsSameLocation: Use switch statement for individual storage types
08406591ce remote_driver: Refactor few functions as example of auto-locking
8d7e3a723d remote_driver: Return 'virLockGuard' from 'remoteDriverLock'
1be393d9ad gendispatch: Add 'G_GNUC_WARN_UNUSED_RESULT' to output of 'aclheader'
aa47051bf4 virclosecallbacks: Remove old close callbacks code
38607ea891 qemuMigrationSrcBeginResumePhase: Remove unused 'driver' argument
8187c0ed94 qemuMigrationSrcIsAllowed: Remove unused 'driver' argument
aa8e187fa9 qemu: Use new connection close callbacks API
ba6f53d778 bhyve: Use new connection close callbacks API
e74bb402e4 lxc: Use new connection close callbacks API
cb195c19b7 virclosecallbacks: Add new close callbacks APIs
2cb13113c2 conf: domain: Add helper infrastructure for new connection close callbacks
e88593ba39 conf: virdomainobjlist: Remove return value from virDomainObjListCollect
cd3599c876 conf: virdomainobjlist: Introduce 'virDomainObjListCollectAll'
f52bc2d54a conf: virdomainobjlist: Convert header to contemporary style
0cd318ce16 datatypes: Clean up whitespace in definition of struct _virConnect
3de56902d3 datatypes: Simplify error path of 'virGetDomain'

v9.0.0-rc1-4-gd7e9093502

Comment 10 Han Han 2023-01-20 04:36:15 UTC
Ran basic tests on libvirt-9.0.0-1.el9.x86_64, python3-libvirt-9.0.0-1.el9.x86_64 and qemu-kvm-7.2.0-5.el9.x86_64:
1. Associate the FD of the disk with a domain
2. Start the domain with that disk
3. Detach the disk

#!/usr/bin/python3

from os import path
import subprocess
import time
from lxml import etree as et
from io import StringIO
import libvirt


DOM = 'rhel-ovmf-9.2'
FILE = '/tmp/vdb'
FDGROUP = 'test'

DISK_XML_TEMPL = '''<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="{0}" fdgroup="{2}"/>
  <backingStore/>
  <target dev="{1}" bus="virtio"/>
</disk>'''


subprocess.run("qemu-img create {0} 100M".format(FILE).split())

with libvirt.open() as conn:
    domain = conn.lookupByName(DOM)
    with open(FILE, "w+b") as f_obj:
        fds = [f_obj.fileno()]
        domain.FDAssociate(FDGROUP, fds, libvirt.VIR_DOMAIN_FD_ASSOCIATE_SECLABEL_RESTORE | libvirt.VIR_DOMAIN_FD_ASSOCIATE_SECLABEL_WRITABLE)
        disk_xml = DISK_XML_TEMPL.format(FILE, path.basename(FILE), FDGROUP)
        domain.attachDeviceFlags(disk_xml, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
        domain.create()
        time.sleep(50)
        domain.detachDevice(disk_xml)
        domain.detachDeviceFlags(disk_xml, libvirt.VIR_DOMAIN_AFFECT_CONFIG)


The vdb disk XML of the running VM is:
<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="/tmp/vdb" fdgroup="test" index="1"/>
  <backingStore/>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>


Works as expected.

Comment 11 Han Han 2023-01-30 03:02:28 UTC
Created attachment 1941037 [details]
the log, xml and script for hot-plug

Run it as run-fd_associate.py:
1. Open the disk file
2. Create a VM
3. Assign the FD to the VM via FDAssociate
4. Attach the disk to the VM
The results:
Formatting '/tmp/vdb', fmt=raw size=104857600
libvirt: QEMU Driver error : internal error: unable to execute QEMU command 'blockdev-add': Could not dup FD for /dev/fdset/1 flags 2: No such file or directory
Traceback (most recent call last):
  File "/root/./run-fd_associate.py", line 33, in <module>
    domain.attachDeviceFlags(disk_xml, 0)
  File "/usr/lib64/python3.9/site-packages/libvirt.py", line 716, in attachDeviceFlags
    raise libvirtError('virDomainAttachDeviceFlags() failed')
libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Could not dup FD for /dev/fdset/1 flags 2: No such file or directory

Comment 12 Han Han 2023-01-30 03:03:49 UTC
Peter, please check the results of comment11.
Version: libvirt-9.0.0-2.el9.x86_64 python3-libvirt-9.0.0-1.el9.x86_64 qemu-kvm-7.2.0-5.el9.x86_64

Comment 13 Peter Krempa 2023-01-31 11:25:19 UTC
Oops, I must have misplaced the hunk which actually passes the FDs on hotplug. Do you want to file another BZ to track this part?

Comment 14 Han Han 2023-02-01 02:48:34 UTC
(In reply to Peter Krempa from comment #13)
> Oops, I must have misplaced the hunk which actually passes the FDs on
> hotplug. Do you want to file another BZ to track this part?

Not needed. Please just fix it here and update the "Fixed In Version" field.

Comment 15 Peter Krempa 2023-02-01 09:08:29 UTC
Fixes for hotplug pushed upstream:

3b8d669d55 qemu: block: Properly handle FD-passed disk hot-(un-)plug
f730b1e4f2 qemu: domain: Store fdset ID for disks passed to qemu via FD
5598c10c64 qemu: fd: Add helpers allowing storing FD set data in status XML
3b7b201b95 qemuFDPassTransferCommand: Mark that FD was passed
65f14232fb qemu: command: Handle FD passing commandline via qemuBuildBlockStorageSourceAttachDataCommandline
531adf3274 qemuStorageSourcePrivateDataFormat: Rename 'tmp' to 'objectsChildBuf'
51dc38fe31 qemu_fd: Remove declaration for 'qemuFDPassNewDirect'

Comment 20 Han Han 2023-02-03 06:13:42 UTC
The hot-plug test from comment 11 passes on libvirt-9.0.0-3.el9.x86_64 and qemu-kvm-7.2.0-6.el9.x86_64.

Comment 21 Han Han 2023-02-13 12:08:17 UTC
Hi Peter, is there any way to test hot-plug/VM creation/migration with dom-fd-associate via virsh?
I tested as follows on libvirt-9.0.0-4.el9.x86_64 and qemu-kvm-7.2.0-8.el9.x86_64, but it doesn't work for hot-plug:
➜  ~ cat /tmp/vdb.xml 
<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="/tmp/vdb" fdgroup="test"/>
  <backingStore/>
  <target dev="vdb" bus="virtio"/>
</disk>

➜  ~ virsh list 
 Id   Name       State
--------------------------
 2    rhel-9.2   running

➜  ~ exec 3<> /tmp/vdb            
➜  ~ virsh -k0 -K0 dom-fd-associate rhel-9.2 test 3
                     
➜  ~ virsh -k0 -K0 attach-device rhel-9.2 /tmp/vdb.xml      
error: Failed to attach device from /tmp/vdb.xml                
error: invalid argument: file descriptor group 'test' was not associated with the domain                                         
                                           
➜  ~ lsof /tmp/vdb                         
COMMAND  PID USER   FD   TYPE DEVICE  SIZE/OFF     NODE NAME
zsh     4078 root    3u   REG  252,4 104857600 25176394 /tmp/vdb


I checked the description of virDomainFDAssociate (https://gitlab.com/libvirt/libvirt/-/blob/master/src/libvirt-domain.c#L13985), and it says:
"The FDs are associated as long as the connection used to associated exists and are disposed of afterwards."

So for virsh, is there any way to keep the connection used by dom-fd-associate open and use it afterwards?

Comment 22 Peter Krempa 2023-02-13 13:04:36 UTC
(In reply to Han Han from comment #21)
> Hi Peter, are there anyways to test hot-plug/vm create/migrate with
> dom-fd-associate by virsh?

[...]

> "The FDs are associated as long as the connection used to associated exists
> and are disposed of afterwards."
> 
> So for virsh, is there any way to keep the connection of dom-fd-associate
> and use it afterwards?

With 'virsh' you have to use the interactive mode or batch multiple commands at once e.g.:

 # virsh "dom-fd-associate --domain cd --name testcd --pass-fds 4 ; start cd"  4<>/tmp/ble

For migration you need to remember that the FDs need to be associated with the destination daemon, but virsh initiates the migration from the source side, so you'll need to have another instance of virsh.
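
For completeness, the destination side can also be driven from a small libvirt-python client instead of a second virsh, since the association only lives as long as that client's connection. A rough sketch, with made-up storage path, domain and group names:

#!/usr/bin/python3
# Sketch only: run on the destination host. Associate the disk FD with the
# already-defined domain and keep this connection open while the source
# host runs 'virsh migrate ... --p2p'. Names and paths are made up.
import os
import time
import libvirt

fd = os.open('/mnt/vdb', os.O_RDWR)  # disk on shared storage (example path)

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('rhel')      # domain was defined here beforehand
dom.FDAssociate('test', [fd], 0)

# The association lasts only for the lifetime of 'conn', so wait until the
# incoming migration has actually started before closing it.
while not dom.isActive():
    time.sleep(1)

conn.close()
os.close(fd)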

Comment 23 Han Han 2023-02-17 08:10:35 UTC
Disk attach and VM creation with fdgroup were tested on libvirt-9.0.0-5.el9.x86_64 and qemu-kvm-7.2.0-8.el9.x86_64: PASS.
Migration, save & restore, managedsave, and VM start were tested as follows:
1. migration
1.0 Open the disk file as fd:
(src)➜  ~ exec 3<> /mnt/vdb
1.1 On the src host, start a VM with its disks on shared NFS storage, using fdgroup
(src)➜  ~ virsh "dom-fd-associate rhel-9.2 test 3 --seclabel-restore --seclabel-writable; start rhel"                            

Domain 'rhel' started

1.2 Define the same VM on the dst host and associate the fd with the just-defined VM. Then keep the connection open
(src) ➜  ~ virsh dumpxml rhel > /mnt/rhel.xml

(dst)➜  ~ exec 3<> /mnt/vdb

(dst)➜  ~ virsh define /mnt/rhel.xml 
Domain 'rhel' defined from /mnt/rhel.xml

(dst)➜  ~ virsh
virsh # dom-fd-associate rhel test 3

1.3 Migrate the VM to the dst host:
(src) ➜  ~ virsh migrate rhel qemu+ssh://vm-10-0-79-60.hosted.upshift.rdu2.redhat.com/system --live --verbose --p2p 
Migration: [100 %]


2. Keep the dom-fd-associate connection open. Test managedsave & start
(dst)➜  ~ virsh managedsave rhel           

Domain 'rhel' state saved by libvirt

(dst)➜  ~ virsh start rhel  
Domain 'rhel' started

➜  ~ virsh dumpxml rhel --xpath //disk               
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/mnt/rhel.qcow2" index="2"/>
  <backingStore/>
  <target dev="vda" bus="virtio"/>
  <alias name="virtio-disk0"/>
  <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</disk>
<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="/mnt/vdb" fdgroup="test" index="1"/>
  <backingStore/>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

3. Keep the fd-associate connection open and test save & restore
➜  ~ virsh save rhel /tmp/rhel                                                       
Domain 'rhel' saved to /tmp/rhel

➜  ~ virsh restore /tmp/rhel                
Domain restored from /tmp/rhel

➜  ~ virsh dumpxml rhel --xpath //disk                                        
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/mnt/rhel.qcow2" index="2"/>
  <backingStore/>
  <target dev="vda" bus="virtio"/>
  <alias name="virtio-disk0"/>
  <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</disk>
<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="/mnt/vdb" fdgroup="test" index="1"/>
  <backingStore/>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

Comment 25 errata-xmlrpc 2023-05-09 07:26:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171

