Description of problem:

Observed in the logs from https://bugzilla.redhat.com/show_bug.cgi?id=1927136. Nova issues the following libvirt APIs (irrelevant libvirt entries in between trimmed):

2021-02-10 07:02:22.164+0000: 166314: debug : virDomainFSFreeze:11329 : dom=0x7f695000ce40, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), mountpoints=(nil), nmountpoints=0, flags=0x0

2021-02-10 07:02:24.097+0000: 166311: debug : virDomainSnapshotCreateXML:221 : dom=0x7f690c009da0, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), xmlDesc=<domainsnapshot>
  <disks>
    <disk name="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea" snapshot="external" type="file">
      <source file="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6"/>
    </disk>
  </disks>
</domainsnapshot>
, flags=0x74

2021-02-10 07:02:24.133+0000: 166314: debug : virDomainSnapshotCreateXML:221 : dom=0x7f695000ce40, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), xmlDesc=<domainsnapshot>
  <disks>
    <disk name="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea" snapshot="external" type="file">
      <source file="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6"/>
    </disk>
  </disks>
</domainsnapshot>
, flags=0x34

2021-02-10 07:02:25.752+0000: 166315: debug : virDomainFSThaw:11371 : dom=0x7f694c00cf30, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), flags=0x0

The flags for virDomainSnapshotCreateXML have the following meaning:

VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE    = 1   (0x1;   1 << 0)  Restore or alter metadata
VIR_DOMAIN_SNAPSHOT_CREATE_CURRENT     = 2   (0x2;   1 << 1)  With redefine, make snapshot current
VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA = 4   (0x4;   1 << 2)  Make snapshot without remembering it
VIR_DOMAIN_SNAPSHOT_CREATE_HALT        = 8   (0x8;   1 << 3)  Stop running guest after snapshot
VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY   = 16  (0x10;  1 << 4)  Disk snapshot, not full system
VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT   = 32  (0x20;  1 << 5)  Reuse any existing external files
VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE     = 64  (0x40;  1 << 6)  Use guest agent to quiesce all mounted file systems within the domain
VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC      = 128 (0x80;  1 << 7)  Atomically avoid partial changes
VIR_DOMAIN_SNAPSHOT_CREATE_LIVE        = 256 (0x100; 1 << 8)  Create the snapshot while the guest is running
VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE    = 512 (0x200; 1 << 9)  Validate the XML against the schema

The first thing that happens is a virDomainFSFreeze, followed by a virDomainSnapshotCreateXML that sets the VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE flag. This call (always) fails because the filesystems are already frozen. virDomainSnapshotCreateXML is then issued again without VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, which succeeds. Finally, the filesystems are unfrozen via virDomainFSThaw.

These operations don't make sense. If an explicit virDomainFSFreeze is used, there is no point in also passing VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, which will always fail because the guest agent does not allow a double freeze.
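To make the flags values in the log concrete, here is a minimal sketch (not Nova code) that decodes them using the libvirt-python constants listed above; it assumes the libvirt-python package is installed:

import libvirt

SNAPSHOT_CREATE_FLAGS = {
    "REDEFINE":    libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE,     # 0x1
    "CURRENT":     libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_CURRENT,      # 0x2
    "NO_METADATA": libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA,  # 0x4
    "HALT":        libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_HALT,         # 0x8
    "DISK_ONLY":   libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY,    # 0x10
    "REUSE_EXT":   libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT,    # 0x20
    "QUIESCE":     libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE,      # 0x40
    "ATOMIC":      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC,       # 0x80
    "LIVE":        libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_LIVE,         # 0x100
}

def decode_snapshot_flags(flags):
    # Return the names of the VIR_DOMAIN_SNAPSHOT_CREATE_* bits set in flags.
    return [name for name, bit in SNAPSHOT_CREATE_FLAGS.items() if flags & bit]

# First snapshot call from the log (fails):
print(decode_snapshot_flags(0x74))  # ['NO_METADATA', 'DISK_ONLY', 'REUSE_EXT', 'QUIESCE']
# Second snapshot call from the log (succeeds):
print(decode_snapshot_flags(0x34))  # ['NO_METADATA', 'DISK_ONLY', 'REUSE_EXT']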
(Note there is also a bug in libvirt where the filesystems are thawed by the failed virDomainSnapshotCreateXML invocation; see https://bugzilla.redhat.com/show_bug.cgi?id=1928819 )

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
kernel: 4.18.0-282.el8.x86_64
OSP16.2: openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:

Steps to Reproduce:
1. See steps in https://bugzilla.redhat.com/show_bug.cgi?id=1927136
2.
3.
Looking at the nova-compute exception fragment[1], this is coming from the _volume_snapshot_create() method in Nova's libvirt driver[2], where the logic seems to be the following.

Before taking a snapshot, the _volume_snapshot_create() method checks whether the guest can be quiesced:

- If the guest is capable of quiescing, it tries guest.snapshot() with "quiesce=True". [...] If the user requests (by specifying it as a parameter on the template image the guest boots from) that quiesce be part of the snapshot and Nova can't honour that, an error is raised.

- If the guest is _not_ capable of quiescing, the guest.snapshot() call is retried with "quiesce=False".

This was introduced in Nova commit [3] to fix a bug where Nova attempted to quiesce during a volume (i.e. a detachable block device) snapshot without checking whether the guest is _capable_ of quiescing at all. (A hedged sketch of this retry pattern follows the references below.)

[1] Exception fragment from nova-compute.log
---------------------------------------------------------
[...]
2021-02-10 06:00:55.038 7 ERROR nova.virt.libvirt.driver [instance: 72786c63-160a-44d0-941e-3ce056afebe2]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 2897, in _volume_snapshot_create
2021-02-10 06:00:55.038 7 ERROR nova.virt.libvirt.driver [instance: 72786c63-160a-44d0-941e-3ce056afebe2]     reuse_ext=True, quiesce=True)
[...]
2021-02-10 05:59:29.293 7 ERROR nova.virt.libvirt.driver [instance: 72786c63-160a-44d0-941e-3ce056afebe2]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2814, in snapshotCreateXML
2021-02-10 05:59:29.293 7 ERROR nova.virt.libvirt.driver [instance: 72786c63-160a-44d0-941e-3ce056afebe2]     if ret is None: raise libvirtError('virDomainSnapshotCreateXML() failed', dom=self)
2021-02-10 05:59:29.293 7 ERROR nova.virt.libvirt.driver [instance: 72786c63-160a-44d0-941e-3ce056afebe2] libvirt.libvirtError: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
[...]
---------------------------------------------------------

[2] https://github.com/openstack/nova/blob/308c6007dcbced/nova/virt/libvirt/driver.py#L2791,#L2820
[3] https://opendev.org/openstack/nova/commit/e659a6e7cbb30 (libvirt: check if we can quiesce before volume-backed snapshot; 2016-09-30)
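For illustration, here is a hedged sketch of the retry pattern described above; it is not the actual Nova code (see [2] for that), and the names guest, snapshot_xml and require_quiesce are illustrative placeholders:

import libvirt

def snapshot_with_quiesce_fallback(guest, snapshot_xml, require_quiesce=False):
    try:
        # Ask libvirt to quiesce via the guest agent as part of the snapshot.
        guest.snapshot(snapshot_xml, no_metadata=True, disk_only=True,
                       reuse_ext=True, quiesce=True)
    except libvirt.libvirtError:
        if require_quiesce:
            # The image asked for a quiesced snapshot and it can't be
            # honoured, so propagate the error.
            raise
        # The guest (agent) can't quiesce: retry without quiescing.
        guest.snapshot(snapshot_xml, no_metadata=True, disk_only=True,
                       reuse_ext=True, quiesce=False)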
The problem isn't that 'guest.snapshot(quiesce=True)' is followed by 'guest.snapshot(quiesce=False)' if the former fails. That is a reasonable algorithm when the quiescing is done as an integral part of the libvirt snapshot API.

The problem lies with an explicit quiesce done via virDomainFSFreeze (https://github.com/openstack/nova/blob/308c6007dcbced8f4e97b1712ade66b27949b712/nova/virt/libvirt/guest.py#L546) followed by a snapshot with quiesce=True; in libvirt terms, virDomainFSFreeze followed by virDomainSnapshotCreateXML(..., VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE). The qemu guest agent doesn't allow quiescing/freezing when the filesystems are already frozen, so a snapshot with quiescing enabled will always fail if the filesystems are already quiesced.

I've also updated the libvirt docs (https://gitlab.com/libvirt/libvirt/-/commit/ec86b8fa29fa97b51382eb19ca2355c87dfcc38f) to promote the use of explicit quiescing.
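A minimal libvirt-python sketch of the two mutually exclusive ways to get a quiesced external disk snapshot; combining them (explicit freeze plus the QUIESCE flag) is exactly what fails above. Here 'dom' (a connected virDomain handle) and 'snapshot_xml' (a <domainsnapshot> document like the one in the log) are assumptions, not values from the report:

import libvirt

DISK_ONLY_FLAGS = (libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY
                   | libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA
                   | libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT)

def snapshot_with_explicit_freeze(dom, snapshot_xml):
    # Option 1: quiesce explicitly. Do NOT also pass ..._CREATE_QUIESCE,
    # because the guest agent refuses a second freeze.
    dom.fsFreeze()
    try:
        dom.snapshotCreateXML(snapshot_xml, DISK_ONLY_FLAGS)
    finally:
        dom.fsThaw()

def snapshot_with_quiesce_flag(dom, snapshot_xml):
    # Option 2: let libvirt freeze/thaw the filesystems around the snapshot.
    dom.snapshotCreateXML(
        snapshot_xml,
        DISK_ONLY_FLAGS | libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE)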
While the reported behavior isn't great, as far as I understand it, the impact is minimal and not user-visible. Being realistic, we'll never get around to fixing this (and it's been 4 years since the bug report anyway). Closing.