Description of problem:
Failed to delete a snapshot after restarting the VM.

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
kernel: 4.18.0-282.el8.x86_64
OSP16.2: openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install OSP16.2 (with RHEL-AV 8.4.0) on bare metal with 1 controller and 3 compute nodes; Cinder uses NFS storage.
2. Create a volume, network, and flavor, then boot a VM from the volume (asb-vm-qcow2-vol) in OSP:
$ openstack server list
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| ID                                   | Name             | Status | Networks               | Image | Flavor |
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| 0129f9e7-3016-496b-baa9-2cfc3d57414f | asb-vm-qcow2-vol | ACTIVE | asb-net1=192.168.32.46 |       |        |
+--------------------------------------+------------------+--------+------------------------+-------+--------+
3. Log in to the web console, "Compute" -> "Instances", and create 4 snapshots for this VM: s1, s2, s3, s4.
Check "Volumes" -> "Snapshots": there are 4 snapshots ("snapshot for s1" through "snapshot for s4"). The VM disk XML is as below:
-------------------------------------------------------------
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
  <backingStore type='file' index='5'>
    <format type='qcow2'/>
    <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
    <backingStore type='file' index='4'>
      <format type='qcow2'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa'/>
      <backingStore type='file' index='3'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
        <backingStore type='file' index='1'>
          <format type='qcow2'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
          <backingStore type='file' index='2'>
            <format type='raw'/>
            <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
            <backingStore/>
          </backingStore>
  ......
</disk>
4. On the web console, "Volumes" -> "Snapshots", delete the volume snapshot "snapshot for s2"; it succeeds. Check the VM XML, which is as below: the layer "volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa" has been removed.
-------------------------------------------------------------
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
  <backingStore type='file' index='5'>
    <format type='qcow2'/>
    <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
    <backingStore type='file' index='3'>
      <format type='qcow2'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
        <backingStore type='file' index='2'>
          <format type='raw'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
          <backingStore/>
        </backingStore>
  ......
</disk>
5. On the web console, "Compute" -> "Instances", select asb-vm-qcow2-vol, click "Shut Off Instance", verify it reaches the Shutoff state, then click "Start Instance". Check the VM XML, which is now as below:
-------------------------------------------------------------
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea' index='1'/>
  <backingStore type='file' index='2'>
    <format type='raw'/>
    <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</disk>
6. On the web console, "Volumes" -> "Snapshots", try to delete the volume snapshot "snapshot for s3"; this hits an error. The error messages in nova-compute.log are as below:
----------------------------------------------------------------------------
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f] libvirt.libvirtError: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f]
......
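As an aside, the collapse of the backing chain after the restart in step 5 can be confirmed programmatically. The following is a minimal sketch (not part of the original report; the `backing_chain` helper and the shortened inline XML stand-in are mine) that walks a libvirt `<disk>` element's nested `<backingStore>` elements:

```python
# Hypothetical helper (not from the report): walk a libvirt <disk> element's
# <backingStore> chain and return the list of image paths, active layer first.
import xml.etree.ElementTree as ET

def backing_chain(disk_xml: str) -> list:
    disk = ET.fromstring(disk_xml)
    chain = [disk.find("source").get("file")]
    node = disk.find("backingStore")
    # An empty <backingStore/> (no <source> child) terminates the chain.
    while node is not None and node.find("source") is not None:
        chain.append(node.find("source").get("file"))
        node = node.find("backingStore")
    return chain

# Shortened stand-in for the post-restart XML in step 5: only the original
# overlay and the raw base volume remain visible to libvirt.
DISK_AFTER_RESTART = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='vol.6296bdcb' index='1'/>
  <backingStore type='file' index='2'>
    <format type='raw'/>
    <source file='vol'/>
    <backingStore/>
  </backingStore>
</disk>
"""

print(backing_chain(DISK_AFTER_RESTART))  # -> ['vol.6296bdcb', 'vol']
```

Run against the step-3 XML above, the same walk yields all six images; here it yields only two, which is the state the restart reverted to.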
2021-02-10 07:17:36.843 7 ERROR oslo_messaging.rpc.server libvirt.libvirtError: invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'

Actual results:
In step 6: failed to delete the volume snapshot.

Expected results:
In step 6: the volume snapshot is deleted successfully.

Additional info:
- nova-compute.log
- libvirtd.log
- cinder-volume.log
Created attachment 1756140 [details] nova-compute.log
Created attachment 1756141 [details] cinder-volume.log
Created attachment 1756154 [details] libvirtd.log
From the debug logs: the VM was shut down by:

2021-02-10 07:14:09.883+0000: 166313: debug : virDomainShutdown:1443 : dom=0x7f694c00c5f0, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f)

The VM config was then redefined:

2021-02-10 07:14:39.734+0000: 166313: debug : virDomainDefineXML:6306 : conn=0x7f690c015190, xml=<domain type="kvm">
  <uuid>0129f9e7-3016-496b-baa9-2cfc3d57414f</uuid>
  <name>instance-00000018</name>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="20.4.2-2.20201224134938.81a3f4b.el8ost.1"/>
      <nova:name>asb-vm-qcow2-vol</nova:name>
      <nova:creationTime>2021-02-10 07:14:39</nova:creationTime>
      <nova:flavor name="asb-m2">
        <nova:memory>512</nova:memory>
        <nova:disk>10</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="bad1f9d07c0d482480372aeaf58ed151">admin</nova:user>
        <nova:project uuid="c5b1cf8d7baa4b9896f93842dab9bb14">admin</nova:project>
      </nova:owner>
    </nova:instance>
  </metadata>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">Red Hat</entry>
      <entry name="product">OpenStack Compute</entry>
      <entry name="version">20.4.2-2.20201224134938.81a3f4b.el8ost.1</entry>
      <entry name="serial">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="uuid">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="family">Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type machine="q35">hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state="on"/>
  </features>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact">
    <topology sockets="1" cores="1" threads="1"/>
  </cpu>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none" io="native"/>
      <source file="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea"/>
      <target bus="virtio" dev="vda"/>
      <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
    </disk>
    <controller type="scsi" index="0" model="virtio-scsi"/>
    [... snipped ...]
  </devices>
</domain>

The VM is then started:

2021-02-10 07:14:39.752+0000: 166315: debug : virDomainCreateWithFlags:6749 : dom=0x7f694400db30, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), flags=0x0

Some time later a block commit (the snapshot deletion) is attempted:

2021-02-10 07:17:36.139+0000: 166311: debug : virDomainBlockCommit:10688 : dom=0x7f690c009d00, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), disk=vda, base=volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6, top=volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924, bandwidth=0, flags=0x8
2021-02-10 07:17:36.141+0000: 166311: error : virStorageFileChainLookup:1524 : invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'

This fails. The problem is that the new configuration defined above points the disk source at the file that was used when the VM was first started (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea) rather than at the state after the snapshots above were taken (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73), i.e. without any of the snapshot overlays created on top of it in the previous steps.
This means that virDomainBlockCommit can't find the image nova wants to commit, since the backing chain state was reverted by the redefinition. I can't tell whether this is a usage problem with the OpenStack APIs or a nova bug, so I'm moving this to nova.
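For illustration, the lookup that fails here can be modeled in a few lines. This is a hypothetical sketch of the check, not libvirt's actual implementation; `chain_lookup` and the shortened image names are stand-ins:

```python
# Toy model of the check virStorageFileChainLookup performs: the 'top'
# image nova asks to commit must exist in the backing chain libvirt
# detected when the VM was started.
def chain_lookup(chain, name):
    # Match the requested name against each image path in the chain.
    for i, path in enumerate(chain):
        if path.endswith(name):
            return i
    # Mirrors the shape of the libvirt error seen in the log above.
    raise ValueError(
        "could not find image '%s' in chain for '%s'" % (name, chain[0]))

# Chain libvirt sees after the restart (step 5): the snapshot overlays
# are gone, so the image nova passes as 'top' cannot be found.
chain = ["vol.6296bdcb", "vol"]
try:
    chain_lookup(chain, "vol.ed92b7bb")
except ValueError as e:
    print(e)
```

Had the redefined config kept the post-snapshot source (the .2fe4efce overlay), the full six-image chain would have been detected and the lookup would have succeeded.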
This is a duplicate of bug 1757691, but as it's targeted at 16.2 I'll leave it in place for now and continue working to land the fix in 17 before backporting.
Apologies, let's actually mark this as a duplicate of bug 1757691 to keep things simple.

*** This bug has been marked as a duplicate of bug 1757691 ***