Bug 1927136 - Failed to delete snapshot after restarting VM
Summary: Failed to delete snapshot after restarting VM
Keywords:
Status: CLOSED DUPLICATE of bug 1757691
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Lee Yarwood
QA Contact: OSP DFG:Compute
URL:
Whiteboard: libvirt_OSP_INT
Depends On: 1757691 1977845
Blocks:
 
Reported: 2021-02-10 07:39 UTC by chhu
Modified: 2023-03-21 19:39 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-15 10:59:44 UTC
Target Upstream Version:
Embargoed:


Attachments
nova-compute.log (132.91 KB, text/plain)
2021-02-10 08:10 UTC, chhu
cinder-volume.log (40.19 KB, text/plain)
2021-02-10 08:11 UTC, chhu
libvirtd.log (808.11 KB, application/gzip)
2021-02-10 08:23 UTC, chhu

Description chhu 2021-02-10 07:39:27 UTC
Description of problem:
Failed to delete snapshot after restarting VM

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
kernel: 4.18.0-282.el8.x86_64
OSP16.2: 
openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install OSP 16.2 (with RHEL-AV 8.4.0) on bare metal with 1 controller and 3 compute nodes; Cinder uses NFS storage.

2. Create a volume, network, and flavor, then boot a VM from the volume (asb-vm-qcow2-vol) in OSP.
$ openstack server list
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| ID                                   | Name             | Status | Networks               | Image | Flavor |
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| 0129f9e7-3016-496b-baa9-2cfc3d57414f | asb-vm-qcow2-vol | ACTIVE | asb-net1=192.168.32.46 |       |        |
+--------------------------------------+------------------+--------+------------------------+-------+--------+

3. Log in to the web console, go to "Compute" -> "Instances", and create 4 snapshots of this VM: s1, s2, s3, s4. Under "Volumes" -> "Snapshots" there are now 4 snapshots ("snapshot for s1" through "snapshot for s4"), and the VM disk XML is as below:
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
        <backingStore type='file' index='4'>
          <format type='qcow2'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa'/>
          <backingStore type='file' index='3'>
            <format type='qcow2'/>
            <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
            <backingStore type='file' index='1'>
              <format type='qcow2'/>
              <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
              <backingStore type='file' index='2'>
                <format type='raw'/>
                <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
                <backingStore/>
              </backingStore>
         ......
    </disk>
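The nested `<backingStore>` elements above describe the qcow2 backing chain, active overlay first. As a minimal sketch (using only Python's stdlib `ElementTree`, with shortened illustrative file names rather than the real volume paths), the chain can be extracted from such a `<disk>` element like this:

```python
import xml.etree.ElementTree as ET

def backing_chain(disk_xml):
    """Return the image paths of a <disk> element, active layer first."""
    node = ET.fromstring(disk_xml)
    chain = []
    while node is not None:
        src = node.find('source')
        if src is not None and 'file' in src.attrib:
            chain.append(src.attrib['file'])
        # descend into the next <backingStore>; an empty <backingStore/>
        # (no source, no children) terminates the chain
        node = node.find('backingStore')
    return chain

disk = """<disk type='file' device='disk'>
  <source file='overlay2.qcow2' index='3'/>
  <backingStore type='file' index='2'>
    <format type='qcow2'/>
    <source file='overlay1.qcow2'/>
    <backingStore type='file' index='1'>
      <format type='raw'/>
      <source file='base.raw'/>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>"""

print(backing_chain(disk))  # ['overlay2.qcow2', 'overlay1.qcow2', 'base.raw']
```

In the real XML above, the same walk yields the chain from the `2fe4efce` overlay down to the raw base volume.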


4. On the web console, under "Volumes" -> "Snapshots", delete the volume snapshot "snapshot for s2"; it succeeds. The VM XML is now as below, with "volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa" removed from the chain.
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
        <backingStore type='file' index='3'>
          <format type='qcow2'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
          <backingStore type='file' index='1'>
            <format type='qcow2'/>
            <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
            <backingStore type='file' index='2'>
              <format type='raw'/>
              <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
              <backingStore/>
            </backingStore>
......
    </disk>
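Deleting a middle snapshot works by block-committing its overlay into the image below it, so that overlay simply drops out of the backing chain. A rough model in Python (illustrative names only; the real work is done by libvirt's virDomainBlockCommit):

```python
def commit_layer(chain, top):
    """Model snapshot deletion via block commit: the committed overlay
    ('top') drops out of the backing chain. Illustrative only."""
    if top not in chain:
        raise ValueError(f"could not find image '{top}' in chain")
    return [img for img in chain if img != top]

# Chain from step 3: four snapshot overlays on top of the raw base volume.
chain = ['s4.qcow2', 's3.qcow2', 's2.qcow2', 's1.qcow2', 'base.raw']

# Step 4: deleting "snapshot for s2" removes its overlay from the chain.
print(commit_layer(chain, 's2.qcow2'))  # ['s4.qcow2', 's3.qcow2', 's1.qcow2', 'base.raw']
```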


5. On the web console, under "Compute" -> "Instances", select asb-vm-qcow2-vol, click "Shut Off Instance", wait until it is Shutoff, then click "Start Instance". The VM XML is now as below:
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea' index='1'/>
      <backingStore type='file' index='2'>
        <format type='raw'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>

6. On the web console, under "Volumes" -> "Snapshots", try to delete the volume snapshot "snapshot for s3"; it fails. The error messages in nova-compute.log are as below:
----------------------------------------------------------------------------
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f] libvirt.libvirtError: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f]
......
2021-02-10 07:17:36.843 7 ERROR oslo_messaging.rpc.server libvirt.libvirtError: invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'


Actual results:
In step 6, deleting the volume snapshot fails.

Expected results:
In step 6, the volume snapshot is deleted successfully.

Additional info:
- nova-compute.log
- libvirtd.log
- cinder-volume.log

Comment 1 chhu 2021-02-10 08:10:37 UTC
Created attachment 1756140 [details]
nova-compute.log

Comment 2 chhu 2021-02-10 08:11:12 UTC
Created attachment 1756141 [details]
cinder-volume.log

Comment 3 chhu 2021-02-10 08:23:13 UTC
Created attachment 1756154 [details]
libvirtd.log

Comment 4 Peter Krempa 2021-02-15 16:33:06 UTC
From the debug logs:

The VM was shut down by:

2021-02-10 07:14:09.883+0000: 166313: debug : virDomainShutdown:1443 : dom=0x7f694c00c5f0, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f)

The VM config was then redefined:

2021-02-10 07:14:39.734+0000: 166313: debug : virDomainDefineXML:6306 : conn=0x7f690c015190, xml=<domain type="kvm">
  <uuid>0129f9e7-3016-496b-baa9-2cfc3d57414f</uuid>
  <name>instance-00000018</name>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="20.4.2-2.20201224134938.81a3f4b.el8ost.1"/>
      <nova:name>asb-vm-qcow2-vol</nova:name>
      <nova:creationTime>2021-02-10 07:14:39</nova:creationTime>
      <nova:flavor name="asb-m2">
        <nova:memory>512</nova:memory>
        <nova:disk>10</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="bad1f9d07c0d482480372aeaf58ed151">admin</nova:user>
        <nova:project uuid="c5b1cf8d7baa4b9896f93842dab9bb14">admin</nova:project>
      </nova:owner>
    </nova:instance>
  </metadata>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">Red Hat</entry>
      <entry name="product">OpenStack Compute</entry>
      <entry name="version">20.4.2-2.20201224134938.81a3f4b.el8ost.1</entry>
      <entry name="serial">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="uuid">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="family">Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type machine="q35">hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state="on"/>
  </features>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact">
    <topology sockets="1" cores="1" threads="1"/>
  </cpu>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none" io="native"/>
      <source file="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea"/>
      <target bus="virtio" dev="vda"/>
      <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
    </disk>
    <controller type="scsi" index="0" model="virtio-scsi"/>

[... snipped ...]

  </devices>
</domain>

The VM is then started:

2021-02-10 07:14:39.752+0000: 166315: debug : virDomainCreateWithFlags:6749 : dom=0x7f694400db30, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), flags=0x0

Some time later, a block commit (snapshot deletion) is attempted:

2021-02-10 07:17:36.139+0000: 166311: debug : virDomainBlockCommit:10688 : dom=0x7f690c009d00, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), disk=vda, base=volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6, top=volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924, bandwidth=0, flags=0x8
2021-02-10 07:17:36.141+0000: 166311: error : virStorageFileChainLookup:1524 : invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'

Which fails. The problem is that the newly defined configuration above sets the disk source to the file that was in use when the VM was first started (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea) rather than to the state after the snapshots were taken (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73), i.e. without any of the snapshot overlays created in the previous steps. As a result, virDomainBlockCommit cannot find the image nova wants to commit, because the backing chain was reverted by the redefinition.
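This failure mode can be sketched as a chain lookup (hypothetical Python; the names are illustrative, not libvirt's actual API): after the restart the domain was redefined with only the original chain, so looking up the overlay Nova asks to commit fails with the same "could not find image in chain" error seen in the log.

```python
def chain_lookup(chain, name):
    """Mimic the lookup virDomainBlockCommit relies on: return the index
    of `name` in the backing chain, or fail if it is not in the chain."""
    for i, image in enumerate(chain):
        if image == name:
            return i
    raise ValueError(f"could not find image '{name}' in chain for '{chain[0]}'")

# Chain while the VM was running, after the snapshots were taken.
chain_running = ['s4.qcow2', 's3.qcow2', 's1.qcow2', 'base.raw']
# Chain after the restart redefined the domain with the original source.
chain_restarted = ['s1.qcow2', 'base.raw']

assert chain_lookup(chain_running, 's3.qcow2') == 1  # commit would work
try:
    chain_lookup(chain_restarted, 's3.qcow2')        # step 6: fails
except ValueError as err:
    print(err)
```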

I can't tell whether that's a usage problem of the OpenStack APIs or a bug, thus moving to nova.


Comment 5 Lee Yarwood 2021-02-19 11:57:19 UTC
This is a duplicate of 1757691 but as it's targeted at 16.2 I'll leave it in place for now and continue working to land the fix in 17 before backporting.

Comment 6 Lee Yarwood 2021-03-15 10:59:44 UTC
Apologies, let's actually mark this as a duplicate of bug 1757691 to keep things simple.

*** This bug has been marked as a duplicate of bug 1757691 ***

