Bug 1927136

Summary: Failed to delete snapshot after restarting VM
Product: Red Hat OpenStack
Reporter: chhu
Component: openstack-nova
Assignee: Lee Yarwood <lyarwood>
Status: CLOSED DUPLICATE
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: dasmith, eglynn, jhakimra, kchamart, lmen, lyarwood, pkrempa, sbauza, sgordon, virt-maint, vromanso
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: libvirt_OSP_INT
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-15 10:59:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1757691, 1977845
Bug Blocks:
Attachments:
- nova-compute.log
- cinder-volume.log
- libvirtd.log

Description chhu 2021-02-10 07:39:27 UTC
Description of problem:
Failed to delete snapshot after restarting VM

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-3.module+el8.4.0+9709+a99efd61.x86_64
qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.x86_64
kernel: 4.18.0-282.el8.x86_64
OSP16.2: 
openstack-nova-compute-20.4.2-2.20201224134938.81a3f4b.el8ost.1.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install OSP 16.2 (with RHEL-AV 8.4.0) on bare metal with 1 controller and 3 compute nodes; Cinder uses NFS storage.

2. Create a volume, network, and flavor, then boot a VM from the volume: asb-vm-qcow2-vol in OSP.
$ openstack server list
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| ID                                   | Name             | Status | Networks               | Image | Flavor |
+--------------------------------------+------------------+--------+------------------------+-------+--------+
| 0129f9e7-3016-496b-baa9-2cfc3d57414f | asb-vm-qcow2-vol | ACTIVE | asb-net1=192.168.32.46 |       |        |
+--------------------------------------+------------------+--------+------------------------+-------+--------+

3. Log in to the web console, go to "Compute" -> "Instances", and create 4 snapshots of this VM: s1, s2, s3, s4. Under "Volumes" -> "Snapshots" there are now 4 snapshots, "snapshot for s1" through "snapshot for s4". The VM disk XML is as below:
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
        <backingStore type='file' index='4'>
          <format type='qcow2'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa'/>
          <backingStore type='file' index='3'>
            <format type='qcow2'/>
            <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
            <backingStore type='file' index='1'>
              <format type='qcow2'/>
              <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
              <backingStore type='file' index='2'>
                <format type='raw'/>
                <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
                <backingStore/>
              </backingStore>
         ......
    </disk>
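The nested `<backingStore>` elements above describe the backing chain from the active layer (snapshot s4) down to the raw base volume. As an illustration only (not Nova or libvirt code), a minimal Python sketch that flattens such a disk XML into an ordered list of (index, file) pairs; the file names in the example are shortened, hypothetical stand-ins for the paths above:

```python
import xml.etree.ElementTree as ET

def backing_chain(disk_xml):
    """Flatten a libvirt <disk> element's nested <backingStore> layers
    into a list of (index, source file) tuples, active layer first."""
    disk = ET.fromstring(disk_xml)
    chain = []
    node = disk
    while node is not None:
        src = node.find('source')
        if src is None:  # the empty terminating <backingStore/> has no source
            break
        # The active layer carries index= on <source>; backing layers on <backingStore>.
        chain.append((node.get('index') or src.get('index'), src.get('file')))
        node = node.find('backingStore')
    return chain

# Shortened example mirroring the structure of the chain in step 3.
xml = """
<disk type='file' device='disk'>
  <source file='overlay.qcow2' index='6'/>
  <backingStore type='file' index='5'>
    <format type='qcow2'/>
    <source file='mid.qcow2'/>
    <backingStore type='file' index='2'>
      <format type='raw'/>
      <source file='base.raw'/>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>
"""
print(backing_chain(xml))
```

Run against the full XML from step 3, the same walk would yield the five-image chain 6 -> 5 -> 4 -> 3 -> 1 -> 2 that the later block commit expects to find.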


4. On the web console, under "Volumes" -> "Snapshots", delete the volume snapshot "snapshot for s2" successfully. The VM XML is now as below; the layer "volume-880e38be-1905-470b-86c0-7a98783e8a67.54b74790-1122-4e48-b1cf-bca1f10c96fa" has been removed from the chain.
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73' index='6'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924'/>
        <backingStore type='file' index='3'>
          <format type='qcow2'/>
          <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6'/>
          <backingStore type='file' index='1'>
            <format type='qcow2'/>
            <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'/>
            <backingStore type='file' index='2'>
              <format type='raw'/>
              <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
              <backingStore/>
            </backingStore>
......
    </disk>


5. On the web console, under "Compute" -> "Instances", select asb-vm-qcow2-vol, click "Shut Off Instance", and confirm it reaches Shutoff; then click "Start Instance". The VM XML is now as below (note the backing chain has reverted to the original two-image chain):
-------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea' index='1'/>
      <backingStore type='file' index='2'>
        <format type='raw'/>
        <source file='/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>

6. On the web console, under "Volumes" -> "Snapshots", try to delete the volume snapshot "snapshot for s3"; it fails. The error messages in nova-compute.log are as below:
----------------------------------------------------------------------------
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f] libvirt.libvirtError: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': The command guest-fsfreeze-freeze has been disabled for this instance
2021-02-10 07:04:24.825 7 ERROR nova.virt.libvirt.driver [instance: 0129f9e7-3016-496b-baa9-2cfc3d57414f]
......
2021-02-10 07:17:36.843 7 ERROR oslo_messaging.rpc.server libvirt.libvirtError: invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'


Actual results:
In step 6: failed to delete the volume snapshot.

Expected results:
In step 6: the volume snapshot is deleted successfully.

Additional info:
- nova-compute.log
- libvirtd.log
- cinder-volume.log

Comment 1 chhu 2021-02-10 08:10:37 UTC
Created attachment 1756140 [details]
nova-compute.log

Comment 2 chhu 2021-02-10 08:11:12 UTC
Created attachment 1756141 [details]
cinder-volume.log

Comment 3 chhu 2021-02-10 08:23:13 UTC
Created attachment 1756154 [details]
libvirtd.log

Comment 4 Peter Krempa 2021-02-15 16:33:06 UTC
From the debug logs.

The VM was shut down by:

2021-02-10 07:14:09.883+0000: 166313: debug : virDomainShutdown:1443 : dom=0x7f694c00c5f0, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f)

The VM config was then redefined:

2021-02-10 07:14:39.734+0000: 166313: debug : virDomainDefineXML:6306 : conn=0x7f690c015190, xml=<domain type="kvm">
  <uuid>0129f9e7-3016-496b-baa9-2cfc3d57414f</uuid>
  <name>instance-00000018</name>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="20.4.2-2.20201224134938.81a3f4b.el8ost.1"/>
      <nova:name>asb-vm-qcow2-vol</nova:name>
      <nova:creationTime>2021-02-10 07:14:39</nova:creationTime>
      <nova:flavor name="asb-m2">
        <nova:memory>512</nova:memory>
        <nova:disk>10</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="bad1f9d07c0d482480372aeaf58ed151">admin</nova:user>
        <nova:project uuid="c5b1cf8d7baa4b9896f93842dab9bb14">admin</nova:project>
      </nova:owner>
    </nova:instance>
  </metadata>
  <sysinfo type="smbios">
    <system>
      <entry name="manufacturer">Red Hat</entry>
      <entry name="product">OpenStack Compute</entry>
      <entry name="version">20.4.2-2.20201224134938.81a3f4b.el8ost.1</entry>
      <entry name="serial">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="uuid">0129f9e7-3016-496b-baa9-2cfc3d57414f</entry>
      <entry name="family">Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type machine="q35">hvm</type>
    <boot dev="hd"/>
    <smbios mode="sysinfo"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state="on"/>
  </features>
  <cputune>
    <shares>1024</shares>
  </cputune>
  <clock offset="utc">
    <timer name="pit" tickpolicy="delay"/>
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
  </clock>
  <cpu mode="host-model" match="exact">
    <topology sockets="1" cores="1" threads="1"/>
  </cpu>
  <devices>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" cache="none" io="native"/>
      <source file="/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea"/>
      <target bus="virtio" dev="vda"/>
      <serial>880e38be-1905-470b-86c0-7a98783e8a67</serial>
    </disk>
    <controller type="scsi" index="0" model="virtio-scsi"/>

[... snipped ...]

  </devices>
</domain>

The VM is then started:

2021-02-10 07:14:39.752+0000: 166315: debug : virDomainCreateWithFlags:6749 : dom=0x7f694400db30, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), flags=0x0

Some time later a block commit is attempted (snapshot deletion):

2021-02-10 07:17:36.139+0000: 166311: debug : virDomainBlockCommit:10688 : dom=0x7f690c009d00, (VM: name=instance-00000018, uuid=0129f9e7-3016-496b-baa9-2cfc3d57414f), disk=vda, base=volume-880e38be-1905-470b-86c0-7a98783e8a67.feb74d31-e3ac-4a14-b077-a4253df148c6, top=volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924, bandwidth=0, flags=0x8
2021-02-10 07:17:36.141+0000: 166311: error : virStorageFileChainLookup:1524 : invalid argument: could not find image 'volume-880e38be-1905-470b-86c0-7a98783e8a67.ed92b7bb-78bb-449b-ae9a-c89ae3ac6924' in chain for '/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea'

Which fails. The problem is that the new configuration defined above sets the disk source back to the file that was used when the VM was first started (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.6296bdcb-2cd4-4be5-921e-362930b2bcea), rather than to the state after the snapshots were taken (/var/lib/nova/mnt/805af70202ed20867b0f31abdf6acba4/volume-880e38be-1905-470b-86c0-7a98783e8a67.2fe4efce-26f7-4cb9-9472-ec6cc2b4ce73), i.e. without any of the snapshots created on top of it in the previous steps. As a result, virDomainBlockCommit cannot find the image Nova wants to commit, since the backing chain state was reverted by the redefinition.
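The failure mode can be modeled abstractly: before committing, libvirt looks up the requested top image in the chain it currently knows about, and a lookup miss aborts the whole operation. A minimal sketch, with a hypothetical `chain_lookup` helper (not libvirt's actual implementation) and shortened volume names standing in for the real paths:

```python
def chain_lookup(chain, name):
    """Abstract stand-in for libvirt's chain lookup: find an image in the
    backing chain by name suffix, or fail the way libvirt reports it."""
    for path in chain:
        if path.endswith(name):
            return path
    raise ValueError(
        "could not find image '%s' in chain for '%s'" % (name, chain[0]))

# Chain as libvirt saw it after the restart (reverted, snapshot layers gone):
reverted = ['volume-880e38be.6296bdcb', 'volume-880e38be']

# Nova asks to commit the s3 overlay, which only existed in the
# pre-restart five-image chain -- the lookup fails as in step 6.
try:
    chain_lookup(reverted, 'volume-880e38be.ed92b7bb')
except ValueError as err:
    print(err)
```

With the pre-restart chain, the same lookup would succeed and the commit could proceed; the redefinition, not the commit itself, is what makes the target unreachable.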

I can't tell whether this is a usage problem of the OpenStack APIs or a bug, so moving to Nova.

Comment 5 Lee Yarwood 2021-02-19 11:57:19 UTC
This is a duplicate of bug 1757691, but as it's targeted at 16.2 I'll leave it in place for now and continue working to land the fix in 17 before backporting.

Comment 6 Lee Yarwood 2021-03-15 10:59:44 UTC
Apologies, let's actually mark this as a duplicate of bug 1757691 to keep things simple.

*** This bug has been marked as a duplicate of bug 1757691 ***