Bug 1854796 - Failed to start VM after deleting shutdown VM snapshot in OSP16.1
Summary: Failed to start VM after deleting shutdown VM snapshot in OSP16.1
Keywords:
Status: CLOSED DUPLICATE of bug 1757691
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard: libvirt_OSP_INT
Depends On:
Blocks:
 
Reported: 2020-07-08 09:19 UTC by chhu
Modified: 2023-03-21 19:36 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-21 22:34:08 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirtd.log, nova-compute.log (419.77 KB, application/gzip) - 2020-07-08 09:20 UTC, chhu
cinder.conf, cinder-*.log (521.56 KB, application/gzip) - 2020-07-09 12:13 UTC, chhu
nfs-config.yaml (704 bytes, text/plain) - 2020-07-13 07:49 UTC, chhu


Links
Launchpad 1860913 (Last Updated: 2020-07-21 15:52:46 UTC)
Red Hat Issue Tracker OSP-23543 (Last Updated: 2023-03-21 19:36:01 UTC)

Description chhu 2020-07-08 09:19:10 UTC
Description of problem:
Failed to start a VM after taking and deleting snapshots of the shut-down VM in OSP 16.1.

Version-Release number of selected component (if applicable):
libvirt-daemon-kvm-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
qemu-kvm-core-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
openstack-nova-compute-20.2.1-0.20200528080027.1e95025.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create an image and a volume in OSP, then boot a VM from the volume: Project->Instances->Launch Instance->Source: Volume r8-vol
   $ openstack image create r8 --disk-format qcow2 --container-format bare --file /tmp/RHEL-8.2-x86_64-latest.qcow2
   $ openstack volume create r8-vol --size 10 --image r8
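
   A roughly equivalent CLI call for the Horizon boot-from-volume action (the
   flavor "m1" and network "default" are taken from later in this report and
   may differ in other environments):
   $ openstack server create --flavor m1 --network default --volume r8-vol r8-vol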

2. Shut down the VM and check the XML:
   ---------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457'/>
      <target dev='vda' bus='virtio'/>
      <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
   ----------------------------------------------------------------
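
   This step corresponds roughly to the following (the libvirt domain name
   instance-00000004 is taken from the virsh output in step 5):
   $ openstack server stop r8-vol
   # virsh dumpxml instance-00000004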

3. Create VM snapshot r8-vol-shutdown-s1 and check the XML:
   ----------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.362d13e8-9f6f-4f61-9e34-bf44ea626e7a'/>
      <target dev='vda' bus='virtio'/>
      <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
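
   A roughly equivalent CLI call for the snapshot action; for a shut-off,
   volume-backed instance it records a Glance image and triggers the Cinder
   volume snapshot (its UUID appears to be the suffix of the new source path
   above):
   $ openstack server image create --name r8-vol-shutdown-s1 r8-vol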

4. Delete VM snapshot r8-vol-shutdown-s1 successfully; openstack volume snapshot list shows that the snapshot r8-vol-shutdown-s1 no longer exists.
   The libvirt XML is unchanged:
   -------------------------------------------------------------
   <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.362d13e8-9f6f-4f61-9e34-bf44ea626e7a'/>
      <target dev='vda' bus='virtio'/>
      <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
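
   A roughly equivalent CLI call for the deletion (comment 14 shows it was done
   from the Cinder volume-snapshots panel in Horizon):
   $ openstack volume snapshot delete <id-of-the-r8-vol-shutdown-s1-snapshot>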

5. Create VM snapshot r8-vol-shutdown-s2 successfully; it creates a 0-size image and a volume snapshot.
   $ openstack volume snapshot list
   +--------------------------------------+------------------------------------+-------------+----------------+------+
   | ID                                   | Name                               | Description | Status         | Size |
   +--------------------------------------+------------------------------------+-------------+----------------+------+
   | b6a06c69-57f7-4fbe-a48b-b251fc550d87 | snapshot for r8-vol-shutdown-s2    | None        | available      |   10 |

   ()[root@overcloud-novacompute-1 libvirt]# virsh list --all
    Id   Name                State
    ------------------------------------
    -    instance-00000004   shut off

  Check the xml:
   ---------------------------------------------------------------
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.b6a06c69-57f7-4fbe-a48b-b251fc550d87'/>
      <target dev='vda' bus='virtio'/>
      <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

6. Try to start the VM; it stays SHUTOFF in OSP 16.1:
   ------------------------------------------------------------
   $ openstack server list
   +--------------------------------------+-----------+---------+-----------------------+-------+--------+
   | ID                                   | Name      | Status  | Networks              | Image | Flavor |
   +--------------------------------------+-----------+---------+-----------------------+-------+--------+
   | 416b3517-84d9-4f93-85de-e395c3333c3e | r8-vol    | SHUTOFF | default=192.168.35.64 |       |        |

   But the VM has disappeared from virsh list, and libvirtd.log shows this error:
   ---------------------------------------------------------------
    error : virStorageFileReportBrokenChain:5214 : Cannot access storage file '/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457': No such file or directory
   ---------------------------------------------------------------
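
   The start attempt corresponds roughly to:
   $ openstack server start r8-vol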

Actual results:
In step 6, the VM fails to start.

Expected results:
In step 6, the VM starts successfully.

Additional info:
- libvirtd.log
- nova_compute.log

Comment 1 chhu 2020-07-08 09:20:14 UTC
Created attachment 1700262 [details]
libvirtd.log, nova-compute.log

Comment 2 chhu 2020-07-09 10:17:13 UTC
I think this is an OSP bug; adding more information here:

1. Create an image and a volume in OSP, then boot a VM from the volume: Project->Instances->Launch Instance->Source: Volume r8t-vol
   $ openstack image create r8t --disk-format qcow2 --container-format bare --file /tmp/RHEL-8.2-x86_64-latest.qcow2
   $ openstack volume create r8t-vol --size 10 --image r8t
   
   Check the Cinder volume on the NFS storage:
   [root@dell-per730-** cinder]# ls -lh
   -rw-rw-rw-. 1 qemu qemu  10G Jul  9 05:35 volume-2750c079-b5f5-48ec-8e71-b470c41abf08

2. Shut down the VM
   [root@dell-per730-** cinder]# ls -lh
   -rw-rw-rw-. 1 root root  10G Jul  9 05:35 volume-2750c079-b5f5-48ec-8e71-b470c41abf08

3. Create VM snapshot s1-down; a volume snapshot is created
   [root@dell-per730-** cinder]# ls -lh
   -rw-rw-rw-. 1 root root  10G Jul  9 05:35 volume-2750c079-b5f5-48ec-8e71-b470c41abf08
   -rw-rw-rw-. 1 root root 193K Jul  9 05:36 volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-5c159cb3c660

4. Delete VM snapshot s1-down; volume-**bf08 is deleted. It should have deleted volume-**bf08.**3c660 instead
   [root@dell-per730-** cinder]# ls -lh
   -rw-rw-rw-. 1 root root 193K Jul  9 05:36 volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-5c159cb3c660

5. Create VM snapshot s2-down; volume-**bf08.**c1bb is created
   [root@dell-per730-64 cinder]# ls -lh
   -rw-rw-rw-. 1 root root 1.6G Jul  9 05:38 volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-5c159cb3c660
   -rw-rw-rw-. 1 root root 193K Jul  9 05:40 volume-2750c079-b5f5-48ec-8e71-b470c41abf08.bb18f197-7bce-47ca-91e4-f666048fc1bb

6. Try to start the VM; libvirtd.log shows this error:
   --------------------------------------------------------------------------------------------------
   error : virStorageFileReportBrokenChain:5214 : Cannot access storage file '/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-2750c079-b5f5-48ec-8e71-b470c41abf08': No such file or directory
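
   One way to confirm the broken chain directly on the NFS share (file names
   taken from the listings above) is to ask qemu-img for the backing chain of
   the remaining overlay; once the base volume-**bf08 is gone this fails with a
   similar "No such file or directory" error:
   [root@dell-per730-** cinder]# qemu-img info --backing-chain volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-5c159cb3c660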

Comment 3 Peter Krempa 2020-07-09 10:21:19 UTC
So the libvirtd log shows:

'instance-00000004' is defined as:

    2020-07-08 08:32:23.025+0000: 61778: debug : virDomainDefineXML:6146 : conn=0x7fafa40015d0, xml=<domain type="kvm">
      <uuid>416b3517-84d9-4f93-85de-e395c3333c3e</uuid>
      <name>instance-00000004</name>
      <memory>2097152</memory>
      <vcpu>2</vcpu>
      <metadata>
        <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
          <nova:package version="20.2.1-0.20200528080027.1e95025.el8ost"/>
          <nova:name>r8-vol</nova:name>
          <nova:creationTime>2020-07-08 08:32:22</nova:creationTime>
          <nova:flavor name="m1">
            <nova:memory>2048</nova:memory>
            <nova:disk>10</nova:disk>
            <nova:swap>0</nova:swap>
            <nova:ephemeral>0</nova:ephemeral>
            <nova:vcpus>2</nova:vcpus>
          </nova:flavor>
          <nova:owner>
            <nova:user uuid="286e173a35bc42158a88973fdaf6e564">admin</nova:user>
            <nova:project uuid="a26754324ca249c590505e1f796d5f0f">admin</nova:project>
          </nova:owner>
        </nova:instance>
      </metadata>
      <sysinfo type="smbios">
        <system>
          <entry name="manufacturer">Red Hat</entry>
          <entry name="product">OpenStack Compute</entry>
          <entry name="version">20.2.1-0.20200528080027.1e95025.el8ost</entry>
          <entry name="serial">416b3517-84d9-4f93-85de-e395c3333c3e</entry>
          <entry name="uuid">416b3517-84d9-4f93-85de-e395c3333c3e</entry>
          <entry name="family">Virtual Machine</entry>
        </system>
      </sysinfo>
      <os>
        <type machine="pc-i440fx-rhel7.6.0">hvm</type>
        <boot dev="hd"/>
        <smbios mode="sysinfo"/>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cputune>
        <shares>2048</shares>
      </cputune>
      <clock offset="utc">
        <timer name="pit" tickpolicy="delay"/>
        <timer name="rtc" tickpolicy="catchup"/>
        <timer name="hpet" present="no"/>
      </clock>
      <cpu mode="host-model" match="exact">
        <topology sockets="2" cores="1" threads="1"/>
      </cpu>
      <devices>
        <disk type="file" device="disk">
          <driver name="qemu" type="raw" cache="none" io="native"/>
          <source file="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457"/>
          <target bus="virtio" dev="vda"/>
          <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
        </disk>
        <interface type="bridge">
          <mac address="fa:16:3e:77:27:46"/>
          <model type="virtio"/>
          <driver name="vhost" rx_queue_size="512"/>
          <source bridge="br-int"/>
          <mtu size="1442"/>
          <target dev="tapa82ddf70-7d"/>
          <virtualport type="openvswitch">
            <parameters interfaceid="a82ddf70-7d05-4f83-9f26-07ad91cdc87e"/>
          </virtualport>
        </interface>
        <serial type="pty">
          <log file="/var/lib/nova/instances/416b3517-84d9-4f93-85de-e395c3333c3e/console.log" append="off"/>
        </serial>
        <input type="tablet" bus="usb"/>
        <graphics type="vnc" autoport="yes" listen="192.168.24.11"/>
        <video>
          <model type="cirrus"/>
        </video>
        <memballoon model="virtio">
          <stats period="10"/>
        </memballoon>
      </devices>
    </domain>

    2020-07-08 08:32:23.033+0000: 61778: info : qemuDomainDefineXMLFlags:7636 : Creating domain 'instance-00000004'

The VM is then started:

    2020-07-08 08:32:23.069+0000: 61775: debug : virDomainCreateWithFlags:6589 : dom=0x7fafb4024930, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), flags=0x1
    2020-07-08 08:32:23.069+0000: 61775: debug : qemuDomainObjBeginJobInternal:9784 : Starting job: job=none agentJob=none asyncJob=start (vm=0x7fafa8029f70 name=instance-00000004, current job=none agentJob=none async=none)
    2020-07-08 08:32:23.069+0000: 61775: debug : qemuDomainObjBeginJobInternal:9841 : Started async job: start (vm=0x7fafa8029f70 name=instance-00000004)
    2020-07-08 08:32:23.069+0000: 61775: debug : qemuProcessStart:7077 : conn=0x7fafa40015d0 driver=0x7faf640f1980 vm=0x7fafa8029f70 name=instance-00000004 id=-1 asyncJob=start migrateFrom=<null> migrateFd=-1 migratePath=<null> snapshot=(nil) vmop=0 flags=0x3
    2020-07-08 08:32:23.069+0000: 61775: debug : qemuProcessInit:5526 : vm=0x7fafa8029f70 name=instance-00000004 id=-1 migration=0

[...]

    2020-07-08 08:32:23.148+0000: 61775: debug : virCommandRunAsync:2588 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/var/lib/libvirt/qemu/domain-7-instance-00000004 XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-7-instance-00000004/.local/share XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-7-instance-00000004/.cache XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-7-instance-00000004/.config QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000004,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-instance-00000004/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on -m 2048 -overcommit mem-lock=off -smp 2,sockets=2,dies=1,cores=1,threads=1 -uuid 416b3517-84d9-4f93-85de-e395c3333c3e -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.2.1-0.20200528080027.1e95025.el8ost,serial=416b3517-84d9-4f93-85de-e395c3333c3e,uuid=416b3517-84d9-4f93-85de-e395c3333c3e,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -blockdev '{"driver":"file","filename":"/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on,serial=5fd5a675-1171-4dbc-af73-6b2a8a1dc457 -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,rx_queue_size=512,host_mtu=1442,netdev=hostnet0,id=net0,mac=fa:16:3e:77:27:46,bus=pci.0,addr=0x3 -add-fd set=3,fd=38 -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 192.168.24.11:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

CPUs are started:

    2020-07-08 08:32:26.123+0000: 61777: debug : virDomainResume:666 : dom=0x7fafa40075c0, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e)
    2020-07-08 08:32:26.125+0000: 61770: debug : qemuProcessHandleResume:728 : Transitioned guest instance-00000004 into running state, reason 'unpaused', event detail 0


And the VM is then asked to shut down a minute later:

    2020-07-08 08:33:31.688+0000: 61778: debug : virDomainShutdown:1444 : dom=0x7fafa8019b40, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e)
    2020-07-08 08:33:31.688+0000: 61778: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7fafb4046e80 msg={"execute":"system_powerdown","id":"libvirt-367"}

The guest OS shuts down a moment later:

    2020-07-08 08:33:33.422+0000: 61770: debug : qemuProcessHandleShutdown:594 : Transitioned guest instance-00000004 to shutdown state
    2020-07-08 08:33:33.422+0000: 61770: debug : virObjectEventNew:631 : obj=0x561796c81000
    2020-07-08 08:33:33.423+0000: 61770: debug : qemuProcessKill:7200 : vm=0x7fafa8029f70 name=instance-00000004 pid=103949 flags=0x2

    [...]

    2020-07-08 08:33:33.457+0000: 61770: debug : qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7fafa8029f70 'instance-00000004'

    [...]

    2020-07-08 08:33:33.692+0000: 62484: debug : qemuRemoveCgroup:1333 : Failed to terminate cgroup for instance-00000004
    2020-07-08 08:33:33.701+0000: 62484: error : virProcessRunInFork:1161 : internal error: child reported (status=125): internal error: child reported (status=125): internal error: child reported (status=125): internal error: End of file from qemu monitor
    2020-07-08 08:33:33.701+0000: 62484: warning : qemuBlockRemoveImageMetadata:2774 : Unable to remove disk metadata on vm instance-00000004 from /var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457 (disk target vda)

For some reason we fail to remove the image metadata, but that shouldn't be fatal in this case. I'll ask Michal, who implemented this, to comment on why that could have happened.


The VM is then destroyed, but it's not running so this is a no-op.

    2020-07-08 08:33:33.716+0000: 61777: debug : virDomainDestroy:466 : dom=0x7fafa4057620, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e)
    2020-07-08 08:33:33.716+0000: 61777: debug : virDomainObjCheckActive:7204 : Requested operation is not valid: domain is not running

An external snapshot is then created with the VM in inactive state:

    2020-07-08 08:35:14.804+0000: 61776: debug : virDomainSnapshotCreateXML:221 : dom=0x7fafac039850, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), xmlDesc=<domainsnapshot>
      <disks>
        <disk name="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457" snapshot="external" type="file">
          <source file="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.362d13e8-9f6f-4f61-9e34-bf44ea626e7a"/>
        </disk>
      </disks>
    </domainsnapshot>
    , flags=0x34

The flags used are:
     VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA  (0x04)
     VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY (0x10)
     VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT (0x20)

Since the VM is inactive at this point and _REUSE_EXT along with _NO_METADATA
is used to create an external snapshot, this means that libvirt effectively just
replaces the disk source with the one specified by the snapshot. Libvirt doesn't
even touch the image files apart from checking whether they exist.
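
For reference, flags=0x34 decodes to 0x04 | 0x10 | 0x20, i.e. exactly the three
flags listed above. Roughly the same request could be made by hand with virsh
(illustrative only; snap.xml here would contain the <domainsnapshot> document
shown above):

    # virsh snapshot-create instance-00000004 snap.xml --disk-only --no-metadata --reuse-external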


Another snapshot is then created ...

    2020-07-08 08:43:17.458+0000: 61775: debug : virDomainSnapshotCreateXML:221 : dom=0x7fafb401eb80, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), xmlDesc=<domainsnapshot>
      <disks>
        <disk name="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.362d13e8-9f6f-4f61-9e34-bf44ea626e7a" snapshot="external" type="file">
          <source file="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.b6a06c69-57f7-4fbe-a48b-b251fc550d87"/>
        </disk>
      </disks>
    </domainsnapshot>
    , flags=0x34

With the same configuration. So at this point the 'vda' disk points to:

"/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457.b6a06c69-57f7-4fbe-a48b-b251fc550d87"


Another spurious destroy is attempted but the VM still isn't running.

    2020-07-08 08:47:38.979+0000: 61779: debug : virDomainDestroy:466 : dom=0x7fafa0038710, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e)
    2020-07-08 08:47:38.979+0000: 61779: debug : virDomainObjCheckActive:7204 : Requested operation is not valid: domain is not running

At this point the VM is undefined ...

    2020-07-08 08:47:39.051+0000: 61775: debug : virDomainUndefineFlags:6304 : dom=0x7fafb4005400, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), flags=0x1


And defined again.

    2020-07-08 08:47:39.190+0000: 61780: debug : virDomainDefineXML:6146 : conn=0x7fafa40015d0, xml=<domain type="kvm">
      <uuid>416b3517-84d9-4f93-85de-e395c3333c3e</uuid>
      <name>instance-00000004</name>
      <memory>2097152</memory>
      <vcpu>2</vcpu>
      <metadata>
        <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
          <nova:package version="20.2.1-0.20200528080027.1e95025.el8ost"/>
          <nova:name>r8-vol</nova:name>
          <nova:creationTime>2020-07-08 08:47:39</nova:creationTime>
          <nova:flavor name="m1">
            <nova:memory>2048</nova:memory>
            <nova:disk>10</nova:disk>
            <nova:swap>0</nova:swap>
            <nova:ephemeral>0</nova:ephemeral>
            <nova:vcpus>2</nova:vcpus>
          </nova:flavor>
          <nova:owner>
            <nova:user uuid="286e173a35bc42158a88973fdaf6e564">admin</nova:user>
            <nova:project uuid="a26754324ca249c590505e1f796d5f0f">admin</nova:project>
          </nova:owner>
        </nova:instance>
      </metadata>
      <sysinfo type="smbios">
        <system>
          <entry name="manufacturer">Red Hat</entry>
          <entry name="product">OpenStack Compute</entry>
          <entry name="version">20.2.1-0.20200528080027.1e95025.el8ost</entry>
          <entry name="serial">416b3517-84d9-4f93-85de-e395c3333c3e</entry>
          <entry name="uuid">416b3517-84d9-4f93-85de-e395c3333c3e</entry>
          <entry name="family">Virtual Machine</entry>
        </system>
      </sysinfo>
      <os>
        <type machine="pc-i440fx-rhel7.6.0">hvm</type>
        <boot dev="hd"/>
        <smbios mode="sysinfo"/>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cputune>
        <shares>2048</shares>
      </cputune>
      <clock offset="utc">
        <timer name="pit" tickpolicy="delay"/>
        <timer name="rtc" tickpolicy="catchup"/>
        <timer name="hpet" present="no"/>
      </clock>
      <cpu mode="host-model" match="exact">
        <topology sockets="2" cores="1" threads="1"/>
      </cpu>
      <devices>
        <disk type="file" device="disk">
          <driver name="qemu" type="raw" cache="none" io="native"/>
          <source file="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457"/>

The disk source is thus overwritten back to the original path by this define call.

          <target bus="virtio" dev="vda"/>
          <serial>5fd5a675-1171-4dbc-af73-6b2a8a1dc457</serial>
        </disk>
        <interface type="bridge">
          <mac address="fa:16:3e:77:27:46"/>
          <model type="virtio"/>
          <driver name="vhost" rx_queue_size="512"/>
          <source bridge="br-int"/>
          <mtu size="1442"/>
          <target dev="tapa82ddf70-7d"/>
          <virtualport type="openvswitch">
            <parameters interfaceid="a82ddf70-7d05-4f83-9f26-07ad91cdc87e"/>
          </virtualport>
        </interface>
        <serial type="pty">
          <log file="/var/lib/nova/instances/416b3517-84d9-4f93-85de-e395c3333c3e/console.log" append="off"/>
        </serial>
        <input type="tablet" bus="usb"/>
        <graphics type="vnc" autoport="yes" listen="192.168.24.11"/>
        <video>
          <model type="cirrus"/>
        </video>
        <memballoon model="virtio">
          <stats period="10"/>
        </memballoon>
      </devices>
    </domain>

    2020-07-08 08:47:39.198+0000: 61780: info : qemuDomainDefineXMLFlags:7636 : Creating domain 'instance-00000004'

The VM is then started:

    2020-07-08 08:47:39.200+0000: 61778: debug : virDomainCreateWithFlags:6589 : dom=0x7fafa80016c0, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), flags=0x0
    2020-07-08 08:47:39.200+0000: 61778: debug : qemuProcessStart:7077 : conn=0x7fafa40015d0 driver=0x7faf640f1980 vm=0x7faf9801c430 name=instance-00000004 id=-1 asyncJob=start migrateFrom=<null> migrateFd=-1 migratePath=<null> snapshot=(nil) vmop=0 flags=0x1
    2020-07-08 08:47:39.200+0000: 61778: debug : qemuProcessInit:5526 : vm=0x7faf9801c430 name=instance-00000004 id=-1 migration=0

And fails to find the image.

    2020-07-08 08:47:39.229+0000: 61778: debug : qemuProcessPrepareHost:6535 : Preparing disks (host)
    2020-07-08 08:47:39.230+0000: 61778: error : virStorageFileReportBrokenChain:5214 : Cannot access storage file '/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457': No such file or directory
          <source file="/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-5fd5a675-1171-4dbc-af73-6b2a8a1dc457"/>
    2020-07-08 08:47:39.230+0000: 61778: debug : qemuProcessStop:7286 : Shutting down vm=0x7faf9801c430 name=instance-00000004 id=8 pid=0, reason=failed, asyncJob=start, flags=0x2

And it's undefined.

    2020-07-08 08:47:39.417+0000: 61780: debug : virDomainUndefineFlags:6304 : dom=0x7faf98018f90, (VM: name=instance-00000004, uuid=416b3517-84d9-4f93-85de-e395c3333c3e), flags=0x1
    2020-07-08 08:47:39.417+0000: 61780: info : qemuDomainUndefineFlags:7772 : Undefining domain 'instance-00000004'


As the above log shows, libvirt didn't actually modify or even touch the original image, since _REUSE_EXT is used when creating the snapshots and the original path was supplied back to us via the newly defined XML. Given that, there doesn't seem to be a libvirt problem related to snapshots here; the image was removed outside of our control.

Please re-assign this to the OpenStack team for further investigation.

Comment 4 Peter Krempa 2020-07-09 10:22:47 UTC
(In reply to chhu from comment #2)
> I think this is an OSP bug, add more information here:
> 
> 1. Create a image and volume in OSP, boot VM from volume:
> Project->Instances->Launch Instance->Source:Volume r8t-vol
>    $ openstack image create r8t --disk-format qcow2 --container-format bare
> --file /tmp/RHEL-8.2-x86_64-latest.qcow2
>    $ openstack volume create r8t-vol --size 10 --image r8t
>    
>    Check the cinder on nfs storage:
>    [root@dell-per730-** cinder]# ls -lh
>    -rw-rw-rw-. 1 qemu qemu  10G Jul  9 05:35
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08
> 
> 2. Shutdown the VM
>    [root@dell-per730-** cinder]# ls -lh
>    -rw-rw-rw-. 1 root root  10G Jul  9 05:35
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08
> 
> 3. Do VM snapshot s1-down, volume snapshot is created
>    [root@dell-per730-** cinder]# ls -lh
>    -rw-rw-rw-. 1 root root  10G Jul  9 05:35
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08
>    -rw-rw-rw-. 1 root root 193K Jul  9 05:36
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-
> 5c159cb3c660
> 
> 4. Delete VM snapshot s1-down, the volume-**bf08 is deleted. It should
> delete volume-**bf08.**3c660
>    [root@dell-per730-** cinder]# ls -lh
>    -rw-rw-rw-. 1 root root 193K Jul  9 05:36
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-
> 5c159cb3c660
> 
> 5. Do VM snapshot s2-down, volume-**bf08.**c1bb is created
>    [root@dell-per730-64 cinder]# ls -lh
>    -rw-rw-rw-. 1 root root 1.6G Jul  9 05:38
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-
> 5c159cb3c660
>    -rw-rw-rw-. 1 root root 193K Jul  9 05:40
> volume-2750c079-b5f5-48ec-8e71-b470c41abf08.bb18f197-7bce-47ca-91e4-
> f666048fc1bb
> 
> 6. Try to start the VM, hit error in libvirtd.log:
>   
> -----------------------------------------------------------------------------
> ---------------------
>    error : virStorageFileReportBrokenChain:5214 : Cannot access storage file
> '/var/lib/nova/mnt/07b80119d40eda06c63650e0d74e0ba5/volume-2750c079-b5f5-
> 48ec-8e71-b470c41abf08': No such file or directory

Yes, that's consistent with my findings from the log. The file is removed somewhere in between, which is not recorded in the logs.

Thanks for the update from the OpenStack side.

Comment 5 Luigi Toscano 2020-07-09 10:30:32 UTC
If this is assigned to Cinder, we need at least the Cinder configuration and the various Cinder logs (volume, api, etc.).

Comment 6 chhu 2020-07-09 12:13:19 UTC
Created attachment 1700428 [details]
cinder.conf, cinder-*.log

Comment 9 chhu 2020-07-11 03:08:44 UTC
Let's involve a nova-compute developer for further debugging.
I'll change the Component to nova; please feel free to change it back if that's not correct. Thank you!

Comment 12 chhu 2020-07-13 07:49:08 UTC
Created attachment 1700773 [details]
nfs-config.yaml

Comment 14 chhu 2020-07-21 07:56:02 UTC
Hi, Tzach

1. Boot VM from volume: Project->Instances->Launch Instance->Source:Volume r8-vol
2. Project->Instances-> Select the instance -> Click "Shutoff"
3. Project->Instances-> Select the instance -> Create VM snapshot r8-vol-shutdown-s1;
   a volume snapshot is created, and you can find it under Cinder -> volume snapshot.

   And you can see that the volume-**.** file has been created in the storage:
   [root@dell-per730-** cinder]# ls -lh
   -rw-rw-rw-. 1 root root  10G Jul  9 05:35 volume-2750c079-b5f5-48ec-8e71-b470c41abf08
   -rw-rw-rw-. 1 root root 193K Jul  9 05:36 volume-2750c079-b5f5-48ec-8e71-b470c41abf08.52fd1a14-4f33-48a7-9ca3-5c159cb3c660

4. Project->Cinder->volume snapshot, select the snapshot, and click "Delete":
   volume-**bf08 is deleted. It should have deleted volume-**bf08.**c660 instead.

5. Repeat step 3 to take another snapshot; you can find another volume-**.** created in the Cinder storage (same as in comment 2).
6. Project->Instances-> Select the instance -> Click "Start Instance":
   The VM can't start because the volume was deleted in step 4.
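
   To map the files on the NFS share back to OpenStack objects (an illustrative
   sketch; the overlay file name has the form volume-<volume UUID>.<snapshot UUID>):
   $ openstack volume show r8-vol -f value -c id
   $ openstack volume snapshot list --volume r8-vol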

Comment 16 Eric Harney 2020-07-21 15:52:47 UTC
https://bugs.launchpad.net/cinder/+bug/1860913 seems to be the same bug upstream, and has some info about why this happens.

Comment 17 Lee Yarwood 2020-07-21 22:34:08 UTC
(In reply to Eric Harney from comment #16)
> https://bugs.launchpad.net/cinder/+bug/1860913 seems to be the same bug
> upstream, and has some info about why this happens.

Yup, correct, this looks like the same issue. I'm tracking this under the following bug, which I'll mark this one as a duplicate of for now while I work on getting the series updated upstream.

Instance uses base image file when it is rebooted after snapshot creation if cinder nfs backend is used
https://bugzilla.redhat.com/show_bug.cgi?id=1757691

*** This bug has been marked as a duplicate of bug 1757691 ***

