Bug 1470634 - Wrong allocation value after virDomainBlockCopy() (alloc=capacity)
Status: POST
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
7.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
: rc
: ---
Assigned To: Kevin Wolf
Qianqian Zhu
: Regression, ZStream
Depends On:
Blocks: 1473145
 
Reported: 2017-07-13 06:54 EDT by Nir Soffer
Modified: 2017-07-20 02:42 EDT (History)
27 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1473145 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---


Attachments
  Vdsm log (794.89 KB, text/plain) - 2017-07-13 06:54 EDT, Nir Soffer
  Vdsm and libvirt debug logs for run 2 (102.45 KB, application/x-xz) - 2017-07-13 16:20 EDT, Nir Soffer
  setup script for creating block storage domains (621 bytes, text/plain) - 2017-07-15 10:30 EDT, Nir Soffer
  reproducer script (1.53 KB, text/plain) - 2017-07-15 10:31 EDT, Nir Soffer
  output of the reproducer script (24.79 KB, text/plain) - 2017-07-15 10:33 EDT, Nir Soffer
  reproducer log with build 13667657 (24.79 KB, patch) - 2017-07-17 11:47 EDT, Nir Soffer
  virsh log during ovirt live storage migration with build 13667657 (18.97 KB, text/plain) - 2017-07-17 11:57 EDT, Nir Soffer
  reproducer log with qemu build 13677788 (23.25 KB, patch) - 2017-07-18 13:49 EDT, Nir Soffer
  Live storage migration log with qemu build 13677788 (26.02 KB, text/plain) - 2017-07-18 14:29 EDT, Nir Soffer

Description Nir Soffer 2017-07-13 06:54:48 EDT
Created attachment 1297506 [details]
Vdsm log

Description of problem:

After invoking virDomainBlockCopy(), the next virDomainBlockGetInfo() returns a
wrong allocation value (alloc=capacity).

This causes RHV to pause the VM, as protection against invalid allocation values
from the underlying stack (this protection was added a few years ago after a qemu bug).

Version-Release number of selected component (if applicable):
# rpm -qa | egrep '(libvirt|qemu)' | sort
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
libvirt-client-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-config-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-interface-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-network-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nodedev-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-qemu-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-secret-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-core-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-kvm-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-libs-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-lock-sanlock-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-python-3.2.0-3.el7.x86_64
qemu-img-rhev-2.9.0-14.el7.x86_64
qemu-kvm-common-rhev-2.9.0-14.el7.x86_64
qemu-kvm-rhev-2.9.0-14.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create vm with thin block disk
2. Start live storage migration

Actual results:
VM pause

Expected results:
VM is up during the live storage migration

1. Starting vm

virsh # dumpxml 4
<domain type='kvm' id='4'>
  <name>bz1467826</name>
  <uuid>aec5c331-cba9-4af1-995f-63a0b9472b0b</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:agentChannelName>com.redhat.rhevm.vdsm</ovirt-vm:agentChannelName>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
    <ovirt-vm:startTime type="float">1499941264.08</ovirt-vm:startTime>
</ovirt-vm:vm>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='2'>16</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>7.4-18.el7</entry>
      <entry name='serial'>77A14806-F954-4D2A-AEA6-C098A4469927</entry>
      <entry name='uuid'>aec5c331-cba9-4af1-995f-63a0b9472b0b</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>SandyBridge</model>
    <topology sockets='16' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso' startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e6c91d0b-c50f-40d4-893a-fa8d469459c4'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/93331705-46be-4cb8-9dc2-c1559843fd4a'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>df9692ab-e64b-40bc-82b4-005b6d126eeb</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:00:00:ff:41:08'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/aec5c331-cba9-4af1-995f-63a0b9472b0b.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/aec5c331-cba9-4af1-995f-63a0b9472b0b.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='spice' tlsPort='5900' autoport='yes' listen='10.35.0.110' defaultMode='secure' passwdValidTo='2017-07-13T10:23:08'>
      <listen type='network' address='10.35.0.110' network='vdsm-ovirtmgmt'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
      <channel name='smartcard' mode='secure'/>
      <channel name='usbredir' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='8192' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c80,c625</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c80,c625</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

2. Checking allocation

virsh # domblkinfo 4 sda
Capacity:       8589934592
Allocation:     27459584
Physical:       1073741824

3. Writing some data inside the vm

# dd if=/dev/urandom of=data bs=8M count=10 oflag=direct
10+0 records in
10+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 1.68035 s, 49.9 MB/s

4. Checking allocation 

virsh # domblkinfo 4 sda
Capacity:       8589934592
Allocation:     112525312
Physical:       1073741824

5. Performing live snapshot in ovirt

2017-07-13 13:27:07,337+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') <?xml version='1.0' encoding='UTF-8'?>
<domainsnapshot><disks><disk name="sda" snapshot="external" type="block"><source dev="/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/6c7364d7-4b93-4213-8ce4-46ae394a4a29" type="block" /></disk></disks></domainsnapshot> (vm:3919)
2017-07-13 13:27:07,337+0300 INFO  (jsonrpc/1) [vdsm.api] START freeze() from=::ffff:10.35.0.111,38640, flow_id=41efcc82-52cc-4427-b098-eb84ffe204ed (api:46)
2017-07-13 13:27:07,338+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Freezing guest filesystems (vm:3715)
2017-07-13 13:27:07,342+0300 WARN  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Unable to freeze guest filesystems: Guest agent is not responding: QEMU guest agent is not connected (vm:3720)
2017-07-13 13:27:07,342+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH freeze return={'status': {'message': 'Guest agent is not responding: QEMU guest agent is not connected', 'code': 19}} from=::ffff:10.35.0.111,38640, flow_id=41efcc82-52cc-4427-b098-eb84ffe204ed (api:52)
2017-07-13 13:27:07,342+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Taking a live snapshot (drives=sda, memory=True) (vm:3935)
2017-07-13 13:27:07,510+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Completed live snapshot (vm:3937)
2017-07-13 13:27:07,510+0300 INFO  (jsonrpc/1) [vdsm.api] START thaw() from=::ffff:10.35.0.111,38640, flow_id=41efcc82-52cc-4427-b098-eb84ffe204ed (api:46)
2017-07-13 13:27:07,511+0300 INFO  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Thawing guest filesystems (vm:3739)
2017-07-13 13:27:07,511+0300 WARN  (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Unable to thaw guest filesystems: Guest agent is not responding: QEMU guest agent is not connected (vm:3744)
2017-07-13 13:27:07,511+0300 INFO  (jsonrpc/1) [vdsm.api] FINISH thaw return={'status': {'message': 'Guest agent is not responding: QEMU guest agent is not connected', 'code': 19}} from=::ffff:10.35.0.111,38640, flow_id=41efcc82-52cc-4427-b098-eb84ffe204ed (api:52)
2017-07-13 13:27:07,512+0300 DEBUG (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Drive sda moved from u'/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e6c91d0b-c50f-40d4-893a-fa8d469459c4' to u'/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/6c7364d7-4b93-4213-8ce4-46ae394a4a29' (storage:358)

6. Checking allocation

virsh # domblkinfo 4 sda
Capacity:       8589934592
Allocation:     917504
Physical:       1073741824

7. More writes in the vm

# dd if=/dev/urandom of=data bs=8M count=10 oflag=direct
10+0 records in
10+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 1.51129 s, 55.5 MB/s

8. checking allocation

virsh # domblkinfo 4 sda
Capacity:       8589934592
Allocation:     85000192
Physical:       1073741824

9. Starting live storage migration in ovirt (calls blockCopy())

2017-07-13 13:32:19,907+0300 INFO  (jsonrpc/1) [vdsm.api] START diskReplicateStart(srcDisk={u'device': u'disk', u'poolID': u'6c99f4e5-8588-46f5-a818-e11151c1d19c', u'volumeID': u'e27836fd-6a
17-4b87-85fe-ce470f02de84', u'domainID': u'aed577ea-d1ca-4ebe-af80-f852c7ce59bb', u'imageID': u'df9692ab-e64b-40bc-82b4-005b6d126eeb'}, dstDisk={u'device': u'disk', u'poolID': u'6c99f4e5-858
8-46f5-a818-e11151c1d19c', u'volumeID': u'e27836fd-6a17-4b87-85fe-ce470f02de84', u'domainID': u'd0812abd-2beb-4103-b00c-b9bc4ac77085', u'imageID': u'df9692ab-e64b-40bc-82b4-005b6d126eeb'}) from=::ffff:10.35.0.111,38640, flow_id=a92af502-d0a3-4aba-8b63-55b5ea3de8d7 (api:46)

2017-07-13 13:32:20,133+0300 DEBUG (periodic/0) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Extend info for drive sda (vol_id=e27836fd-6a17-4b87-85fe-ce470f02de84, sd_id=aed577ea-d1ca-4ebe-af80-f852c7ce59bb, format=cow, capacity=8589934592, allocated=0, physical=1073741824) (vm:1038)

    NOTE: allocated=0

2017-07-13 13:32:20,562+0300 INFO  (jsonrpc/1) [vds] prepared volume path: /rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e27836fd-6a17-4b87-85fe-ce470f02de84 (clientIF:389)

2017-07-13 13:32:20,570+0300 DEBUG (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Replicating drive sda to <?xml version='1.0' encoding='UTF-8'?>
<disk device="disk" snapshot="no" type="block"><source dev="/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e27836fd-6a17-4b87-85fe-ce470f02de84" /><driver cache="none" error_policy="stop" io="native" name="qemu" type="qcow2" /></disk> (vm:4140)

2017-07-13 13:32:20,667+0300 DEBUG (libvirt/events) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') unhandled libvirt event (event_name=Unknown id 8, args=('/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e27836fd-6a17-4b87-85fe-ce470f02de84', 2, 3, 8)) (clientIF:559)

2017-07-13 13:32:20,669+0300 DEBUG (jsonrpc/1) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Extend info for drive sda (vol_id=e27836fd-6a17-4b87-85fe-ce470f02de84, sd_id=aed577ea-d1ca-4ebe-af80-f852c7ce59bb, format=cow, capacity=8589934592, allocated=8589934592, physical=1073741824) (vm:1038)

    NOTE: allocated=8589934592

2017-07-13 13:32:22,135+0300 ERROR (periodic/2) [virt.vm] (vmId='aec5c331-cba9-4af1-995f-63a0b9472b0b') Improbable extension request for volume e27836fd-6a17-4b87-85fe-ce470f02de84 on domain aed577ea-d1ca-4ebe-af80-f852c7ce59bb, pausing the VM to avoid corruptions (capacity: 8589934592, allocated: 8589934592, physical: 1073741824, next physical size: 3221225472) (vm:1081)

    We pause the vm at this point because of the improbable allocation value.
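The guard that triggers the pause above can be paraphrased as a small predicate. This is a hedged sketch of the vdsm logic, not the actual code; the function name and exact comparison are assumptions based on the log message:

```python
def improbable_extension(capacity, allocation, next_physical):
    """Return True when the reported allocation cannot be trusted.

    A qcow2 volume's allocation should stay below the next physical
    size the LV would be extended to; an allocation that already
    reaches it (here, the full virtual capacity) indicates bogus data
    from the stack, so the VM is paused instead of extending blindly.
    """
    return allocation >= next_physical

# Values from the log above: allocation == capacity == 8 GiB,
# while the next physical size would be 3 GiB.
print(improbable_extension(8589934592, 8589934592, 3221225472))  # True
print(improbable_extension(8589934592, 85000192, 3221225472))    # False
```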


10. Checking allocation

virsh # domblkinfo 4 sda
Capacity:       8589934592
Allocation:     8589934592
Physical:       3221225472

11. Disk xml

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e27836fd-6a17-4b87-85fe-ce470f02de84'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/6c7364d7-4b93-4213-8ce4-46ae394a4a29'/>
        <backingStore type='block' index='2'>
          <format type='qcow2'/>
          <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e6c91d0b-c50f-40d4-893a-fa8d469459c4'/>
          <backingStore type='block' index='3'>
            <format type='qcow2'/>
            <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/93331705-46be-4cb8-9dc2-c1559843fd4a'/>
            <backingStore/>
          </backingStore>
        </backingStore>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/df9692ab-e64b-40bc-82b4-005b6d126eeb/e27836fd-6a17-4b87-85fe-ce470f02de84'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <serial>df9692ab-e64b-40bc-82b4-005b6d126eeb</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

Note: I don't have libvirt logs, will reproduce again with libvirt debug logs.
Comment 2 Nir Soffer 2017-07-13 07:02:47 EDT
Raz, can you reproduce this issue again with libvirt debug logs, both on current
libvirt and the scratch build for bug 1470127?
Comment 4 Nir Soffer 2017-07-13 07:17:37 EDT
This is the blockCopy call used in the tested vdsm version - the same call
used in ovirt 4.1:

        flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

        self._dom.blockCopy(drive.name, destxml, flags=flags)

On master we switched to persistent domains yesterday, so we are now calling:

        flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT |
                 VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB)

        self._dom.blockCopy(drive.name, destxml, flags=flags)

Need to test again with master.
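For readers without the vdsm tree at hand, the two flag sets quoted above can be restated self-contained. The numeric values are copied from libvirt's virDomainBlockCopyFlags enum to the best of my reading; they are defined locally here only so the snippet runs without the libvirt module:

```python
# Values from libvirt's virDomainBlockCopyFlags enum (assumed, see lead-in).
VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0        # copy only the top image, keep backing chain
VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1      # destination volume is pre-created by the caller
VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB = 1 << 2  # job is not persisted in the domain config

# ovirt 4.1 (transient domains):
flags_41 = (VIR_DOMAIN_BLOCK_COPY_SHALLOW |
            VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

# vdsm master (persistent domains):
flags_master = flags_41 | VIR_DOMAIN_BLOCK_COPY_TRANSIENT_JOB

print(flags_41, flags_master)  # 3 7
```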
Comment 5 Elad 2017-07-13 08:18:51 EDT
We will test it today
Comment 8 John Ferlan 2017-07-13 11:17:14 EDT
First off, BlockCopy is a different area of the code... I'm not as "knowledgeable" about that code. In particular, that multi-level <backingStore ...> is something I'd need to have a closer look at to see how it works...

Might also be interesting to know what virsh domstats returns for sda, e.g.:

virsh domstats $DOM --backing | grep -E "name|allocation|capacity|physical"

FWIW:
The blockstats values are a closer representation of what QEMU knows about, while the blockinfo has some gray area.

For domstats, those values are:

  "block.<num>.allocation" - offset of the highest written sector
                             as unsigned long long.
  "block.<num>.capacity" - logical size in bytes of the block device
                           backing image as unsigned long long.
  "block.<num>.physical" - physical size in bytes of the container of the
                           backing image as unsigned long long.

For blkinfo, those values are:

capacity	logical size in bytes of the image (how much storage the guest will see)
allocation	host storage in bytes occupied by the image (such as highest allocated extent if there are no holes, similar to 'du')
physical	host physical size in bytes of the image container (last offset, similar to 'ls')
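The 'du' vs 'ls' distinction above can be seen on any sparse file, with no libvirt involved. A quick illustration (not RHV code; st_blocks is POSIX-specific):

```python
import os
import tempfile

# Create a 10 MiB sparse file: 'physical' (st_size, like ls) is 10 MiB,
# but 'allocation' (st_blocks * 512, like du) stays near zero because
# no blocks are actually written.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(10 * 1024 * 1024)
    path = f.name

st = os.stat(path)
physical = st.st_size            # last offset, 'ls'-style
allocation = st.st_blocks * 512  # occupied storage, 'du'-style
print(physical, allocation)
os.unlink(path)
```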

It seems to me there's a desire to get statistical information using the BlockInfo API.
Comment 9 Nir Soffer 2017-07-13 12:12:50 EDT
The issue is that all RHV versions use BlockInfo to get the allocation. We cannot
fix this for customers running existing products; these customers' setups will break
when upgrading to libvirt 3.2. Breaking backward compatibility is not an option.

For the next version we can move to the domstats API. We wanted to do this anyway,
since we can do one call per VM instead of one call per VM per disk, and we also get
the allocation of internal volumes, which can be useful for the live merge flow.
Comment 10 Elad 2017-07-13 14:20:25 EDT
Couldn't reproduce using:
4.2 master
vdsm-4.20.1-176.gitecbab6b.el7.centos.x86_64
libvirt-daemon-3.2.0-15.el7_rc.9a49cd47ef.x86_64

VM with RHEL OS with an additional disk:
Tested with 10G thin provision disk - created as 1G LV
Tested also with preallocated 25G disk - created as 25G LV

For both cases:
Started the VM and live migrated the disk (for the preallocated disk, the allocation policy changed to thin due to the snapshot created for live storage migration).
The VM remained running through the entire live storage migration operation.
Comment 11 Nir Soffer 2017-07-13 14:44:06 EDT
(In reply to Elad from comment #10)
> Couldn't reproduce using:
> 4.2 master
> vdsm-4.20.1-176.gitecbab6b.el7.centos.x86_64
We moved to persistent domains yesterday, maybe it does not happen in this
configuration. Can you test with latest 4.1 (vdsm-4.19.x)?
Comment 12 John Ferlan 2017-07-13 15:16:00 EDT
I understand the historical usage problem as libvirt faces that often too.

There are a few different variables in this particular bz beyond just the allocation and/or capacity values changing. Migration using BlockCopy with a multi-level backingStore presents a different challenge, and I'm not 100% sure whether underlying QEMU adjustments in the block layer have an impact as well.

Looking through the libvirt sources, I don't see anywhere obvious where allocation would be set to capacity. If anything, allocation could be set to physical, so I'm not sure what could have caused the allocation to jump like that. Looks like we'll need more libvirtd debug data from an environment where the issue is reproduced.

Don't forget to get the domstats output as that should be beneficial.
Comment 13 Nir Soffer 2017-07-13 16:16:35 EDT
I did another run with libvirt debug logs.

1. Starting vm

virsh # dumpxml 2
<domain type='kvm' id='2'>
  <name>bz1470634</name>
  <uuid>585d96ca-49d7-4ffc-800b-c24990349383</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:agentChannelName>com.redhat.rhevm.vdsm</ovirt-vm:agentChannelName>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
    <ovirt-vm:startTime type="float">1499971723.4</ovirt-vm:startTime>
</ovirt-vm:vm>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='2'>16</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>7.4-18.el7</entry>
      <entry name='serial'>77A14806-F954-4D2A-AEA6-C098A4469927</entry>
      <entry name='uuid'>585d96ca-49d7-4ffc-800b-c24990349383</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>SandyBridge</model>
    <topology sockets='16' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/058224ec-791b-4373-b006-aa4d6e2f7283'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>6d8fa77d-9b67-4150-bfef-06a4e7c66cf4</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:00:00:ff:41:09'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/585d96ca-49d7-4ffc-800b-c24990349383.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/585d96ca-49d7-4ffc-800b-c24990349383.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='spice' tlsPort='5900' autoport='yes' listen='10.35.0.110' defaultMode='secure' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='network' address='10.35.0.110' network='vdsm-ovirtmgmt'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
      <channel name='smartcard' mode='secure'/>
      <channel name='usbredir' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='8192' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c354,c582</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c354,c582</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>


virsh # domstats --backing 2
Domain: 'bz1470634'
  state.state=1
  state.reason=1
  cpu.time=2609290083
  cpu.user=110000000
  cpu.system=230000000
  balloon.current=1048576
  balloon.maximum=1048576
  balloon.last-update=0
  balloon.rss=35112
  vcpu.current=2
  vcpu.maximum=16
  vcpu.0.state=1
  vcpu.0.time=1860000000
  vcpu.0.wait=0
  vcpu.0.halted=yes
  vcpu.1.state=1
  vcpu.1.time=0
  vcpu.1.wait=0
  vcpu.1.halted=yes
  net.count=1
  net.0.name=vnet0
  net.0.rx.bytes=90
  net.0.rx.pkts=1
  net.0.rx.errs=0
  net.0.rx.drop=0
  net.0.tx.bytes=0
  net.0.tx.pkts=0
  net.0.tx.errs=0
  net.0.tx.drop=0
  block.count=2
  block.0.name=hdc
  block.0.rd.reqs=0
  block.0.rd.bytes=0
  block.0.rd.times=0
  block.0.wr.reqs=0
  block.0.wr.bytes=0
  block.0.wr.times=0
  block.0.fl.reqs=0
  block.0.fl.times=0
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/058224ec-791b-4373-b006-aa4d6e2f7283
  block.1.rd.reqs=1
  block.1.rd.bytes=512
  block.1.rd.times=1078839
  block.1.wr.reqs=0
  block.1.wr.bytes=0
  block.1.wr.times=0
  block.1.fl.reqs=0
  block.1.fl.times=0
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480

2. Starting live storage migration in ovirt

This does:
 a. create a snapshot
 b. create a destination disk on another storage domain
 c. call libvirt.virDomain.blockCopy() with the destination disk

virsh # dumpxml 2
<domain type='kvm' id='2'>
  <name>bz1470634</name>
  <uuid>585d96ca-49d7-4ffc-800b-c24990349383</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:agentChannelName>com.redhat.rhevm.vdsm</ovirt-vm:agentChannelName>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
    <ovirt-vm:startTime type="float">1499971723.4</ovirt-vm:startTime>
</ovirt-vm:vm>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='2'>16</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>7.4-18.el7</entry>
      <entry name='serial'>77A14806-F954-4D2A-AEA6-C098A4469927</entry>
      <entry name='uuid'>585d96ca-49d7-4ffc-800b-c24990349383</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>SandyBridge</model>
    <topology sockets='16' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/ba76fbfc-3cb8-4876-9ec6-6173d63772e2'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/058224ec-791b-4373-b006-aa4d6e2f7283'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/ba76fbfc-3cb8-4876-9ec6-6173d63772e2'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <serial>6d8fa77d-9b67-4150-bfef-06a4e7c66cf4</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:00:00:ff:41:09'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/585d96ca-49d7-4ffc-800b-c24990349383.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/585d96ca-49d7-4ffc-800b-c24990349383.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='spice' tlsPort='5900' autoport='yes' listen='10.35.0.110' defaultMode='secure' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='network' address='10.35.0.110' network='vdsm-ovirtmgmt'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
      <channel name='smartcard' mode='secure'/>
      <channel name='usbredir' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='8192' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c354,c582</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c354,c582</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

virsh # domstats --backing 2
Domain: 'bz1470634'
  state.state=3
  state.reason=1
  cpu.time=4324202072
  cpu.user=150000000
  cpu.system=260000000
  balloon.current=1048576
  balloon.maximum=1048576
  balloon.last-update=0
  balloon.rss=36836
  vcpu.current=2
  vcpu.maximum=16
  vcpu.0.state=1
  vcpu.0.time=3000000000
  vcpu.0.wait=0
  vcpu.0.halted=yes
  vcpu.1.state=1
  vcpu.1.time=0
  vcpu.1.wait=0
  vcpu.1.halted=yes
  net.count=1
  net.0.name=vnet0
  net.0.rx.bytes=230
  net.0.rx.pkts=3
  net.0.rx.errs=0
  net.0.rx.drop=901
  net.0.tx.bytes=0
  net.0.tx.pkts=0
  net.0.tx.errs=0
  net.0.tx.drop=0
  block.count=3
  block.0.name=hdc
  block.0.rd.reqs=0
  block.0.rd.bytes=0
  block.0.rd.times=0
  block.0.wr.reqs=0
  block.0.wr.bytes=0
  block.0.wr.times=0
  block.0.fl.reqs=0
  block.0.fl.times=0
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/ba76fbfc-3cb8-4876-9ec6-6173d63772e2
  block.1.rd.reqs=1
  block.1.rd.bytes=512
  block.1.rd.times=1078839
  block.1.wr.reqs=0
  block.1.wr.bytes=0
  block.1.wr.times=0
  block.1.fl.reqs=0
  block.1.fl.times=0
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480  <<<< this is wrong, this is a 1g lv on the destination
  block.2.name=sda
  block.2.path=/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/6d8fa77d-9b67-4150-bfef-06a4e7c66cf4/058224ec-791b-4373-b006-aa4d6e2f7283
  block.2.backingIndex=1
  block.2.rd.reqs=0
  block.2.rd.bytes=0
  block.2.rd.times=0
  block.2.wr.reqs=0
  block.2.wr.bytes=0
  block.2.wr.times=0
  block.2.fl.reqs=0
  block.2.fl.times=0
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=21474836480


virsh # domblkinfo 2 sda
Capacity:       21474836480
Allocation:     21474836480
Physical:       3221225472

virsh # list
 Id    Name                           State
----------------------------------------------------
 2     bz1470634                      paused

Looks like qemu returns a wrong physical value (physical == capacity)
during the block copy.

We assumed that this code was correct because physical is
always zero for block devices:

11701     /* Unlike GetStatsBlock, this API has defined the expected return values
11702      * for allocation and physical slightly differently.
11703      *
11704      * Having a zero for either or if they're the same is an indication that
11705      * there's a sparse file backing this device. In this case, we'll force
11706      * the setting of physical based on the on disk file size.
11707      *
11708      * Additionally, if qemu hasn't written to the file yet, then set the
11709      * allocation to whatever qemu returned for physical (e.g. the "actual-
11710      * size" from the json query) as that will match the expected allocation
11711      * value for this API. NB: May still be 0 for block. */
11712     if (entry->physical == 0 || info->allocation == 0 ||
11713         info->allocation == entry->physical) {
11714         if (info->allocation == 0)
11715             info->allocation = entry->physical;

But if allocation is zero and physical is not, we override the
allocation value with physical.

I think the safe way to make the API happy without breaking backward
compatibility is:

    if (entry->physical == 0 || info->allocation == 0 ||
        info->allocation == entry->physical) {
        if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_FILE &&
            info->allocation == 0)
            info->allocation = entry->physical;
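The difference between the current and the proposed behavior can be modeled in a few lines of Python (illustrative only, not libvirt code; `fix_allocation` and `guarded` are made-up names for this sketch):

```python
# Python model of the fixup logic quoted above (illustrative, not libvirt
# code): `guarded` selects between the current behavior and the proposed
# file-only guard.

def fix_allocation(allocation, physical, is_file, guarded):
    # outer condition, as in the quoted libvirt code
    if physical == 0 or allocation == 0 or allocation == physical:
        # current code copies physical unconditionally when allocation is 0;
        # the proposed guard does it only for file-backed storage
        if allocation == 0 and (not guarded or is_file):
            allocation = physical
    return allocation

CAP = 21474836480  # block device: qemu reported allocation=0, physical=capacity
print(fix_allocation(0, CAP, is_file=False, guarded=False))  # current: 21474836480
print(fix_allocation(0, CAP, is_file=False, guarded=True))   # proposed: 0
```

With the guard, a block device whose allocation is genuinely zero keeps allocation=0 instead of being reported as fully allocated, while file-backed storage keeps the historical behavior.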
Comment 14 Nir Soffer 2017-07-13 16:20 EDT
Created attachment 1297854 [details]
Vdsm and libvirt debug logs for run 2
Comment 15 Nir Soffer 2017-07-13 19:00:06 EDT
Reproduced also with vdsm 4.2, using persistent domains; see comment 4.
Comment 16 John Ferlan 2017-07-13 19:37:47 EDT
While I can understand your assertion and I'm not against it per se, adjusting the code based on a bug or bad value received from a lower layer could mean that when that bug is eventually fixed, the correct value won't be displayed because of this new layer of protection added on account of the bad data.

The usage of ActualType in the previous check (a few lines above, while managing the wr_highest_offset value) was a result of commit id 'cf436a56'.  So what does "qemu-img info --backing-chain" say about this source path at this point in time?  Does it display the correct or incorrect values?
Comment 17 Nir Soffer 2017-07-14 12:35:16 EDT
(In reply to John Ferlan from comment #16)
I agree that hiding a bad value from qemu is a bad idea, but I suspect that the
issue is in libvirt's handling of the values from qemu.

In the libvirt log you can see that qemu returns "actual-size": 0. The wrong value
21474836480 appears in the qemu response only in "virtual-size".

My theory is:

1. qemu returns "virtual-size": 21474836480, "actual-size": 0
2. the libvirt code preparing a stats entry changes actual-size to the value
   of virtual-size - but only after starting blockCopy.
3. the code trying to "fix" allocation, assuming that physical is always 0 for
   block devices, sets allocation to physical.

And regarding https://bugzilla.redhat.com/show_bug.cgi?id=1467826#c29
we cannot use domstats because it returns a wrong physical value. We need
both allocation and physical to extend volumes when they become full.

We basically do:

    if physical - alloc < drive.watermarkLimit:
        extend the drive...

This must work, even when we will use the new libvirt threshold events to
minimize polling.
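The watermark check above can be sketched as follows (a minimal model; `needs_extension`, the values, and the units are illustrative, not vdsm's actual API):

```python
# Minimal sketch of the watermark check above (names and values are
# illustrative, not vdsm's actual API).

def needs_extension(physical, allocation, watermark_limit):
    """Extend when free room between highest write and LV size is too small."""
    return physical - allocation < watermark_limit

GiB = 1024 ** 3
MiB = 1024 ** 2

# Sane case: 1 GiB written into a 3 GiB LV, plenty of room left.
print(needs_extension(3 * GiB, 1 * GiB, 512 * MiB))   # False

# The buggy report (allocation == capacity): free room looks hugely
# negative, so an "improbable" extension is requested.
print(needs_extension(1 * GiB, 20 * GiB, 512 * MiB))  # True
```

This is why a bogus allocation value is not merely cosmetic: it directly drives the extend-or-pause decision.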

Regarding your question about actual lv sizes in this state, here is another
run - this time with a file-based disk (this disk was moved from block storage
while the vm was running).

virsh # dumpxml 4
...
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source file='/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72'/>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/25df7f6e-f5de-4dac-8eb8-2f7b991d4bdb'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <serial>a5930327-05bc-4b26-b1b5-5a09c720039e</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

virsh # domstats 4 --block --backing
Domain: 'bz1467826'
  block.count=3
  block.0.name=hdc
  block.0.rd.reqs=0
  block.0.rd.bytes=0
  block.0.rd.times=0
  block.0.wr.reqs=0
  block.0.wr.bytes=0
  block.0.wr.times=0
  block.0.fl.reqs=0
  block.0.fl.times=0
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72
  block.1.rd.reqs=1
  block.1.rd.bytes=512
  block.1.rd.times=827287
  block.1.wr.reqs=0
  block.1.wr.bytes=0
  block.1.wr.times=0
  block.1.fl.reqs=0
  block.1.fl.times=0
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480   <<< wrong value
  block.2.name=sda
  block.2.path=/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/25df7f6e-f5de-4dac-8eb8-2f7b991d4bdb
  block.2.backingIndex=1
  block.2.rd.reqs=0
  block.2.rd.bytes=0
  block.2.rd.times=0
  block.2.wr.reqs=0
  block.2.wr.bytes=0
  block.2.wr.times=0
  block.2.fl.reqs=0
  block.2.fl.times=0
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=200704

virsh # domblkinfo 4 sda
Capacity:       21474836480
Allocation:     21474836480 <<< wrong, see qemu-img check result below
Physical:       196928

# qemu-img info --backing /rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72
image: /rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: 25df7f6e-f5de-4dac-8eb8-2f7b991d4bdb (actual path: /rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/25df7f6e-f5de-4dac-8eb8-2f7b991d4bdb)
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/25df7f6e-f5de-4dac-8eb8-2f7b991d4bdb
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

# qemu-img check /rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_40/d6e4a622-bd31-4d8f-904d-1e26b7286757/images/a5930327-05bc-4b26-b1b5-5a09c720039e/5f9cfbe2-8ced-4e25-bda8-cbec2b3f1d72
No errors were found on the image.
Image end offset: 262144

I will send later the same output for block disk based flow.
Comment 18 Nir Soffer 2017-07-14 12:51:16 EDT
Here is the same flow with block based disk:

1. Starting the vm:

virsh # dumpxml 5
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>31cb9434-0511-4519-aa22-d35869cd7fa2</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# qemu-img info --backing /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7
image: /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

virsh # domblkinfo 5 sda
Capacity:       21474836480
Allocation:     0
Physical:       21474836480

virsh # domstats 5 --block --backing
Domain: 'bz1467826'
  block.count=2
  block.0.name=hdc
  block.0.rd.reqs=0
  block.0.rd.bytes=0
  block.0.rd.times=0
  block.0.wr.reqs=0
  block.0.wr.bytes=0
  block.0.wr.times=0
  block.0.fl.reqs=0
  block.0.fl.times=0
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7
  block.1.rd.reqs=1
  block.1.rd.bytes=512
  block.1.rd.times=958460
  block.1.wr.reqs=0
  block.1.wr.bytes=0
  block.1.wr.times=0
  block.1.fl.reqs=0
  block.1.fl.times=0
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480

2. Starting live storage migration in ovirt to another block storage

The first step in live storage migration is creating a snapshot; blockCopy
has not been called yet...

virsh # dumpxml 5
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>31cb9434-0511-4519-aa22-d35869cd7fa2</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

After blockCopy was called, the vm paused:

virsh # list
 Id    Name                           State
----------------------------------------------------
 5     bz1467826                      paused

virsh # dumpxml 5
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <serial>31cb9434-0511-4519-aa22-d35869cd7fa2</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

virsh # domblkinfo 5 sda
Capacity:       21474836480
Allocation:     21474836480 << wrong
Physical:       3221225472

virsh # domstats 5 --block --backing
Domain: 'bz1467826'
  block.count=3
  block.0.name=hdc
  block.0.rd.reqs=0
  block.0.rd.bytes=0
  block.0.rd.times=0
  block.0.wr.reqs=0
  block.0.wr.bytes=0
  block.0.wr.times=0
  block.0.fl.reqs=0
  block.0.fl.times=0
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677
  block.1.rd.reqs=1
  block.1.rd.bytes=512
  block.1.rd.times=958460
  block.1.wr.reqs=0
  block.1.wr.bytes=0
  block.1.wr.times=0
  block.1.fl.reqs=0
  block.1.fl.times=0
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480 <<< wrong
  block.2.name=sda
  block.2.path=/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7
  block.2.backingIndex=1
  block.2.rd.reqs=0
  block.2.rd.bytes=0
  block.2.rd.times=0
  block.2.wr.reqs=0
  block.2.wr.bytes=0
  block.2.wr.times=0
  block.2.fl.reqs=0
  block.2.fl.times=0
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=21474836480

# qemu-img info --backing /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677
image: /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536
backing file: cd04fdaf-0aff-4720-930a-e2d24ed624a7 (actual path: /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7)
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 0

# qemu-img check /rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677
No errors were found on the image.
Image end offset: 262144
Comment 21 Nir Soffer 2017-07-14 13:50:48 EDT
(Adding missing info for comment 18)

The snapshot lv was already removed - we delete the snapshot after live storage
migration completes.

Here are logs from vdsm showing the real size of the lv:

Creating 1g lv:

2017-07-14 19:44:09,656+0300 INFO  (tasks/3) [storage.LVM] Creating LV (vg=aed577ea-d1ca-4ebe-af80-f852c7ce59bb, lv=0838b0fe-52a5-44b3-9e5f-0104d03e9677, size=1024m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1147)

After taking a snapshot, using now this lv:

2017-07-14 19:44:25,859+0300 DEBUG (jsonrpc/5) [virt.vm] (vmId='c6ba6960-bf68-47fd-8ba2-b3543bef1263') Drive sda moved from u'/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/cd04fdaf-0aff-4720-930a-e2d24ed624a7' to u'/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677' (storage:358)

Response showing the lv size:

2017-07-14 19:44:26,030+0300 INFO  (jsonrpc/3) [vdsm.api] FINISH getVolumeInfo return={'info': {'status': 'OK', 'domain': 'aed577ea-d1ca-4ebe-af80-f852c7ce59bb', 'voltype': 'LEAF', 'description': '', 'parent': 'cd04fdaf-0aff-4720-930a-e2d24ed624a7', 'format': 'COW', 'generation': 0, 'image': '31cb9434-0511-4519-aa22-d35869cd7fa2', 'ctime': '1500050655', 'disktype': '2', 'legality': 'LEGAL', 'mtime': '0', 'apparentsize': '1073741824', 'children': [], 'pool': '', 'capacity': '21474836480', 'uuid': u'0838b0fe-52a5-44b3-9e5f-0104d03e9677', 'truesize': '1073741824', 'type': 'SPARSE', 'lease': {'owners': [], 'version': None}}} from=::ffff:10.35.0.111,39084, flow_id=d2e9931d-b813-4dcc-a9e3-55ad65ea6a89, task_id=994e6092-0b27-4a5e-a548-c335b0dd328c (api:52)

    NOTE: 'apparentsize': '1073741824'

Starting blockCopy:

2017-07-14 19:44:33,978+0300 DEBUG (jsonrpc/2) [virt.vm] (vmId='c6ba6960-bf68-47fd-8ba2-b3543bef1263') Replicating drive sda to <?xml version='1.0' encoding='UTF-8'?>
<disk device="disk" snapshot="no" type="block"><source dev="/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677" /><driver cache="none" error_policy="stop" io="native" name="qemu" type="qcow2" /></disk> (vm:4186)

Receiving block job VIR_DOMAIN_BLOCK_JOB_READY for this block copy:

2017-07-14 19:44:33,993+0300 DEBUG (libvirt/events) [virt.vm] (vmId='c6ba6960-bf68-47fd-8ba2-b3543bef1263') unhandled libvirt event (event_name=Unknown id 8, args=('/rhev/data-center/mnt/blo
ckSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/31cb9434-0511-4519-aa22-d35869cd7fa2/0838b0fe-52a5-44b3-9e5f-0104d03e9677', 2, 3, 8)) (clientIF:559)

    NOTE: we don't yet use block job events

We receive a bad allocation value from libvirt and pause the vm:

2017-07-14 19:44:34,887+0300 ERROR (periodic/5) [virt.vm] (vmId='c6ba6960-bf68-47fd-8ba2-b3543bef1263') Improbable extension request for volume 0838b0fe-52a5-44b3-9e5f-0104d03e9677 on domain aed577ea-d1ca-4ebe-af80-f852c7ce59bb, pausing the VM to avoid corruptions (capacity: 21474836480, allocated: 21474836480, physical: 1073741824, next physical size: 3221225472) (vm:1084)
Comment 23 John Ferlan 2017-07-14 18:46:01 EDT
Suffice to say there are some moving parts in all this - qemu changing, libvirt changing, and rhv changing. Now you're pointing out the different types of storage rhv uses, and there's a lot of data to chomp on and digest. That's going to take some time and may require even more data.

First off, GetBlockInfo was designed long before modern adjustments to qcow2 container files, sparse lv/files, and multi-level backingStore options were as complicated as they are today. Using GetBlockInfo is becoming more and more troublesome because it was never designed to provide the depth of details that domain block stats can provide, nor the level of detail rhv appears to need.

GetBlockInfo massages @allocation based on some historical factors, but DomainStats does not massage allocation - it returns what QEMU gives it. To add to the woes, if the target hasn't been written to, qemu will have no allocation or "wr_highest_offset" value, so in order to "handle" that, libvirt does more detection. For rhv purposes that detection is bad/wrong; however, for someone else that detection may be right. It's a proverbial rock and a hard place.

In any case, to lay the ground work, for domain stats:

   capacity == virtual-size
   physical == actual-size
   allocation == wr_highest_offset

If 'actual-size' is missing from the returned qemu data, then physical = capacity because the image is not thin provisioned (see qemuMonitorJSONBlockStatsUpdateCapacityOne). Oh and this is the same algorithm used by the BlkInfo code to at least get the qemu values.
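The mapping just described can be modeled in a few lines (a sketch of the described behavior, not libvirt code; the dictionary keys follow the qemu JSON fields quoted in this comment):

```python
# Sketch of the mapping described above (assumed behavior, not libvirt code);
# keys follow the qemu JSON fields quoted in this comment.

def stats_from_qemu(image_info):
    capacity = image_info["virtual-size"]
    # when 'actual-size' is missing, the image is treated as not thin
    # provisioned, so physical falls back to capacity
    physical = image_info.get("actual-size", capacity)
    allocation = image_info.get("wr_highest_offset", 0)
    return capacity, allocation, physical

# block device before any guest write: no actual-size, no wr_highest_offset
print(stats_from_qemu({"virtual-size": 21474836480}))
# → (21474836480, 0, 21474836480)
```

Note how this already produces the physical == capacity, allocation == 0 pattern seen in the domstats output above.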

Furthermore, domain stats can massage the returned physical value in qemuDomainGetStatsOneBlock:

    if (entry->physical) {
        QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx,
                                 "physical", entry->physical);
    } else {
        if (qemuDomainStorageUpdatePhysical(driver, cfg, dom, src) == 0) {
            QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx,
                                     "physical", src->physical);
        } else {
            virResetLastError();
        }
    }

So, if "actual-size == 0" (like you show at the top of comment 17), then the stats code will go off and change that, which you're calling wrong, but I need to let Peter answer since he wrote that Stats code. I have a feeling it's because a physical size == 0 makes zero sense. Furthermore, without the qemu data I cannot know for sure if actual-size == 0 or 20G in this case.

Not that I want more data, but you're only showing a partial picture of all the data at each step, so it's really hard to put together a logically based response. The data you show is from after the copy/mirror starts, so for the values you say are wrong I can only show from the code why they are the way they are.

Based on what you show in comment 17 (cut-n-paste for brevity) here's what I can deduce:

  <disk type='file' device='disk' snapshot='no'>
    <driver name='qemu' type='qcow2' .../>
    <source file='*images*f1d72'/>
    <backingStore type='file' index='1'>
      <format type='raw'/>
      <source file='*images*d4bdb'/>
      <backingStore/>
    </backingStore>
    <mirror type='block' job='copy' ready='yes'>
      <format type='qcow2'/>
      <source dev='*images*f1d72'/>
    </mirror>

and

  block.1.name=sda
  block.1.path=*images*f1d72
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480   <<< wrong value
  block.2.name=sda
  block.2.path=*images/*d4bdb
  block.2.backingIndex=1
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=200704

and 

virsh # domblkinfo 4 sda
Capacity:       21474836480
Allocation:     21474836480 <<< wrong, see qemu-img check result below
Physical:       196928

and

qemu-img info

virtual size: 20G (21474836480 bytes)   [== domblkinfo capacity]
disk size: 196K                         [== domblkinfo physical]

IIRC, the qemu-img data shows a "disk size" of 196K for that 'raw' area as a result of a call to bdrv_getlength(bs) from block/qapi.c - bdrv_query_image_info(), which, if I'm reading things right, calls raw_getlength(); I chased that once and it ended up returning essentially the stat() st_size value that the libvirt call to virStorageSourceUpdatePhysicalSize fetches for a TYPE_FILE. For the 'block' area it returns 20G for capacity and 0 (zero) for disk size. I tried following that code once, but got lost quickly.
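The stat()/lseek() distinction described above can be sketched as follows (a minimal model in Python, under simplified assumptions; physical_size is a hypothetical helper, not the actual libvirt code):

```python
import os
import stat
import tempfile

def physical_size(path):
    # Hypothetical sketch of the probe described above (a simplified model
    # of what virStorageSourceUpdatePhysicalSize does, not libvirt code).
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        # A block device reports st_size == 0, so seek to the end of the
        # device instead - the standard way to get its size.
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.lseek(fd, 0, os.SEEK_END)
        finally:
            os.close(fd)
    # A regular file (TYPE_FILE) is covered by the stat() st_size value.
    return st.st_size

# A regular file reports its st_size:
with tempfile.NamedTemporaryFile() as f:
    f.write(b"\0" * 196928)
    f.flush()
    print(physical_size(f.name))  # 196928
```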

From the libvirt side...

For block.1 (mirror type='block'), it would seem to have actual-size == 0 (it's a block device). This results in a call to qemuDomainStorageUpdatePhysical. For a block device that's an lseek(). For block.2 (format type='raw'), it seems actual-size = 200704.

Due to how the algorithm works, keying off the block name, GetBlockInfo will use the "last" qemu data returned, which seems to be block.1 - but only because if it were block.2, I'd expect to see 200704 in the allocation field (but I'm also in data overload, so I may have read something wrong). It may also be a case of 'actual-size' not being present, resulting in physical = capacity (which also changes the answer a bit for block.1 above, which won't make that call).

I really think I'd need to see the qemu data to be sure though.


On to comment 18 or the 'block' example.

Here we have:

  <disk type='block' device='disk' snapshot='no'>
    <driver name='qemu' type='raw' .../>
    <source dev='*mnt/blockSD*624a7'/>
...
    <backingStore/>
...

[NB: source type = block and source format = raw are different from the previous example. So are the paths, obviously. I'm assuming the former was somehow created as a qcow2 image and the latter is a device backed to some raw file.]

and 

  block.1.name=sda
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480

and GetBlockInfo displays:

Allocation:     0

This is deemed OK and right, correct? In this case, what's interesting is it would seem 'actual-size == 0'. In that case, the stats would change the physical to be the call to qemuDomainStorageUpdatePhysical. For the BlkInfo, allocation is set to actual-size (which is 0 for the block device) and it too calls the same function for physical. So far, so good.


Then you do your migration and blockCopy and we have:

  <disk type='block' device='disk' snapshot='no'>
    <driver name='qemu' type='qcow2' ... />
    <source dev='*mnt/blockSD*e9677'/>
...

    <backingStore type='block' index='1'>
      <format type='raw'/>
      <source dev='*mnt/blockSD*624a7'/>    <== e.g. same as "source"
...
      <backingStore/>
    </backingStore>
    <mirror type='block' job='copy' ready='yes'>
      <format type='qcow2'/>
      <source dev='*/mnt/blockSD/*e9677'/>
...
    </mirror>

and

  block.1.name=sda
  block.1.path=*/mnt/blockSD/*e9677
  block.1.path=*images*e9677
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480 <<< wrong
...
  block.2.name=sda
  block.2.path=*/mnt/blockSD/*624a7
  block.2.backingIndex=1
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=21474836480
...

and domblkinfo 5 sda

Capacity:       21474836480
Allocation:     21474836480 << wrong
Physical:       3221225472

and qemu-img info

virtual size: 20G (21474836480 bytes)
disk size: 0

In this case for type = block, qemu-img takes a different path to get the "disk size:" output, but reading that code would take someone who understands it better from the qemu side, as I don't think I could do it justice! I think the display of 0 just means it cannot get at the size because it's a block device.

As for the libvirt side...

For block.1 (mirror type=block) actual-size must be 0 upon return (it's block, right?) so we go ask virStorageSourceUpdatePhysicalSize. That will return an lseek() to the end of the device (the standard way to get the size of a block device), hence the value. For block.2 (format type = raw) - I'm assuming the actual-size is returned as 20G. 

For BlkInfo, if I'm reading the code right, 'entry->' values get assigned to the *last* 'sda' found in the returned qemu data - IOW block.2 - and that's what BlkInfo uses as the basis of its output. So that means Capacity = 20G, actual-size = 20G, allocation = 0. With these settings, allocation = actual-size and physical = return from virStorageSourceUpdatePhysicalSize on a raw device (e.g. the result of a stat() 'st_size').

So again, based upon the algorithms - it appears libvirt is providing the data; however, without having the actual qemu data for each step it's only a somewhat educated assumption.


So prior to perhaps 3 or 4 different changes within the BlkInfo space I made over 6-8 months ago - it was deemed libvirt was displaying the wrong values. I cannot speak to the stats implementation as Peter Krempa has a better handle on that. 

For me this all started as a result of a question in an upstream list which sent me down the rabbit hole into a twisty maze of passages all alike with the sound of a babbling brook to the west (it's an old computer game reference). After much research and digging, changes were proposed and pushed upstream. But those changes now conflict with the rhv expectations and usage. So again, libvirt is stuck between a rock and hard place.

Hopefully I've done the data presented thus far some justice. At this point, I don't know what's right or wrong. Unfortunately more analysis and information will be required. Whether that can be done in 3 days, who knows.

I wonder though if those cases that are "working" sometimes for you are a result of qemu data being returned in a different order, thus not hitting the allocation == capacity issue. That is, the returned qemu data gets parsed during JSON parsing using virHashLookup. My assumption has been that the data gets keyed off the "sda" name, and since each block.#.name is the same for each of the backingStores, what BlockInfo gets for data depends highly on the order of the data returned, I would think.  Ok - looked too long at this - need a fresh day and/or set of eyes.
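The order sensitivity suspected here can be modeled in a few lines (a hypothetical Python sketch of a name-keyed hash, not libvirt's actual virHashLookup-based parsing; the names and values are illustrative):

```python
def index_stats(entries):
    # Sketch of the name-keyed lookup described above (assumption: a
    # simplified model, not libvirt code). Every layer of a backing chain
    # reports the same device name, so a name-keyed table keeps only the
    # entry seen last.
    table = {}
    for entry in entries:
        table[entry["name"]] = entry  # a later duplicate overwrites an earlier one
    return table

chain = [
    {"name": "sda", "path": "top",  "physical": 196928},
    {"name": "sda", "path": "base", "physical": 200704},
]
print(index_stats(chain)["sda"]["path"])  # base - the last entry wins
```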
Comment 24 Nir Soffer 2017-07-14 20:17:17 EDT
(In reply to John Ferlan from comment #23)
> If 'actual-size' is missing from the returned qemu data, then physical =
> capacity because the image is not thin provisioned (see
> qemuMonitorJSONBlockStatsUpdateCapacityOne). Oh and this is the same
> algorithm used by the BlkInfo code to at least get the qemu values.

Looking at the libvirt debug logs, there is no actual-size in the results of
"query-blockstats", but it is always present in the results of "query-block".

> Furthermore, domain stats can massage the returned physical value in
> qemuDomainGetStatsOneBlock:
> 
>     if (entry->physical) {
>         QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx,
>                                  "physical", entry->physical);
>     } else {
>         if (qemuDomainStorageUpdatePhysical(driver, cfg, dom, src) == 0) {
>             QEMU_ADD_BLOCK_PARAM_ULL(record, maxparams, block_idx,
>                                      "physical", src->physical);
>         } else {
>             virResetLastError();
>         }
>     }
> 
> So, if "actual-size == 0" (like you show at the top of comment 17), then the
> stats code will go off and change that, which you're calling wrong, 

Adding the correct physical value instead of 0 is not wrong, it is useful.

> but I
> need to let Peter answer since he wrote that Stats code. I have a feeling
> it's because a physical size == 0 makes zero sense. Furthermore, without the
> qemu data I cannot know for sure if actual-size == 0 or 20G in this case.

Massaging the allocation value cannot be right; we need the raw value from
qemu, without change.

Actual size in this case is 1g.

> Not that I want more data, but you're only showing a partial picture of all
> the data at each step, so it's really hard to put together a logically based
> response. The data you show is after the copy/mirror starts so what you say
> is wrong I can only show from the code why those values are the way they are.
> 
> Based on what you show in comment 17 (cut-n-paste for brevity) here's what I
> can deduce:
> 
>   <disk type='file' device='disk' snapshot='no'>
>     <driver name='qemu' type='qcow2' .../>
>     <source file='*images*f1d72'/>
>     <backingStore type='file' index='1'>
>       <format type='raw'/>
>       <source file='*images*d4bdb'/>
>       <backingStore/>
>     </backingStore>
>     <mirror type='block' job='copy' ready='yes'>
>       <format type='qcow2'/>
>       <source dev='*images*f1d72'/>
>     </mirror>
> 
> and
> 
>   block.1.name=sda
>   block.1.path=*images*f1d72
>   block.1.allocation=0
>   block.1.capacity=21474836480
>   block.1.physical=21474836480   <<< wrong value
>   block.2.name=sda
>   block.2.path=*images/*d4bdb
>   block.2.backingIndex=1
>   block.2.allocation=0
>   block.2.capacity=21474836480
>   block.2.physical=200704
> 
> and 
> 
> virsh # domblkinfo 4 sda
> Capacity:       21474836480
> Allocation:     21474836480 <<< wrong, see qemu-img check result below
> Physical:       196928
> 
> and
> 
> qemu-img info
> 
> virtual size: 20G (21474836480 bytes)   [== domblkinfo capacity]
> disk size: 196K                         [== domblkinfo physical]
> 
> IIRC, the qemu-img data shows "disk size" of 196K for that 'raw' area as a
> result of a call to bdrv_getlength(bs) from block/qapi.c -
> bdrv_query_image_info() which if I'm reading things right will call
> raw_getlength() which I chased once and ended up returning essentially the
> stat() st_size value that the libvirt call to
> virStorageSourceUpdatePhysicalSize will fetch for a TYPE_FILE. For the
> 'block' area it returns 20G for capacity and 0 (zero) for disk size. I tried
> following that code once, but got lost quickly.

Wait, block.1 is qcow2 file:

>   block.1.name=sda
>   block.1.path=*images*f1d72
>   block.1.allocation=0
>   block.1.capacity=21474836480
>   block.1.physical=21474836480   <<< wrong value

block.2 is raw sparse file:

>   block.2.name=sda
>   block.2.path=*images/*d4bdb
>   block.2.backingIndex=1
>   block.2.allocation=0
>   block.2.capacity=21474836480
>   block.2.physical=200704

The issue is showing a physical size of 21474836480 for block.1. This is
an empty snapshot, and can have no data in this case. I ran this test with
an empty raw disk - no operating system installed, so no data can be written
to any of the volumes.

> From the libvirt side...
> 
> For block.1 (mirror type='block'), it would seem to have actual-size == 0
> (it's a block device). This results in a call to
> qemuDomainStorageUpdatePhysical. For a block device that's an lseek().

lseek on that empty snapshot should return 196928. We create the qcow2 volume on
file storage using basically:

    qemu-img create -f qcow2 -b backing-file file

> For
> block.2 (format type='raw'), it seems actual-size = 200704.
> 
> Due to how the algorithm works keying off the block name, GetBlockInfo will
> use the "last" qemu data returned, which seems to be block.1, but only
> because if it was block.2, then I'd expect to see 200704 in the allocation
> field

Looking at the data, the physical of block.2 looks close to the right
value of block.1, and the physical of block.1 looks like the right value for
block.2.


> I really think I'd need to see the qemu data to be sure though.

What do you mean by qemu data? How do you suggest extracting it?

> On to comment 18 or the 'block' example.
> 
> Here we have:
> 
>   <disk type='block' device='disk' snapshot='no'>
>     <driver name='qemu' type='raw' .../>
>     <source dev='*mnt/blockSD*624a7'/>
> ...
>     <backingStore/>
> ...
> 
> [NB: source type = block and source format = raw are different than previous
> example.  So are the paths obviously. I'm assuming the former was somehow
> created as a qcow2 image and the latter is a device backed to some raw file.]

This is the base volume, before we start the live storage migration.

> 
> and 
> 
>   block.1.name=sda
>   block.1.allocation=0
>   block.1.capacity=21474836480
>   block.1.physical=21474836480
> 
> and GetBlockInfo displays:
> 
> Allocation:     0
> 
> This is deemed OK and right, correct? 

Right

> In this case, what's interesting is it
> would seem 'actual-size == 0'. In that case, the stats would change the
> physical to be the call to qemuDomainStorageUpdatePhysical. For the BlkInfo,
> allocation is set to actual-size (which is 0 for the block device) and it
> too calls the same function for physical. So far, so good.
> 
> 
> Then you do your migration and blockCopy and we have:
> 
>   <disk type='block' device='disk' snapshot='no'>
>     <driver name='qemu' type='qcow2' ... />
>     <source dev='*mnt/blockSD*e9677'/>
> ...
> 
>     <backingStore type='block' index='1'>
>       <format type='raw'/>
>       <source dev='*mnt/blockSD*624a7'/>    <== e.g. same as "source"
> ...
>       <backingStore/>
>     </backingStore>
>     <mirror type='block' job='copy' ready='yes'>
>       <format type='qcow2'/>
>       <source dev='*/mnt/blockSD/*e9677'/>
> ...
>     </mirror>
> 
> and
> 
>   block.1.name=sda
>   block.1.path=*/mnt/blockSD/*e9677
>   block.1.path=*images*e9677
>   block.1.allocation=0
>   block.1.capacity=21474836480
>   block.1.physical=21474836480 <<< wrong
> ...
>   block.2.name=sda
>   block.2.path=*/mnt/blockSD/*624a7
>   block.2.backingIndex=1
>   block.2.allocation=0
>   block.2.capacity=21474836480
>   block.2.physical=21474836480
> ...
> 
> and domblkinfo 5 sda
> 
> Capacity:       21474836480
> Allocation:     21474836480 << wrong
> Physical:       3221225472
> 
> and qemu-img info
> 
> virtual size: 20G (21474836480 bytes)
> disk size: 0
> 
> In this case for type = block, qemu-img takes a different path to get the
> "disk size:" output, but reading that code would take someone that
> understands it better from the qemu side as I don't think I could do it
> justice! I think the display of 0 just means it cannot get to it because of
> being a block device.

Right, we expect this from qemu.

> 
> As for the libvirt side...
> 
> For block.1 (mirror type=block) actual-size must be 0 upon return (it's
> block, right?) so we go ask virStorageSourceUpdatePhysicalSize. That will
> return an lseek() to the end of the device (standard way to get the size of
> block device), hence the value.  For Block.2 (format type = raw) - I'm
> assuming the actual-size is returned as 20G. 

Right for block.2. For block.1 the size starts at 1g. After calling blockCopy,
vdsm extends the LV to 3g. When the VM was paused the size was 1g; when
I ran the virsh command the size was 3g.

> For BlkInfo, if I'm reading the code right 'entry->' values get assigned to
> *last* 'sda' found in the returned qemu data - IOW block.2 and that's what
> BlkInfo uses for the basis of its output. So that means Capacity = 20G,
> actual-size = 20G, allocation = 0. With these settings, allocation =
> actual-size and physical = return from virStorageSourceUpdatePhysicalSize on
> a raw device (e.g. the result of a stat() 'st_size').

Using the values from block.2 cannot work, and does not explain how we get
correct values before we start the blockCopy.

> So again, based upon the algorithms - it appears libvirt is providing the
> data; however, without having the actual qemu data for each step it's only a
> somewhat educated assumption.

Look again at the data:

> Capacity:       21474836480
> Allocation:     21474836480 << wrong
> Physical:       3221225472

How can allocation be more than physical?

I can easily collect more data in the vdsm code around the blockCopy
call. Please suggest what info you need in each step of the process.

For example we can do:

        # run qemu-img info on drive.path...
        # log output of virsh domstats, domblkinfo...

        self._dom.blockCopy(drive.name, destxml, flags=flags)

        # sleep couple of seconds
        # log output of qemu-img info on drive.path...
        # log output of virsh domstats, domblkinfo...

Or maybe we'd better reproduce this using virsh, so it will be easier to debug on
your side?

If you show me how to create a snapshot from an existing file and start blockCopy
using virsh, I can cook a reproducer.
Comment 25 Nir Soffer 2017-07-15 10:30 EDT
Created attachment 1298910 [details]
setup script for creating block storage domains

This script creates 2 oVirt-like storage domains backed by loop devices.
Comment 26 Nir Soffer 2017-07-15 10:31 EDT
Created attachment 1298911 [details]
reproducer script

This script reproduces the issue using the block storage domains created by setup.sh.
Comment 27 Nir Soffer 2017-07-15 10:33 EDT
Created attachment 1298912 [details]
output of the reproducer script

This is the output of the reproduce.sh script, showing both qemu and libvirt info.
Comment 28 Nir Soffer 2017-07-15 11:17:29 EDT
John, to reproduce this issue, use the scripts:

# attachment 1298910 [details]
sh setup.sh 

# attachment 1298911 [details]
sh reproduce.sh

I attached also log from my 7.4 host (attachment 1298912 [details])

Looking at the output of query-block in the log, I think the issue is
the difference between query-block output before the blockcopy job
and during the blockcopy job:

Here is cleaned up output before blockcopy:

{
    "device": "drive-scsi0-0-0-0",
    "inserted": {
        "backing_file": "/dev/src/base",
        "backing_file_depth": 1,
        "drv": "qcow2",
        "file": "/dev/src/top",
        "image": {
            "actual-size": 0,
            "backing-filename": "/dev/src/base",
            "backing-filename-format": "raw",
            "backing-image": {
                "actual-size": 0,
                "filename": "/dev/src/base",
                "format": "raw",
                "virtual-size": 4294967296
            },
            "filename": "/dev/src/top",
            "format": "qcow2",
            "full-backing-filename": "/dev/src/base",
            "virtual-size": 4294967296
        },
        "node-name": "#block119",
    },
}

And during blockcopy:

{
    "device": "drive-scsi0-0-0-0",
    "inserted": {
        "backing_file": "/dev/src/top",
        "backing_file_depth": 2,
        "drv": "mirror_top",
        "file": "/dev/src/top",
        "image": {
            "backing-filename": "/dev/src/top",
            "backing-filename-format": "qcow2",
            "backing-image": {
                "actual-size": 0,
                "backing-filename": "/dev/src/base",
                "backing-filename-format": "raw",
                "backing-image": {
                    "actual-size": 0,
                    "filename": "/dev/src/base",
                    "format": "raw",
                    "virtual-size": 4294967296
                },
                "filename": "/dev/src/top",
                "format": "qcow2",
                "full-backing-filename": "/dev/src/base",
                "virtual-size": 4294967296
            },
            "filename": "/dev/src/top",
            "format": "mirror_top",
            "full-backing-filename": "/dev/src/top",
            "virtual-size": 4294967296
        },
    },
}

During block copy we have a new node in this chain:
filename: /dev/src/top
backing-filename: /dev/src/top
format: mirror_top

This node does not have actual-size. Maybe libvirt is confused 
by this node, treating it as the top layer?

In this case libvirt adds a fake physical value based on the virtual size,
and later this code would change the allocation:

    if (entry->physical == 0 || info->allocation == 0 ||
        info->allocation == entry->physical) {
        if (info->allocation == 0)
            info->allocation = entry->physical;

If this is the case, I think we need to:

1. Fix handling of query-block during block copy.

2. Avoid modifying allocation for block devices; it is always wrong.
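Putting the two steps together, the failure mode can be modeled in a few lines (a hypothetical Python sketch of the algorithm as I read it, not libvirt's actual C code; the numbers are the ones from this report):

```python
def block_info(capacity, actual_size, allocation):
    # Step 1 (cf. qemuMonitorJSONBlockStatsUpdateCapacityOne): a missing
    # 'actual-size' in the qemu data makes physical fall back to capacity.
    physical = capacity if actual_size is None else actual_size
    # Step 2 (cf. the GetBlockInfo fixup quoted above): a zero allocation
    # is then overwritten with that possibly-faked physical value.
    if physical == 0 or allocation == 0 or allocation == physical:
        if allocation == 0:
            allocation = physical
    return capacity, allocation, physical

# The mirror_top node carries no 'actual-size', so allocation is reported
# as the full capacity:
print(block_info(21474836480, None, 0))  # (21474836480, 21474836480, 21474836480)
```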

I think we have a similar issue during block commit, where we get
the wrong allocation value after the block job is finished.
See https://bugzilla.redhat.com/1442266#c7 (libvirt 2.0)
Comment 29 Nir Soffer 2017-07-15 11:27:28 EDT
Kevin, is the output of query-block during the drive-mirror block job in comment 28
expected? Was it changed recently?
Comment 30 John Ferlan 2017-07-15 14:16:11 EDT
FWIW: The code in qemuMonitorJSONBlockStatsUpdateCapacityOne which both the Stats and GetInfo API's would use:

    /* if actual-size is missing, image is not thin provisioned */
    if (virJSONValueObjectGetNumberUlong(image, "actual-size",
                                         &bstats->physical) < 0)
        bstats->physical = bstats->capacity;


Looks like original commit is 8caded6, but would seemingly only be for stats. Not sure what BlockInfo would be using at the time. Peter says he'll look more in depth on Monday.
Comment 32 Kevin Wolf 2017-07-17 07:00:54 EDT
(In reply to Nir Soffer from comment #29)
> Kevin, is the output of query-block during drive-mirror block job in comment
> 28 expected? was it changed in lately?

This behaviour is new in qemu 2.9. We actually assumed that since libvirt
doesn't manage the block graph on a node level, it would only ever look at the
top level (which represents a whole guest device), so we were free to make this
change. It seems this assumption was wrong...

Now that the change is already made (and we do need this automatically inserted
node in qemu), I think we should try to make libvirt deal with it instead of
making qemu lie about the graph structure.
Comment 37 Nir Soffer 2017-07-17 11:47 EDT
Created attachment 1299921 [details]
reproducer log with build 13667657

In this log we see that allocation and physical values are correct during blockcopy.

Before blockcopy:

*** vm block stats
Domain: 'bz1470634'
  block.0.allocation=0
  block.0.capacity=4294967296
  block.0.physical=1073741824
  block.1.allocation=0
  block.1.capacity=4294967296
  block.1.physical=4294967296

*** vm block info for sda
Capacity:       4294967296
Allocation:     0
Physical:       1073741824

During blockcopy:

*** vm block stats
Domain: 'bz1470634'
  block.0.allocation=0
  block.0.capacity=4294967296
  block.0.physical=1073741824
  block.1.allocation=0
  block.1.capacity=4294967296
  block.1.physical=4294967296

*** vm block info for sda
Capacity:       4294967296
Allocation:     0
Physical:       1073741824

The fix looks correct for this flow.
Comment 38 Nir Soffer 2017-07-17 11:57 EDT
Created attachment 1299924 [details]
virsh log during ovirt live storage migration with build 13667657

This log shows output from virsh commands during live storage migration in oVirt
master. The operation completed successfully with build 13667657.

This flow contains snapshot, blockcopy, and blockcommit.

However, this build does not include the fix for bug 1470127, so a wrong allocation
value in domblkinfo may be hidden. We really should test a build including both
fixes.


Before live storage migration
-----------------------------

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/b16a9e58-c3e8-4344-a2d9-49dbf7e81177'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/93331705-46be-4cb8-9dc2-c1559843fd4a'/>
        <backingStore/>
      </backingStore>

virsh # domstats 1 --block --backing
Domain: 'bz1470634'
  block.0.name=hdc
  block.0.path=/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso
  block.0.allocation=0
  block.0.capacity=2018508800
  block.0.physical=2018508800
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/b16a9e58-c3e8-4344-a2d9-49dbf7e81177
  block.1.allocation=27394048
  block.1.capacity=8589934592
  block.1.physical=1073741824
  block.2.name=sda
  block.2.path=/rhev/data-center/mnt/blockSD/aed577ea-d1ca-4ebe-af80-f852c7ce59bb/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/93331705-46be-4cb8-9dc2-c1559843fd4a
  block.2.backingIndex=1
  block.2.allocation=0
  block.2.capacity=8589934592
  block.2.physical=3221225472

# virsh -r domblkinfo 1 sda
Capacity:       8589934592
Allocation:     0
Physical:       1073741824


After live storage migration
----------------------------

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/b16a9e58-c3e8-4344-a2d9-49dbf7e81177'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/93331705-46be-4cb8-9dc2-c1559843fd4a'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>7aab49ae-d987-4592-9cec-a3cb38df5a8d</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


# virsh -r domstats 1 --block --backing
Domain: 'bz1470634'
  block.count=3
  block.0.name=hdc
  block.0.path=/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso
  block.0.allocation=0
  block.0.capacity=2018508800
  block.0.physical=2018508800
  block.1.name=sda
  block.1.path=/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/b16a9e58-c3e8-4344-a2d9-49dbf7e81177
  block.1.allocation=87359488
  block.1.capacity=8589934592
  block.1.physical=5368709120
  block.2.name=sda
  block.2.path=/rhev/data-center/mnt/blockSD/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/7aab49ae-d987-4592-9cec-a3cb38df5a8d/93331705-46be-4cb8-9dc2-c1559843fd4a
  block.2.backingIndex=1
  block.2.allocation=0
  block.2.capacity=8589934592
  block.2.physical=3221225472

# virsh -r domblkinfo 1 sda
Capacity:       8589934592
Allocation:     0
Physical:       5368709120
Comment 39 Han Han 2017-07-18 07:28:20 EDT
I reproduced it on libvirt-3.2.0-14.el7.x86_64 qemu-kvm-rhev-2.9.0-14.el7.x86_64
1. Prepare a running VM and disk vda as its OS image

2. Create a raw image then attach it:
# qemu-img create /var/lib/libvirt/images/HH-vdb.raw 100M      
Formatting '/var/lib/libvirt/images/HH-vdb.raw', fmt=raw size=104857600

# virsh attach-disk HH  /var/lib/libvirt/images/HH-vdb.raw  vdb
Disk attached successfully

# virsh domblkinfo HH vdb                                      
Capacity:       104857600
Allocation:     0
Physical:       104857600


3. Create an external snapshot for vdb
# virsh snapshot-create-as HH --no-metadata --disk-only  --diskspec vda,snapshot=no
Domain snapshot 1500367827 created

# virsh domblkinfo HH vdb                                                          
Capacity:       104857600
Allocation:     200704
Physical:       196616

4. Do shallow blockcopy and check domblkinfo
# qemu-img create /var/lib/libvirt/images/HH-vdb.copy 100M -f qcow2                                                     
Formatting '/var/lib/libvirt/images/HH-vdb.copy', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# virsh blockcopy HH vdb /var/lib/libvirt/images/HH-vdb.copy --transient-job --shallow --reuse-external --wait --verbose
Block Copy: [100 %]
Now in mirroring phase

# virsh domblkinfo HH vdb                                                                                               
Capacity:       104857600
Allocation:     104857600
Physical:       196616

@Capacity == @Allocation. Bug reproduced.
Comment 44 Nir Soffer 2017-07-18 13:49 EDT
Created attachment 1300580 [details]
reproducer log with qemu build 13677788

This is a run of the reproducer script with libvirt fix from bug 1467826, and qemu
scratch build 13677788.

With this build everything looks fine. Please see the attachment for full details,
here is a summary from the full log.

Version tested
--------------

# rpm -qa | egrep 'libvirt|qemu' | sort
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
libvirt-client-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-config-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-interface-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-network-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nodedev-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-qemu-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-secret-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-core-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-kvm-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-libs-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-lock-sanlock-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-python-3.2.0-3.el7.x86_64
qemu-img-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64
qemu-kvm-common-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64

Before blockcopy
----------------

*** vm xml
...
    <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source dev='/dev/src/top'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/dev/src/base'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...

*** vm query-block (cleaned up)
{
    "id": "libvirt-21",
    "return": [
        {
            "device": "drive-scsi0-0-0-0",
            "inserted": {
                "backing_file": "/dev/src/base",
                "backing_file_depth": 1,
                "drv": "qcow2",
                "file": "/dev/src/top",
                "image": {
                    "actual-size": 0,
                    "backing-filename": "/dev/src/base",
                    "backing-filename-format": "raw",
                    "backing-image": {
                        "actual-size": 0,
                        "filename": "/dev/src/base",
                        "format": "raw",
                        "virtual-size": 4294967296
                    },
                    "filename": "/dev/src/top",
                    "format": "qcow2",
                    "full-backing-filename": "/dev/src/base",
                    "virtual-size": 4294967296
                },
            },
        }
    ]
}


*** vm block stats
Domain: 'bz1470634'
  block.count=2
  block.0.name=sda
  block.0.path=/dev/src/top
  block.0.allocation=0
  block.0.capacity=4294967296
  block.0.physical=1073741824
  block.1.name=sda
  block.1.path=/dev/src/base
  block.1.backingIndex=1
  block.1.allocation=0
  block.1.capacity=4294967296
  block.1.physical=4294967296

*** vm block info for sda
Capacity:       4294967296
Allocation:     0
Physical:       1073741824

After blockcopy
---------------

*** vm xml
...
    <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source dev='/dev/src/top'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/dev/src/base'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/dev/dst/top'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...

*** vm query-block (cleaned up)
{
    "id": "libvirt-30",
    "return": [
        {
            "device": "drive-scsi0-0-0-0",
            "inserted": {
                "backing_file": "/dev/src/base",
                "backing_file_depth": 1,
                "drv": "qcow2",
                "file": "/dev/src/top",
                "image": {
                    "actual-size": 0,
                    "backing-filename": "/dev/src/base",
                    "backing-filename-format": "raw",
                    "backing-image": {
                        "actual-size": 0,
                        "filename": "/dev/src/base",
                        "format": "raw",
                        "virtual-size": 4294967296
                    },
                    "filename": "/dev/src/top",
                    "format": "qcow2",
                    "full-backing-filename": "/dev/src/base",
                    "virtual-size": 4294967296
                }
            }
        }
    ]
}

*** vm block stats
Domain: 'bz1470634'
  block.count=2
  block.0.name=sda
  block.0.path=/dev/src/top
  block.0.allocation=0
  block.0.capacity=4294967296
  block.0.physical=1073741824
  block.1.name=sda
  block.1.path=/dev/src/base
  block.1.backingIndex=1
  block.1.allocation=0
  block.1.capacity=4294967296
  block.1.physical=4294967296

*** vm block info for sda
Capacity:       4294967296
Allocation:     0
Physical:       1073741824
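
The before/after domstats comparison above can be checked mechanically. Here is a throwaway sketch (my own helper, not vdsm or libvirt code) that parses `virsh domstats --block --backing` output and flags any layer whose allocation equals its capacity, which is the symptom this bug is about:

```python
def parse_domstats_blocks(text):
    """Parse 'virsh domstats --block --backing' output into per-index dicts."""
    blocks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("block.") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = key.split(".")
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # skips lines like block.count=2
        idx, field = int(parts[1]), parts[2]
        blocks.setdefault(idx, {})[field] = value
    return blocks

def suspicious_layers(blocks):
    """Names of layers reporting allocation == capacity (the bug's symptom)."""
    return [b["name"] for b in blocks.values()
            if "allocation" in b and "capacity" in b
            and b["allocation"] == b["capacity"]]
```

On the healthy output above (allocation=0 for both layers), `suspicious_layers()` returns an empty list; on a broken build it would name the mirrored layer.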
Comment 45 Nir Soffer 2017-07-18 14:29 EDT
Created attachment 1300606 [details]
Live storage migration log with qemu build 13677788

This is a log from live storage migration in ovirt 4.1 using the libvirt fix for bug
1467826 and qemu build 13677788.

The live storage migration completed without errors, and the output from the virsh
commands looks fine.

Here is a summary from the full log; see the attachment for the full details:

Tested versions
---------------

# rpm -qa | egrep 'vdsm|libvirt|qemu' | sort
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
libvirt-client-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-config-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-interface-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-network-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nodedev-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-qemu-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-secret-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-core-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-daemon-kvm-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-libs-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-lock-sanlock-3.2.0-15.el7_rc.9a49cd47ef.x86_64
libvirt-python-3.2.0-3.el7.x86_64
qemu-img-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64
qemu-kvm-common-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.2.skip_implicit.x86_64
vdsm-4.19.20-1.el7ev.x86_64
vdsm-api-4.19.20-1.el7ev.noarch
vdsm-client-4.19.20-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.19.20-1.el7ev.noarch
vdsm-jsonrpc-4.19.20-1.el7ev.noarch
vdsm-python-4.19.20-1.el7ev.noarch
vdsm-xmlrpc-4.19.20-1.el7ev.noarch
vdsm-yajsonrpc-4.19.20-1.el7ev.noarch


Before live storage migration
-----------------------------


# virsh -r dumpxml 1
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/.../373e8c55-283f-41d4-8433-95c1ef1bbd1a/.../5a2a304a-0fd6-4c5f-a785-8320fee90bf8'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>959d5609-0bea-4d94-a8bc-ef93b8591c1e</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# virsh -r domstats 1 --block --backing
Domain: 'bz1470634'
  block.count=2
  block.0.name=hdc
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/.../373e8c55-283f-41d4-8433-95c1ef1bbd1a/.../5a2a304a-0fd6-4c5f-a785-8320fee90bf8
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480


# virsh -r domblkinfo 1 sda
Capacity:       21474836480
Allocation:     0
Physical:       21474836480


# lvs 373e8c55-283f-41d4-8433-95c1ef1bbd1a/5a2a304a-0fd6-4c5f-a785-8320fee90bf8
  LV                                   VG                                   Attr       LSize
  5a2a304a-0fd6-4c5f-a785-8320fee90bf8 373e8c55-283f-41d4-8433-95c1ef1bbd1a -wi-ao---- 20.00g                                                    


During live storage migration
-----------------------------

# virsh -r dumpxml 1
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/6c99f4e5-8588-46f5-a818-e11151c1d19c/373e8c55-283f-41d4-8433-95c1ef1bbd1a/images/959d5609-0bea-4d94-a8bc-ef93b8591c1e/ab333078-4cb6-4f0e-9539-86a5ed6d8a62'/>
      <backingStore type='block' index='1'>
        <format type='raw'/>
        <source dev='/rhev/data-center/6c99f4e5-8588-46f5-a818-e11151c1d19c/373e8c55-283f-41d4-8433-95c1ef1bbd1a/images/959d5609-0bea-4d94-a8bc-ef93b8591c1e/5a2a304a-0fd6-4c5f-a785-8320fee90bf8'/>
        <backingStore/>
      </backingStore>
      <mirror type='block' job='copy' ready='yes'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/6c99f4e5-8588-46f5-a818-e11151c1d19c/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/959d5609-0bea-4d94-a8bc-ef93b8591c1e/ab333078-4cb6-4f0e-9539-86a5ed6d8a62'/>
      </mirror>
      <target dev='sda' bus='scsi'/>
      <serial>959d5609-0bea-4d94-a8bc-ef93b8591c1e</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# virsh -r domstats 1 --block --backing
Domain: 'bz1470634'
  block.count=3
  block.0.name=hdc
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/.../373e8c55-283f-41d4-8433-95c1ef1bbd1a/.../ab333078-4cb6-4f0e-9539-86a5ed6d8a62
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=3221225472
  block.2.name=sda
  block.2.path=/rhev/.../373e8c55-283f-41d4-8433-95c1ef1bbd1a/.../5a2a304a-0fd6-4c5f-a785-8320fee90bf8
  block.2.backingIndex=1
  block.2.allocation=0
  block.2.capacity=21474836480
  block.2.physical=21474836480

# virsh -r domblkinfo 1 sda
Capacity:       21474836480
Allocation:     0
Physical:       3221225472

# lvs 373e8c55-283f-41d4-8433-95c1ef1bbd1a/ab333078-4cb6-4f0e-9539-86a5ed6d8a62
  LV                                   VG                                   Attr       LSize
  ab333078-4cb6-4f0e-9539-86a5ed6d8a62 373e8c55-283f-41d4-8433-95c1ef1bbd1a -wi-ao---- 3.00g                                                    


After live storage migration
----------------------------

# virsh -r dumpxml 1
...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/6c99f4e5-8588-46f5-a818-e11151c1d19c/6ffbc483-0031-403a-819b-3bb2f0f8de0a/images/959d5609-0bea-4d94-a8bc-ef93b8591c1e/5a2a304a-0fd6-4c5f-a785-8320fee90bf8'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>959d5609-0bea-4d94-a8bc-ef93b8591c1e</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...

# virsh -r domstats 1 --block --backing
Domain: 'bz1470634'
  block.count=2
  block.0.name=hdc
  block.0.allocation=0
  block.0.physical=0
  block.1.name=sda
  block.1.path=/rhev/.../6ffbc483-0031-403a-819b-3bb2f0f8de0a/.../5a2a304a-0fd6-4c5f-a785-8320fee90bf8
  block.1.allocation=0
  block.1.capacity=21474836480
  block.1.physical=21474836480

# virsh -r domblkinfo 1 sda
Capacity:       21474836480
Allocation:     0
Physical:       21474836480

# lvs 6ffbc483-0031-403a-819b-3bb2f0f8de0a/5a2a304a-0fd6-4c5f-a785-8320fee90bf8
  LV                                   VG                                   Attr       LSize
  5a2a304a-0fd6-4c5f-a785-8320fee90bf8 6ffbc483-0031-403a-819b-3bb2f0f8de0a -wi-ao---- 20.00g
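
For context on why a wrong allocation value is urgent: vdsm watches the allocation of thin-provisioned qcow2 volumes on block storage and extends the LV when allocation approaches its physical size, as seen in the 3.00g LV above. The sketch below illustrates that decision; it is not vdsm's actual logic, and the threshold and chunk sizes are made-up parameters:

```python
GiB = 1024**3

def next_lv_size(allocation, physical, capacity,
                 threshold=0.5 * GiB, chunk=1 * GiB):
    """Return the new LV size if an extension is needed, else None.

    With the bug (allocation wrongly reported equal to capacity), a
    freshly copied disk would immediately be extended to its full
    virtual size, defeating thin provisioning.
    """
    if physical - allocation >= threshold:
        return None  # enough free space left in the LV
    # Round the new size up by one chunk, capped at the virtual size.
    return min(allocation + chunk, capacity)
```

With the correct report (allocation=0, physical=3 GiB) no extension is triggered; with the buggy report (allocation=capacity=20 GiB) the LV would be extended straight to full capacity.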
Comment 47 Peter Krempa 2017-07-19 06:49:35 EDT
Kevin's patch to stop qemu from reporting the extra information that breaks libvirt in this case is far simpler and less risky than the libvirt refactor that would be required at this point to achieve the same result.

I'm reassigning this to qemu at this point, so that the backport can be tracked there.
Comment 51 Qianqian Zhu 2017-07-20 02:29:36 EDT
Reproduced at the qemu level.
During mirroring, qemu inserts a node at the top of the snapshot chain, and this node provides no "actual-size".

Version:
qemu-kvm-rhev-2.9.0-10.el7.x86_64
kernel-3.10.0-679.el7.x86_64

Steps:
1. Launch guest with a secondary disk:

/usr/libexec/qemu-kvm -name rhel7_4 -m 1G -smp 2  -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,id=drive_image1,if=none,cache=none,media=disk -device virtio-blk-pci,drive=drive_image1,id=virtio-blk-0,bootindex=0 -qmp stdio -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 \
-drive file=/home/test.raw,id=drive_image2,format=raw,if=none,cache=none,media=disk -device virtio-blk-pci,drive=drive_image2,id=virtio-blk-1

2. Live snapshot:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive_image2","snapshot-file": "/home/sn1", "format": "qcow2", "mode": "absolute-paths" } }

3. Drive mirror with option sync=top
{ "execute": "drive-mirror", "arguments": { "device": "drive_image2", "target":"/home/mirror.qcow2", "sync":"top", "mode": "existing"}}

4. Query block and check the information for drive_image2: an extra node has been added, backing_file_depth has increased to 2, and the new node has no "actual-size":

Detail:
{ "execute": "query-block" }

{"device": "drive_image2",
... 
"inserted": {
        ... 
        "image": {
                "backing-image": {
                        "backing-image": {
                                "virtual-size": 104857600,
                                "filename": "/home/test.raw",
                                "actual-size": 0,
                                ...
                                }, 
                        "virtual-size": 104857600, 
                        "filename": "/home/sn1", 
                        "actual-size": 200704,
                        "backing-filename": "/home/test.raw",
                        ...
                        }, 
                "virtual-size": 104857600, 
                "filename": "/home/sn1", 
                "format": "mirror_top",  
                "backing-filename": "/home/sn1"
                ...
                }, 
        "node-name": "#block1322",
        "backing_file_depth": 2, 
        "drv": "mirror_top",  
        "backing_file": "/home/sn1",
        "file": "/home/sn1",
        ...
        }, 
"type": "unknown"}
