Bug 1467826 - RHEL7.4: virDomainGetBlockInfo always returns alloc=0 on block storage
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64 Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Assigned To: John Ferlan
QA Contact: Han Han
Keywords: Regression, ZStream
Depends On:
Blocks: 1473706 1470127
Reported: 2017-07-05 05:04 EDT by Elad
Modified: 2017-09-12 22:54 EDT
CC: 23 users

Fixed In Version: libvirt-3.7.0-1.el7
Clones: 1470127 1473706
Type: Bug


Attachments:
Logs for run when vm was paused (105.73 KB, application/x-xz), 2017-07-05 07:36 EDT, Nir Soffer
qemu log from /var/log/libvirt/qemu/vmname.log (135.89 KB, text/plain), 2017-07-05 08:05 EDT, Nir Soffer

Description Elad 2017-07-05 05:04:42 EDT
Description of problem:
oVirt: 
When the guest disk has to be extended, libvirt does not propagate the request to the upper-layer component, in our case VDSM.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server 7.4 (Maipo)
libvirt-daemon-driver-storage-core-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7.x86_64
libvirt-libs-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7.x86_64
libvirt-daemon-3.2.0-14.el7.x86_64
libvirt-daemon-driver-network-3.2.0-14.el7.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7.x86_64
libvirt-daemon-kvm-3.2.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7.x86_64
libvirt-lock-sanlock-3.2.0-14.el7.x86_64
libvirt-client-3.2.0-14.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
qemu-kvm-tools-rhev-2.6.0-28.el7_3.9.x86_64
qemu-img-rhev-2.6.0-28.el7_3.9.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7.x86_64
qemu-kvm-common-rhev-2.6.0-28.el7_3.9.x86_64
sanlock-python-3.5.0-1.el7.x86_64
sanlock-lib-3.5.0-1.el7.x86_64
sanlock-3.5.0-1.el7.x86_64
selinux-policy-3.13.1-165.el7.noarch
vdsm-4.19.20-1.el7ev.x86_64
lvm2-2.02.171-8.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
In oVirt: 
1. Create a guest with a thin-provisioned disk residing on block-based storage (VDSM creates a 1G LV)
2. Perform a write operation to the guest storage (can be achieved by simply installing an OS on the guest)


Actual results:
No allocation request from Libvirt to VDSM for extending the guest disk

Expected results:
Libvirt should detect that the guest storage is about to run out of space and ask for allocation.
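
For context, the mechanism this relies on: the management layer polls virDomainGetBlockInfo and extends the LV once free space falls below a threshold (per comment 9 below, 512 MiB by default). A minimal sketch of that check with libvirt-python; the domain name, disk target, and threshold are illustrative, not VDSM's actual code:

    import libvirt

    FREE_THRESHOLD = 512 * 1024**2  # illustrative: extend when < 512 MiB free

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('extend')  # hypothetical domain name
    # blockInfo() returns [capacity, allocation, physical]
    capacity, allocation, physical = dom.blockInfo('sdb')
    if physical - allocation < FREE_THRESHOLD:
        print('sdb is nearly full; the LV should be extended')
    # With this bug, allocation is always 0 on block storage, so the
    # condition above never fires and the LV is never extended.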


Additional info:
Logs and more info will be provided soon.
Comment 3 Nir Soffer 2017-07-05 07:30:49 EDT
To reproduce using oVirt:


1. Create a vm with a disk on block storage (in this example we have 2 disks):

virsh # dumpxml 1
<domain type='kvm' id='1'>
  <name>extend</name>
  <uuid>702b8d03-e246-46d2-808d-e17a54bba708</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:agentChannelName>com.redhat.rhevm.vdsm</ovirt-vm:agentChannelName>
    <ovirt-vm:startTime type="float">1499213407.28</ovirt-vm:startTime>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
</ovirt-vm:vm>
  </metadata>
  <maxMemory slots='16' unit='KiB'>4194304</maxMemory>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static' current='2'>16</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>7.4-18.el7</entry>
      <entry name='serial'>94CBF655-4D26-456A-B5D0-40AD803C28E9</entry>
      <entry name='uuid'>702b8d03-e246-46d2-808d-e17a54bba708</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>SandyBridge</model>
    <topology sockets='16' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso' startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/02415b1a-ec47-40f9-a250-f912ccf1bbde'/>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/93331705-46be-4cb8-9dc2-c1559843fd4a'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>4cc7444a-e5a4-48c4-82a3-809088989c08</serial>
      <boot order='1'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/3dadd84a-530d-47db-a434-5dee71119d26/60f9f879-cc10-462f-af34-910c872f58de'/>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <serial>3dadd84a-530d-47db-a434-5dee71119d26</serial>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' ports='16'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:00:00:ff:41:00'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/702b8d03-e246-46d2-808d-e17a54bba708.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/702b8d03-e246-46d2-808d-e17a54bba708.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
      <alias name='channel2'/>
      <address type='virtio-serial' controller='0' bus='0' port='3'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='spice' tlsPort='5900' autoport='yes' listen='10.35.0.112' defaultMode='secure' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='network' address='10.35.0.112' network='vdsm-ovirtmgmt'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
      <channel name='smartcard' mode='secure'/>
      <channel name='usbredir' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='8192' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <alias name='rng0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </rng>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c57,c95</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c57,c95</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>


2. Login to the vm, and write some data to one of the disks:

# time dd if=/dev/zero of=/dev/sdb bs=8M count=66 oflag=direct
66+0 records in
66+0 records out
553648128 bytes (554 MB, 528 MiB) copied, 5.24422 s, 106 MB/s

real	0m5.298s
user	0m0.002s
sys	0m0.138s

3. Check disk allocation

virsh # domblkinfo 1 sdb
Capacity:       8589934592
Allocation:     0
Physical:       1073741824

In vdsm, we can see that we get alloc=0 when we check the disk
allocation:

2017-07-05 14:19:53,809+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.01 seconds (__init__:583)
2017-07-05 14:19:55,456+0300 DEBUG (periodic/3) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sda volume 02415b1a-ec47-40f9-a250-f912ccf1bbde (capacity=8589934592, allocated=0, physical=5368709120) (vm:1053)
2017-07-05 14:19:55,458+0300 DEBUG (periodic/3) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sdb volume 60f9f879-cc10-462f-af34-910c872f58de (capacity=8589934592, allocated=0, physical=1073741824) (vm:1053)
2017-07-05 14:19:57,459+0300 DEBUG (periodic/2) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sda volume 02415b1a-ec47-40f9-a250-f912ccf1bbde (capacity=8589934592, allocated=0, physical=5368709120) (vm:1053)
2017-07-05 14:19:57,463+0300 DEBUG (periodic/2) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sdb volume 60f9f879-cc10-462f-af34-910c872f58de (capacity=8589934592, allocated=0, physical=1073741824) (vm:1053)

Since vdsm gets alloc=0, it never extends the disk.


4. Eventually, the disk becomes full, and then the vm pauses.

For example, here I wrote more than 1g, the initial size of the lv:

    dd if=/dev/sda of=/dev/sdb bs=8M count=129 oflag=direct

Vdsm gets many messages about this. On each message, vdsm checks
whether some disk should be extended, but since blockInfo always returns
alloc=0, no disk is extended:

2017-07-05 14:19:58,716+0300 INFO  (libvirt/events) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') abnormal vm stop device scsi0-0-0-1 error enospc (vm:4430)
2017-07-05 14:19:58,716+0300 INFO  (libvirt/events) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') CPU stopped: onIOError (vm:5359)

Vdsm handles ENOSPC errors by checking if drives should be extended:

2017-07-05 14:20:02,195+0300 DEBUG (libvirt/events) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sda volume 02415b1a-ec47-40f9-a250-f912ccf1bbde (capacity=8589934592, allocated=0, physical=5368709120) (vm:1053)
2017-07-05 14:20:02,200+0300 DEBUG (libvirt/events) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') Extension info for drive sdb volume 60f9f879-cc10-462f-af34-910c872f58de (capacity=8589934592, allocated=0, physical=1073741824) (vm:1053)
2017-07-05 14:20:02,200+0300 INFO  (libvirt/events) [virt.vm] (vmId='702b8d03-e246-46d2-808d-e17a54bba708') No VM drives were extended (vm:4437)

Since alloc=0, no drives were extended.

We get many more ENOSPC messages for some reason.

This practically breaks thin provisioning on block storage for RHV.
Comment 4 Nir Soffer 2017-07-05 07:36 EDT
Created attachment 1294548 [details]
Logs for run when vm was paused

In this run I started with 1g LV, writing more then 1g to this device:

    dd if=/dev/sda of=/dev/sdb bs=8M count=129

The tarball includes
- output from rpm -qa
- libvirtd.log (using log_filters="1:qemu 1:locking")
- vdsm.log
Comment 5 Nir Soffer 2017-07-05 07:59:31 EDT
Looking at libvirtd log - we get this reply from qemu:

2017-07-05 11:19:57.462+0000: 3304: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7f341c00f9a0 reply={"return": [{"io-status": "ok", "device": "drive-ide0-1-0", "locked": true, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 2018508800, "filename": "/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso", "format": "raw", "actual-size": 2018508800, "dirty-flag": false}, "iops_wr": 0, "ro": true, "node-name": "#block185", "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "/rhev/data-center/mnt/dumbo.tlv.redhat.com:_voodoo_iso-02/b9cda581-f9ac-4650-83e0-17d00d7dfc75/images/11111111-1111-1111-1111-111111111111/Fedora-Server-dvd-x86_64-25-1.3.iso", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}, {"io-status": "ok", "device": "drive-scsi0-0-0-0", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"backing-image": {"virtual-size": 8589934592, "filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/93331705-46be-4cb8-9dc2-c1559843fd4a", "cluster-size": 65536, "format": "qcow2", "actual-size": 0, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "backing-filename-format": "qcow2", "virtual-size": 8589934592, "filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/02415b1a-ec47-40f9-a250-f912ccf1bbde", "cluster-size": 65536, "format": "qcow2", "actual-size": 0, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "full-backing-filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/93331705-46be-4cb8-9dc2-c1559843fd4a", "backing-filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/93331705-46be-4cb8-9dc2-c1559843fd4a", "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block382", "backing_file_depth": 1, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "backing_file": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/93331705-46be-4cb8-9dc2-c1559843fd4a", "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": true, "writeback": true}, "file": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/4cc7444a-e5a4-48c4-82a3-809088989c08/02415b1a-ec47-40f9-a250-f912ccf1bbde", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "drive-scsi0-0-0-1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 8589934592, "filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/3dadd84a-530d-47db-a434-5dee71119d26/60f9f879-cc10-462f-af34-910c872f58de", "cluster-size": 65536, "format": "qcow2", "actual-size": 0, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block727", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": true, "writeback": true}, "file": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/3dadd84a-530d-47db-a434-5dee71119d26/60f9f879-cc10-462f-af34-910c872f58de", "encryption_key_missing": false}, "type": "unknown"}], "id": "libvirt-481"}

Formatting the JSON in a readable way and extracting the interesting
device (vol_id=60f9f879-cc10-462f-af34-910c872f58de):

        {
            "locked": false, 
            "type": "unknown", 
            "io-status": "ok", 
            "removable": false, 
            "device": "drive-scsi0-0-0-1", 
            "inserted": {
                "bps_rd": 0, 
                "ro": false, 
                "backing_file_depth": 0, 
                "encrypted": false, 
                "image": {
                    "cluster-size": 65536, 
                    "format": "qcow2", 
                    "filename": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/3dadd84a-530d-47db-a434-5dee71119d26/60f9f879-cc10-462f-af34-910c872f58de", 
                    "virtual-size": 8589934592, 
                    "dirty-flag": false, 
                    "format-specific": {
                        "data": {
                            "compat": "1.1", 
                            "refcount-bits": 16, 
                            "corrupt": false, 
                            "lazy-refcounts": false
                        }, 
                        "type": "qcow2"
                    }, 
                    "actual-size": 0
                }, 
                "cache": {
                    "no-flush": false, 
                    "writeback": true, 
                    "direct": true
                }, 
                "bps_wr": 0, 
                "drv": "qcow2", 
                "node-name": "#block727", 
                "bps": 0, 
                "iops": 0, 
                "write_threshold": 0, 
                "file": "/rhev/data-center/mnt/blockSD/d0812abd-2beb-4103-b00c-b9bc4ac77085/images/3dadd84a-530d-47db-a434-5dee71119d26/60f9f879-cc10-462f-af34-910c872f58de", 
                "iops_rd": 0, 
                "encryption_key_missing": false, 
                "detect_zeroes": "off", 
                "iops_wr": 0
            }
        }

Looks like qemu does not return the expected allocation info.

Kevin, do you think this is actually a qemu issue?
Comment 6 Nir Soffer 2017-07-05 08:05 EDT
Created attachment 1294559 [details]
qemu log from /var/log/libvirt/qemu/vmname.log
Comment 8 John Ferlan 2017-07-05 15:51:12 EDT
In order for me to reproduce, it would help to have a few details about:

In oVirt: 
1. Create a guest with a thin-provisioned disk residing on block-based storage (VDSM creates a 1G LV)
2. Perform a write operation to the guest storage (can be achieved by simply installing an OS on the guest)


IOW: steps to create the <source dev='$path'...> for targets sda and sdb

At first, when you said thin-provisioned, I assumed libvirt logical pools with sparse lv's, but I think you mean lv's that you've created using real thin provisioning (via --thinpool, I assume).

I do believe something from commit id 'b9b1aa63' is at fault here... trying to reconfigure my system to the test config I used in order to generate the logic. Something was certainly strange with some backing file format... Although in relooking at the code:

    if (entry->physical == 0 || info->allocation == 0 ||
        info->allocation == entry->physical) {
        info->allocation = entry->physical;
        if (info->allocation == 0)
            info->allocation = entry->physical;
...
I do wonder who caused me to do that setting and the check.

In your case, physical (i.e. actual-size) == 0, so I can see why the value is the way it is...
Comment 9 Nir Soffer 2017-07-05 17:10:25 EDT
(In reply to John Ferlan from comment #8)
> In order for me to reproduce it would help to have a few details about :
> 
> In oVirt: 
> 1. Create a guest with a thin-provisioned disk residing on block-based
> storage (VDSM creates a 1G LV)

Right

> 2. Perform a write operation to the guest storage (can be achieved by simply
> installing an OS on the guest)

Right, installing an OS typically triggers an extend.

An easier way to test this is to hot-plug a disk using a block device, and
write directly to the disk within the guest. If libvirt reports correct
allocation values we are ok.

> IOW: steps to create the <source dev='$path'...> for targets sda and sdb
> 
> At first, when you said thin-provisioned, I assumed libvirt logical
> pools with sparse lv's, but I think you mean lv's that you've created using
> real thin provisioning (via --thinpool, I assume).

No, we use ovirt-thin provisioning, which is a regular lv. We extend the lv
when the highest offset reaches the threshold (by default, physical - 512m).

> I do believe something from commit id 'b9b1aa63' is at fault here... trying
> to reconfigure my system to the test config I used in order to generate the
> logic. Something was certainly strange with some backing file format...
> Although in relooking at the code:
> 
>     if (entry->physical == 0 || info->allocation == 0 ||
>         info->allocation == entry->physical) {
>         info->allocation = entry->physical;
>         if (info->allocation == 0)
>             info->allocation = entry->physical;
> ...
> I do wonder who caused me to do that setting and the check.
> 
> In your case, physical (e.g. actual-size == 0), so I can see why the value
> is the way it is...

Looks like this manipulation is correct only for file-based disks.
Comment 10 John Ferlan 2017-07-05 18:56:18 EDT
Still not enough details: the first few lines above, which you "Right"'d on, are copies of the problem statement.

I can either hotplug or coldplug and start; it really doesn't matter, that's not the issue being chased. For me to test a fix, it would be far easier to know how to create a disk device in my guest similar to what's been created. I don't have the secret decoder ring for that. I don't use ovirt-thin provisioning, so I have no idea how the file is created. That's what I need.

It'd also be useful to get the values for entry->physical and entry->wr_highest_offset in the first else where info->allocation gets set.

I'm still working on resetting up my environment...
Comment 11 Han Han 2017-07-06 03:01:10 EDT
I got the alloc=0 result with a thin LV:

1. Create a thin LV
# lsblk /dev/sdd
sdd                    8:48   1  14.6G  0 disk 
└─sdd1                 8:49   1  14.6G  0 part 

# pvcreate /dev/sdd1

# vgcreate vg /dev/sdd1

# lvcreate -n pool0 -L 10G vg

# lvcreate -n pool0meta -L 1G vg

# lvconvert --type thin-pool --poolmetadata vg/pool0meta vg/pool0

# lvcreate -n thin1 -V 1T --thinpool pool0 vg

# lvcreate -n thin2 -V 1T --thinpool pool0 vg

# lvdisplay vg/thin2
  LV Path                /dev/vg/thin2
  LV Name                thin2
  VG Name                vg
  LV UUID                zfznA1-0EgD-z8m1-Fizk-833I-CqRR-R11UYQ
  LV Write Access        read/write
  LV Creation host, time no-server.com, 2017-07-06 14:19:24 +0800
  LV Pool name           pool0
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Mapped size            0.00%
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:7

2. Start VM with the thin lv
# virsh domblklist n2                                   
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/n2.qcow2
vdb        /dev/mapper/vg-thin2

# virsh domblkinfo n2 vdb                                                             
Capacity:       1099511627776
Allocation:     0
Physical:       1099511627776

# virsh qemu-monitor-command n2 '{"execute":"query-block"}'
{"return":[{"io-status":"ok","device":"drive-virtio-disk0","locked":false,"removable":false,"inserted":{"iops_rd":0,"detect_zeroes":"off","image":{"virtual-size":10737418240,"filename":"/var/lib/libvirt/images/n2.qcow2","cluster-size":65536,"format":"qcow2","actual-size":1286078464,"format-specific":{"type":"qcow2","data":{"compat":"1.1","lazy-refcounts":false,"refcount-bits":16,"corrupt":false}},"dirty-flag":false},"iops_wr":0,"ro":false,"node-name":"#block185","backing_file_depth":0,"drv":"qcow2","iops":0,"bps_wr":0,"write_threshold":0,"encrypted":false,"bps":0,"bps_rd":0,"cache":{"no-flush":false,"direct":false,"writeback":true},"file":"/var/lib/libvirt/images/n2.qcow2","encryption_key_missing":false},"type":"unknown"},{"io-status":"ok","device":"drive-virtio-disk1","locked":false,"removable":false,"inserted":{"iops_rd":0,"detect_zeroes":"off","image":{"virtual-size":1099511627776,"filename":"/dev/mapper/vg-thin2","format":"raw","actual-size":0,"dirty-flag":false},"iops_wr":0,"ro":false,"node-name":"#block310","backing_file_depth":0,"drv":"raw","iops":0,"bps_wr":0,"write_threshold":0,"encrypted":false,"bps":0,"bps_rd":0,"cache":{"no-flush":false,"direct":false,"writeback":true},"file":"/dev/mapper/vg-thin2","encryption_key_missing":false},"type":"unknown"}],"id":"libvirt-22"}

# lvdisplay vg/thin2                                       
  /dev/mapper/36001405d688ea2ccd464b41901158e03: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/36001405d688ea2ccd464b41901158e03: read failed after 0 of 4096 at 5368643584: Input/output error
  /dev/mapper/36001405d688ea2ccd464b41901158e03: read failed after 0 of 4096 at 5368700928: Input/output error
  /dev/mapper/36001405d688ea2ccd464b41901158e03: read failed after 0 of 4096 at 4096: Input/output error
  --- Logical volume ---
  LV Path                /dev/vg/thin2
  LV Name                thin2
  VG Name                vg
  LV UUID                zfznA1-0EgD-z8m1-Fizk-833I-CqRR-R11UYQ
  LV Write Access        read/write
  LV Creation host, time no-server.com, 2017-07-06 14:19:24 +0800
  LV Pool name           pool0
  LV Status              available
  # open                 1
  LV Size                1.00 TiB
  Mapped size            0.00%
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:7


3. Write 1G data to the thin LV in VM and check again
(in vm) # dd if=/dev/urandom of=/dev/vdb bs=300M count=20
dd: warning: partial read (33554431 bytes); suggest iflag=fullblock
0+20 records in
0+20 records out
671088620 bytes (671 MB) copied, 2.75019 s, 244 MB/s

# virsh domblkinfo n2 vdb                                                                                      
Capacity:       1099511627776
Allocation:     0
Physical:       1099511627776

# virsh qemu-monitor-command n2 '{"execute":"query-block"}'
{"return":[{"io-status":"ok","device":"drive-virtio-disk0","locked":false,"removable":false,"inserted":{"iops_rd":0,"detect_zeroes":"off","image":{"virtual-size":10737418240,"filename":"/var/lib/libvirt/images/n2.qcow2","cluster-size":65536,"format":"qcow2","actual-size":1286078464,"format-specific":{"type":"qcow2","data":{"compat":"1.1","lazy-refcounts":false,"refcount-bits":16,"corrupt":false}},"dirty-flag":false},"iops_wr":0,"ro":false,"node-name":"#block185","backing_file_depth":0,"drv":"qcow2","iops":0,"bps_wr":0,"write_threshold":0,"encrypted":false,"bps":0,"bps_rd":0,"cache":{"no-flush":false,"direct":false,"writeback":true},"file":"/var/lib/libvirt/images/n2.qcow2","encryption_key_missing":false},"type":"unknown"},{"io-status":"ok","device":"drive-virtio-disk1","locked":false,"removable":false,"inserted":{"iops_rd":0,"detect_zeroes":"off","image":{"virtual-size":1099511627776,"filename":"/dev/mapper/vg-thin2","format":"raw","actual-size":0,"dirty-flag":false},"iops_wr":0,"ro":false,"node-name":"#block310","backing_file_depth":0,"drv":"raw","iops":0,"bps_wr":0,"write_threshold":0,"encrypted":false,"bps":0,"bps_rd":0,"cache":{"no-flush":false,"direct":false,"writeback":true},"file":"/dev/mapper/vg-thin2","encryption_key_missing":false},"type":"unknown"}],"id":"libvirt-25"}

Command 'lvdisplay vg/thin2' hangs when VM is running.
After vm is destroyed, the result:
  LV Path                /dev/vg/thin2
  LV Name                thin2
  VG Name                vg
  LV UUID                zfznA1-0EgD-z8m1-Fizk-833I-CqRR-R11UYQ
  LV Write Access        read/write
  LV Creation host, time no-server.com, 2017-07-06 14:19:24 +0800
  LV Pool name           pool0
  LV Status              available
  # open                 1
  LV Size                1.00 TiB
  Mapped size            0.06%
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:7

Please check the result.
Comment 12 Han Han 2017-07-06 03:03:21 EDT
The disk xml in comment 11 is:
# virsh dumpxml n2| awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/n2.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/vg-thin2'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>
Comment 13 Han Han 2017-07-06 03:06:46 EDT
Soffer, I am not familiar with vdsm APIs. Could you help check whether the thin LV creation steps are the same as what vdsm does?
Thanks
Comment 14 John Ferlan 2017-07-06 07:09:58 EDT
Perfect - thanks Han - that's exactly what I needed. I can now (easily) reproduce what's being seen.

BTW: run query-blockstats and search for wr_highest_offset in order to "see" what is supposed to be used by libvirt to get the allocation value.
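
For anyone scripting that check, the same reply can be fetched with libvirt-python's qemu passthrough module (a sketch for test machines only, since monitor passthrough is unsupported; the domain name 'n2' is borrowed from comment 11 as an example):

    import json

    import libvirt
    import libvirt_qemu

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('n2')  # example domain from comment 11

    reply = libvirt_qemu.qemuMonitorCommand(
        dom, '{"execute": "query-blockstats"}', 0)
    for dev in json.loads(reply)['return']:
        # stats of the underlying file/device node sit under parent.stats,
        # matching the output shown in comment 17
        stats = dev.get('parent', {}).get('stats', {})
        print(dev.get('device'), stats.get('wr_highest_offset'))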

The commit referenced in comment 9 is where the problem is. In particular:

>         info->allocation = entry->physical;
>         if (info->allocation == 0)
>             info->allocation = entry->physical;

Should seemingly just drop the:

>         info->allocation = entry->physical;

I'm wondering now if that was an editing remnant or some stray (and perhaps erroneous) edge condition that I was trying to handle. I've reconfigured a guest to have the environment from my notes and I'm checking on that now.

In any case, I should have a patch posted upstream sometime today.
Comment 15 Nir Soffer 2017-07-06 10:34:20 EDT
Vdsm does not use a thinpool lv, but this is good enough to reproduce the issue.

To test an ovirt-like "thin" lv, you can do:

pvcreate /dev/sdxxx
vgcreate ovirt-vg /dev/sdxxx
lvcreate --name ovirt-lv --size 1g ovirt-vg
qemu-img create -f qcow2 /dev/ovirt-vg/ovirt-lv 10g

And use /dev/ovirt-vg/ovirt-lv as a disk.
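
Once the qcow2 header is written, the device can also be attached to a running guest programmatically; a sketch with libvirt-python (the domain name and target are arbitrary examples):

    import libvirt

    DISK_XML = """
    <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source dev='/dev/ovirt-vg/ovirt-lv'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    """

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('bz1467826')  # hypothetical domain name
    dom.attachDevice(DISK_XML)  # hot-plugs the disk into the running guest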
Comment 16 Nir Soffer 2017-07-06 15:35:34 EDT
Here is a simple way to reproduce this, given the lv created in comment 15:

$ sudo virt-install \
    --name bz1467826 \
    --memory 1024 \
    --disk /dev/ovirt-vg/ovirt-lv,format=qcow2,driver_type=qcow2 \
    --cdrom Fedora-Server-dvd-x86_64-25-1.3.iso

During the installation, we can see:

virsh # domblkinfo 6 vda
Capacity:       5368709120
Allocation:     0
Physical:       1073741824
Comment 17 Nir Soffer 2017-07-06 15:45:03 EDT
(continued from comment 16)

virsh # qemu-monitor-command 6 '{"execute":"query-blockstats"}'
{"return":[{"device":"drive-virtio-disk0","parent":{"stats":{"flush_total_time_ns":0,"wr_highest_offset":1076109312,...

At this point the vm is paused; we can extend the lv and resume:

lvextend --size +1g ovirt-vg/ovirt-lv

virsh # resume 6
Domain 6 resumed

virsh # qemu-monitor-command 6 '{"execute":"query-blockstats"}'
{"return":[{"device":"drive-virtio-disk0","parent":{"stats":{"flush_total_time_ns":0,"wr_highest_offset":1245970432,...
Comment 18 John Ferlan 2017-07-06 16:07:43 EDT
Thanks for the example... I've posted a patch upstream that will remove an unconditional setting of @allocation.  Still not quite clear what caused me to put it there other than perhaps disjoint patches...  If you start at commit id 'cf436a56' and work backwards you'll see a number of adjustments in the code where IIRC the goal was to try and make common code paths between storage pools and domains.

Upstream patch for this issue:

https://www.redhat.com/archives/libvir-list/2017-July/msg00143.html
Comment 21 John Ferlan 2017-07-11 22:24:37 EDT
This is now pushed upstream:

commit fde654be5307a570b7b0f31537e18e70a274cd50
Author: John Ferlan <jferlan@redhat.com>
Date:   Thu Jul 6 14:08:38 2017 -0400

    qemu: Fix qemuDomainGetBlockInfo allocation value setting
    
...
    
    Commit id 'b9b1aa639' was supposed to add logic to set the allocation
    for sparse files when wr_highest_offset was zero; however, an unconditional
    setting was done just prior. For block devices, this means allocation is
    always returning 0 since 'actual-size' will be zero.
    
    Remove the unconditional setting and add the note about it being possible
    to still be zero for block devices. As soon as the guest starts writing to
    the volume, the allocation value will then be obtainable from qemu via
    the wr_highest_offset.

$ git describe fde654be5307a570b7b0f31537e18e70a274cd50
v3.5.0-72-gfde654b
$
Comment 23 Yaniv Lavi (Dary) 2017-07-12 04:19:39 EDT
When should we expect the downstream build to test?
This is urgent.
Comment 24 John Ferlan 2017-07-12 08:20:42 EDT
Yaniv - the best I can say is as long as the process takes.  Unfortunately for me I'm not in the office today as I have family business out of town.  I'm sure someone will pick up the rest of this "today" at some point.
Comment 27 Han Han 2017-07-13 05:52:36 EDT
(In reply to John Ferlan from comment #21)
> This is now pushed upstream:
> 
> commit fde654be5307a570b7b0f31537e18e70a274cd50
> Author: John Ferlan <jferlan@redhat.com>
> Date:   Thu Jul 6 14:08:38 2017 -0400
> 
>     qemu: Fix qemuDomainGetBlockInfo allocation value setting
>     
> ...
>     
>     Commit id 'b9b1aa639' was supposed to add logic to set the allocation
>     for sparse files when wr_highest_offset was zero; however, an
> unconditional
>     setting was done just prior. For block devices, this means allocation is
>     always returning 0 since 'actual-size' will be zero.
>     
>     Remove the unconditional setting and add the note about it being possible
>     to still be zero for block devices. As soon as the guest starts writing
> to
>     the volume, the allocation value will then be obtainable from qemu via
>     the wr_highest_offset.
> 
> $ git describe fde654be5307a570b7b0f31537e18e70a274cd50
> v3.5.0-72-gfde654b
> $

This patch works in virsh. Tested with a scratch build containing this patch.
Steps:
Prepare an LV.
# lvs VG/lv2
  LV   VG Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv2  VG -wi-a----- 10.00g

Attach the disk to VM:
# virsh attach-disk HH /dev/mapper/VG-lv2 vdb                  
Disk attached successfully

# virsh domblkinfo HH vdb                     
Capacity:       10737418240
Allocation:     0
Physical:       10737418240

Write some data in VM and check domblkinfo:
(in VM)# dd if=/dev/urandom of=/dev/vdb bs=10M count=1                                                                                 
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.879529 s, 11.9 MB/s

# virsh domblkinfo HH vdb
Capacity:       10737418240
Allocation:     10485760
Physical:       10737418240

Extend the lv and check domblkinfo:
# lvextend --size +1g VG/lv2                  
  Size of logical volume VG/lv2 changed from 10.00 GiB (2560 extents) to 11.00 GiB (2816 extents).
  Logical volume VG/lv2 successfully resized.

# virsh domblkinfo HH vdb   
Capacity:       10737418240
Allocation:     10485760
Physical:       11811160064

All works well. Bug fixed.
Comment 28 Han Han 2017-07-13 06:03:06 EDT
Hi John,
Is the 'Allocation' of domblkinfo equal to the highest offset of written or read bytes?
Comment 29 John Ferlan 2017-07-13 08:25:11 EDT
Short answer is it's complicated... and it can vary depending on whether the domain is active or inactive.

The BlockInfo is (and has been) a bit vague about what gets returned for Allocation when the domain is running. There's a bit of history on this that I ended up following when I made the prior change, but essentially for an active domain the @allocation ends up being the highest written offset "once written" or the value that QEMU returns for actual-size (which is similar to what 'du' would return) if not written yet (IOW: the highest read).

For a device backed to a file, the answer is generally speaking spot on. For a device backed to a block device, the answer gets a lot more complicated which is what all the code in the area of change is handling.

If you read the commit message from 

http://libvirt.org/git/?p=libvirt.git;a=commit;h=b9b1aa639

and from

http://libvirt.org/git/?p=libvirt.git;a=commit;h=0282ca45a

I think you'll get a bit more understanding of the "history". That first commit also references a couple of other commits which could be followed. That second commit notes that physical cannot be obtained via virsh which is not true any more since one could pass the --physical flag and we'd get the physical value as well. There's another bz on that - I don't have the reference handy.

Personally, I think for the purposes of what's being "required" for oVirt to get the high water mark of the device for the active domain, perhaps it would be better to use the  DomainStats API because it always returns the offset of the highest written sector and only works for active domains.

For reference, see:

http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockInfo

and

http://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats
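
For illustration, the two APIs side by side in libvirt-python (a sketch; the domain name 'HH' is borrowed from comment 27 as an example):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('HH')  # example domain from comment 27

    # virDomainGetBlockInfo: allocation semantics vary as described above
    capacity, allocation, physical = dom.blockInfo('vdb')
    print('blockInfo allocation:', allocation)

    # virConnectGetAllDomainStats: block.<n>.allocation is the offset of the
    # highest written sector, and is only reported for active domains
    for _dom, stats in conn.getAllDomainStats(
            stats=libvirt.VIR_DOMAIN_STATS_BLOCK):
        for i in range(stats.get('block.count', 0)):
            if stats.get('block.%d.name' % i) == 'vdb':
                print('domstats allocation:',
                      stats.get('block.%d.allocation' % i))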
Comment 31 Han Han 2017-09-12 22:54:01 EDT
Tested on libvirt-3.7.0-2.el7.x86_64 and qemu-kvm-rhev-2.9.0-16.el7_4.6.x86_64. Passed.

1. Start a VM

2. Attach a block device to the VM:
# virsh attach-disk test /dev/mapper/rhel_dell--op740--04-test vdb
Disk attached successfully

3. Check with domblkinfo
# virsh domblkinfo test vdb
Capacity:       10737418240
Allocation:     0
Physical:       10737418240

4. In VM, write 4096 bytes to the block device
(In VM)# dd if=/dev/urandom of=/dev/vdb count=1 bs=4096
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00253738 s, 1.6 MB/s

5. Check that domblkinfo's Allocation value matches the size of the write:
# virsh domblkinfo test vdb                                       
Capacity:       10737418240
Allocation:     4096
Physical:       10737418240

The result is expected.
