Bug 1452045 - qemuDomainGetBlockInfo get error when execute domblkinfo and blockcommit in parallel for block device
Summary: qemuDomainGetBlockInfo get error when execute domblkinfo and blockcommit in parallel for block device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks: 1449577 1456511
 
Reported: 2017-05-18 08:41 UTC by Yanqiu Zhang
Modified: 2019-04-28 13:42 UTC (History)
CC List: 8 users

Fixed In Version: libvirt-3.2.0-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 00:08:25 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirt-testing-libvirtd.log (3.80 MB, application/x-gzip), attached 2017-05-18 08:59 UTC by Yanqiu Zhang


Links
Red Hat Product Errata RHEA-2017:1846 (normal, SHIPPED_LIVE): libvirt bug fix and enhancement update, last updated 2017-08-01 18:02:50 UTC

Description Yanqiu Zhang 2017-05-18 08:41:05 UTC
Description of problem:
qemuDomainGetBlockInfo returns an error when domblkinfo and blockcommit are executed in parallel on a block device. With more than 50% probability, this causes snapshot deletion to fail for the RHEV-M product with iSCSI storage.

Version-Release number of selected component (if applicable):
libvirt-3.2.0-5.el7.x86_64
qemu-kvm-rhev-2.9.0-5.el7.x86_64
vdsm-4.19.14-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an external snapshot XML for a block device:
# cat snap.xml
<domainsnapshot>
<name>layer2</name>
<disks>
<disk name="vda" snapshot="external" type="block">
<source dev="/dev/sdd" />
</disk>
</disks>
</domainsnapshot>

2. Create an external snapshot from the above XML for a running domain:
# virsh snapshot-create V snap.xml --disk-only
Domain snapshot layer2 created from 'snap.xml'

3. In one terminal, monitor libvirtd for errors:
# tail -f /var/log/libvirt/libvirtd.log|grep " error "

4. In a second terminal, run domblkinfo continuously:
# while true; do virsh domblkinfo V vda;done
...
Capacity:       10737418240
Allocation:     1361248256
Physical:       1361248256
...

5. In a third terminal, run blockcommit:
# virsh blockcommit V vda --active --shallow --wait --verbose --pivot
Block commit: [100 %]

6. In the RHEV-M web UI, with iSCSI storage, create a snapshot for a running guest, then try to delete the snapshot:
The snapshot deletion fails with more than 50% probability.


Actual results:
1. In the first terminal (step 3), the following errors appear:
# tail -f /var/log/libvirt/libvirtd.log|grep " error "
2017-05-18 07:10:32.752+0000: 29134: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:32.831+0000: 29135: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:32.911+0000: 29133: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:32.980+0000: 29131: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:33.056+0000: 29134: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:33.132+0000: 29132: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'
2017-05-18 07:10:33.211+0000: 29135: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'

2.In step6, sometimes will fail to delete the snapshot with following error, the probability is above 50%.
"May 18, 2017 1:55:01 PM Failed to delete snapshot 'yan-V2-snap' for VM 'yan-V2'."
"May 18, 2017 1:50:57 PM VDSM hostB command MergeVDS failed: Drive image file could not be found"


Expected results
Should not get error in step3 terminal, and should not block snapshot deleting for rhevm product with iscsi storage.


Additional info:
1. Not reproducible on RHEL 7.3.

2. For RHEV-M product testing, the vdsm.log and libvirtd.log errors are as follows:
(1)# tail -f vdsm.log|grep error -i
2017-05-17 08:20:52,243-0400 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (_init_:570)
if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: internal error: failed to query the maximum written offset of block device 'vdb'
...
(2)# cat libvirtd.log|grep error
2017-05-18 05:53:27.924+0000: 10728: error : qemuDomainAgentAvailable:5775 : Guest agent is not responding: QEMU guest agent is not connected
2017-05-18 05:53:29.061+0000: 10729: error : qemuDomainAgentAvailable:5775 : Guest agent is not responding: QEMU guest agent is not connected
2017-05-18 05:54:02.058+0000: 10728: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vdb'
2017-05-18 05:54:03.326+0000: 10729: error : qemuDomainGetBlockInfo:11498 : invalid argument: invalid path /rhev/data-center/69e52e62-c582-4491-8f8f-778a2c4db5e5/8094cb97-bb4d-4e9c-9970-e6eb3a1af6a8/images/97452a56-485c-4a1a-8990-facc59685d2a/0a826a49-7755-45cc-8598-279c5bdaebde not assigned to domain
2017-05-18 05:54:03.329+0000: 10728: error : qemuDomainGetBlockInfo:11498 : invalid argument: invalid path /rhev/data-center/69e52e62-c582-4491-8f8f-778a2c4db5e5/8094cb97-bb4d-4e9c-9970-e6eb3a1af6a8/images/41e81e14-e087-4ad0-9603-4d9312bfb142/e798ae9b-ff33-461a-b4df-c83356543c82 not assigned to domain

Comment 3 Peter Krempa 2017-05-18 08:52:43 UTC
(In reply to yanqzhan from comment #0)

[...]

> 2.In step6, sometimes will fail to delete the snapshot with following error,
> the probability is above 50%.
> "May 18, 2017 1:55:01 PM Failed to delete snapshot 'yan-V2-snap' for VM
> 'yan-V2'."
> "May 18, 2017 1:50:57 PM VDSM hostB command MergeVDS failed: Drive image
> file could not be found"

What is the libvirt error in this case?

> Expected results
> Should not get error in step3 terminal, and should not block snapshot
> deleting for rhevm product with iscsi storage.

So these seem like two separate issues, but first we need to clarify what is the cause for the failure of snapshot deletion.

I'd like to know which libvirt APIs are called and what is the error.

Comment 4 Yanqiu Zhang 2017-05-18 08:59:22 UTC
Created attachment 1279933 [details]
libvirt-testing-libvirtd.log

Comment 5 Yanqiu Zhang 2017-05-18 09:28:13 UTC
(In reply to Peter Krempa from comment #3)
> (In reply to yanqzhan from comment #0)
> 
> [...]
> 
> > 2.In step6, sometimes will fail to delete the snapshot with following error,
> > the probability is above 50%.
> > "May 18, 2017 1:55:01 PM Failed to delete snapshot 'yan-V2-snap' for VM
> > 'yan-V2'."
> > "May 18, 2017 1:50:57 PM VDSM hostB command MergeVDS failed: Drive image
> > file could not be found"
> 
> What is the libvirt error in this case?
> 
Please refer to Additional info item 2; the libvirt error is:
if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: internal error: failed to query the maximum written offset of block device 'vdb'


> > Expected results
> > Should not get error in step3 terminal, and should not block snapshot
> > deleting for rhevm product with iscsi storage.
> 
> So these seem like two separate issues, but first we need to clarify what is
> the cause for the failure of snapshot deletion.
> 
> I'd like to know which libvirt APIs are called and what is the error.

The API is virDomainGetBlockInfo().


The error can be reproduced purely through libvirt with steps 1-5 above.
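
To make the API-level picture concrete, the same race can be driven through the libvirt Python bindings instead of virsh. This is a minimal, hypothetical sketch that assumes the running domain 'V' and the external snapshot from steps 1-2; error handling and the event-driven job tracking that VDSM actually uses are omitted:

#!/usr/bin/env python
# Sketch: race virDomainGetBlockInfo (dom.blockInfo) against an active-layer
# block commit, mirroring steps 3-5 above.
import threading
import time
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('V')
stop = threading.Event()

def poll_blockinfo():
    # Equivalent of: while true; do virsh domblkinfo V vda; done
    while not stop.is_set():
        try:
            capacity, allocation, physical = dom.blockInfo('vda', 0)
        except libvirt.libvirtError as err:
            # The transient failure reported in this bug shows up here.
            print('blockInfo failed:', err)

poller = threading.Thread(target=poll_blockinfo)
poller.start()

# Equivalent of: virsh blockcommit V vda --active --shallow --wait --pivot
flags = (libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE |
         libvirt.VIR_DOMAIN_BLOCK_COMMIT_SHALLOW)
dom.blockCommit('vda', None, None, 0, flags)

# Crude wait for the commit job to become ready (cur == end), then pivot.
while True:
    job = dom.blockJobInfo('vda', 0)
    if not job or job['cur'] == job['end']:
        break
    time.sleep(0.1)
if job:
    dom.blockJobAbort('vda', libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)

stop.set()
poller.join()
conn.close()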

Comment 6 Peter Krempa 2017-05-18 14:01:30 UTC
So this is the sequence of events, obtained from the logs (and trimmed) by:

egrep '(QEMU_MONITOR_IO_WRITE|error|BLOCK|Commit)' 1452045-libvirtd.log


2017-05-18 08:53:56.128+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-blockstats","id":"libvirt-373"}
2017-05-18 08:53:56.131+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-374"}
2017-05-18 08:53:56.176+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-blockstats","id":"libvirt-375"}
2017-05-18 08:53:56.180+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-376"}

qemuDomainGetBlockInfo works properly

2017-05-18 08:53:56.189+0000: 13488: debug : virThreadJobSet:96 : Thread 13488 (virNetServerHandleJob) is now running job remoteDispatchDomainBlockCommit
2017-05-18 08:53:56.189+0000: 13488: debug : remoteDispatchDomainBlockCommitHelper:3545 : server=0x557e1c588a90 client=0x557e1c5be580 msg=0x557e1c5be770 rerr=0x7f08d3ee0c00 args=0x7f08bc002020 ret=0x7f08bc000bb0
2017-05-18 08:53:56.189+0000: 13488: debug : virDomainBlockCommit:10203 : dom=0x7f08bc0020f0, (VM: name=V, uuid=c3a57d2a-7d75-493a-a699-786ac2fd54b3), disk=vda, base=<null>, top=<null>, bandwidth=0, flags=5
2017-05-18 08:53:56.221+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-377"}
2017-05-18 08:53:56.224+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-378"}
2017-05-18 08:53:56.227+0000: 13488: debug : qemuMonitorBlockCommit:3321 : device=drive-virtio-disk0, top=/dev/sdd, base=/var/lib/libvirt/images/V.qcow2, backingName=<null>, bandwidth=0
2017-05-18 08:53:56.227+0000: 13488: debug : qemuMonitorBlockCommit:3323 : mon:0x7f0868000fb0 vm:0x7f08742b8950 json:1 fd:23
2017-05-18 08:53:56.228+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"block-commit","arguments":{"device":"drive-virtio-disk0","top":"/dev/sdd","base":"/var/lib/libvirt/images/V.qcow2"},"id":"libvirt-379"}

Active layer block commit operation is started

2017-05-18 08:53:56.232+0000: 13484: info : qemuMonitorIOProcess:431 : QEMU_MONITOR_IO_PROCESS: mon=0x7f0868000fb0 buf={"timestamp": {"seconds": 1495097636, "microseconds": 232749}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.232+0000: 13484: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1495097636, "microseconds": 232749}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}]
2017-05-18 08:53:56.232+0000: 13484: debug : virJSONValueFromString:1713 : string={"timestamp": {"seconds": 1495097636, "microseconds": 232749}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.233+0000: 13484: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0868000fb0 event={"timestamp": {"seconds": 1495097636, "microseconds": 232749}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.233+0000: 13484: debug : qemuMonitorEmitEvent:1297 : mon=0x7f0868000fb0 event=BLOCK_JOB_READY
2017-05-18 08:53:56.233+0000: 13484: debug : qemuMonitorJSONIOProcessEvent:179 : handle BLOCK_JOB_READY handler=0x7f088fef1b60 data=0x557e1c5773d0

Block commit finished; the job is ready to be merged.

2017-05-18 08:53:56.234+0000: 13488: debug : virThreadJobClear:121 : Thread 13488 (virNetServerHandleJob) finished job remoteDispatchDomainBlockCommit with ret=0

virDomainBlockCommit API returns success.

2017-05-18 08:53:56.236+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-blockstats","id":"libvirt-380"}
2017-05-18 08:53:56.240+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-381"}
2017-05-18 08:53:56.243+0000: 13489: error : qemuDomainGetBlockInfo:11550 : internal error: failed to query the maximum written offset of block device 'vda'

qemuDomainGetBlockInfo fails at this point


2017-05-18 08:53:56.244+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block-jobs","id":"libvirt-382"}
2017-05-18 08:53:56.249+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"block-job-complete","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-383"}

Request to pivot to the backing image

2017-05-18 08:53:56.250+0000: 13484: info : qemuMonitorIOProcess:431 : QEMU_MONITOR_IO_PROCESS: mon=0x7f0868000fb0 buf={"timestamp": {"seconds": 1495097636, "microseconds": 250756}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.250+0000: 13484: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1495097636, "microseconds": 250756}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}]
2017-05-18 08:53:56.251+0000: 13484: debug : virJSONValueFromString:1713 : string={"timestamp": {"seconds": 1495097636, "microseconds": 250756}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.251+0000: 13484: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0868000fb0 event={"timestamp": {"seconds": 1495097636, "microseconds": 250756}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
2017-05-18 08:53:56.251+0000: 13484: debug : qemuMonitorEmitEvent:1297 : mon=0x7f0868000fb0 event=BLOCK_JOB_COMPLETED
2017-05-18 08:53:56.251+0000: 13484: debug : qemuMonitorJSONIOProcessEvent:179 : handle BLOCK_JOB_COMPLETED handler=0x7f088fef1bc0 data=0x557e1c5789d0

Pivot is finished successfully, events are dispatched.

2017-05-18 08:53:56.268+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-384"}
2017-05-18 08:53:56.271+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-named-block-nodes","id":"libvirt-385"}
2017-05-18 08:53:56.342+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-blockstats","id":"libvirt-386"}
2017-05-18 08:53:56.345+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-387"}
2017-05-18 08:53:56.389+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-blockstats","id":"libvirt-388"}
2017-05-18 08:53:56.392+0000: 13484: info : qemuMonitorIOWrite:536 : QEMU_MONITOR_IO_WRITE: mon=0x7f0868000fb0 buf={"execute":"query-block","id":"libvirt-389"}

qemuDomainGetBlockInfo works properly again
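
For reference, the "maximum written offset" in the error message corresponds to the 'wr_highest_offset' field of QEMU's query-blockstats reply, which libvirt consumes for qemuDomainGetBlockInfo; during the active-layer commit that field can be briefly absent. A hedged debugging sketch to check whether the field is currently reported (the libvirt-qemu monitor passthrough taints the domain and is intended for testing only):

#!/usr/bin/env python
# Sketch: report whether 'wr_highest_offset' is present in the raw
# query-blockstats reply that qemuDomainGetBlockInfo relies on.
import json
import libvirt
import libvirt_qemu

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('V')

reply = libvirt_qemu.qemuMonitorCommand(
    dom, '{"execute": "query-blockstats"}',
    libvirt_qemu.VIR_DOMAIN_QEMU_MONITOR_COMMAND_DEFAULT)

for entry in json.loads(reply)['return']:
    stats = entry.get('stats', {})
    print(entry.get('device'), 'wr_highest_offset' in stats)

conn.close()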

Comment 9 Peter Krempa 2017-06-02 10:52:03 UTC
commit c245f558369de5eb404e54728bf69ad27942e912
Author: Peter Krempa <pkrempa>
Date:   Mon May 22 17:44:00 2017 +0200

    qemu: Don't error out if allocation info can't be queried
    
    qemuDomainGetBlockInfo would error out if qemu did not report
    'wr_highest_offset'. This usually does not happen, but can happen
    briefly during active layer block commit. There's no need to report the
    error, we can simply report that the disk is fully allocated at that
    point.
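
In practice this means that, with the fix, a virDomainGetBlockInfo call racing with an active-layer commit succeeds and simply reports the disk as fully allocated instead of failing. For a client pinned to an unfixed libvirt, a hypothetical mitigation (an illustration only, not part of the patch or of VDSM) is to retry across the brief window:

import time
import libvirt

def block_info_tolerant(dom, disk, retries=5, delay=0.1):
    # Retry across the short window during an active-layer block commit in
    # which unfixed libvirt raises "failed to query the maximum written
    # offset of block device ..." (this bug).
    last_err = None
    for _ in range(retries):
        try:
            return dom.blockInfo(disk, 0)  # (capacity, allocation, physical)
        except libvirt.libvirtError as err:
            last_err = err
            time.sleep(delay)
    raise last_err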

Comment 12 Han Han 2017-06-08 02:28:42 UTC
Verified on libvirt-3.2.0-9.virtcov.el7.x86_64 and qemu-kvm-rhev-2.9.0-8.el7.x86_64.
The following procedure queries block info while blockcommit, blockpull, and blockcopy are running:
1. Prepare a VM XML with the following disk:
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/V.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
2. Prepare block device sdb with partitions sdb1 and sdb2:
# lsblk /dev/sdb
NAME                                MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
sdb                                   8:16   0  30G  0 disk  
├─sdb1                                8:17   0  15G  0 part  
├─sdb2                                8:18   0  15G  0 part 

3. Run the following script to query block info while blockcommit, blockpull, and blockcopy are running:
#!/bin/bash
# Query domblkinfo in a loop while blockcommit, blockpull and blockcopy run
# against domain $DOM, then collect any errors from the logs.
DOM=V
SNAP=/dev/sdb
# Truncate the logs so only errors from this run are collected.
echo '' > /var/log/libvirt/libvirtd.log
echo '' > /var/log/libvirt/qemu/$DOM.log
set -x
virsh create $DOM.xml && sleep 10
virsh snapshot-create-as $DOM --disk-only --diskspec vda,file=/dev/sdb1 --no-metadata
set +x
# Background loop hammering virDomainGetBlockInfo.
while true; do virsh domblkinfo $DOM vda 1>/dev/null;done&
set -x
virsh blockcommit $DOM vda --active --shallow --wait --verbose --pivot --async
virsh blockpull $DOM vda --wait --verbose --async
virsh blockcopy $DOM vda /dev/sdb2 --blockdev --wait --verbose --async --pivot
sleep 2
kill %1
virsh destroy $DOM
grep 'error' /var/log/libvirt/libvirtd.log /var/log/libvirt/qemu/$DOM.log >> result

The result:
+ virsh create V.xml
Domain V created from V.xml

+ sleep 10
+ virsh snapshot-create-as V --disk-only --diskspec vda,file=/dev/sdb1 --no-metadata
Domain snapshot 1496887939 created
+ set +x
+ virsh blockcommit V vda --active --shallow --wait --verbose --pivot --async
Block commit: [100 %]
Successfully pivoted
+ virsh blockpull V vda --wait --verbose --async
Block Pull: [100 %]
Pull complete
+ virsh blockcopy V vda /dev/sdb2 --blockdev --wait --verbose --async --pivot
Block Copy: [100 %]
Successfully pivoted
+ sleep 2
+ kill %1
+ virsh destroy V
Domain V destroyed

./1452045.sh: line 36:  6768 Terminated              while true; do
    virsh domblkinfo $DOM vda > /dev/null;
done

No errors were found in the result file.
Bug fixed.

Comment 13 errata-xmlrpc 2017-08-02 00:08:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846


