Bug 1185172

Summary: The blockcopy command will hang there in the mirror period with the raw disk
Product: Red Hat Enterprise Linux 7 Reporter: zhenfeng wang <zhwang>
Component: qemu-kvm-rhevAssignee: Jeff Cody <jcody>
Status: CLOSED ERRATA QA Contact: Qianqian Zhu <qizhu>
Severity: high Docs Contact:
Priority: high    
Version: 7.1CC: coli, dyuan, gsun, hhuang, huding, jdenemar, juzhang, knoel, lhuang, michen, mrezanin, mzhan, qizhu, rbalakri, virt-bugs, virt-maint, yanyang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 23:27:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
the log about libvirt
none
the qemu log none

Description zhenfeng wang 2015-01-23 06:39:40 UTC
Description of problem:
The blockcopy command will hang there in the mirror period with the raw disk

Version-Release number:
libvirt-1.2.8-15.el7.x86_64
kernel-3.10.0-224.el7.x86_64
qemu-kvm-rhev-2.1.2-20.el7.x86_64
selinux-policy-3.13.1-19.el7.noarch

How reproducible:
100%

Steps to Reproduce:

1.Create a guest with raw disk
#virsh create rhel7f.xml
#virsh dumpxml rhel7f |grep disk -A6
--
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/rhel7.raw'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

2.Do blockcopy with the guest, the command will hang there
In the first terminal, run the following command
# time virsh blockcopy rhel7f vda /var/lib/libvirt/images/e.img --wait --verbose

Open another terminal, found we couldn't check the blockjobinfo, also couldn't quit the job by the abort command
except destroy the guest

# virsh blockjob rhel7f vda
error: Timed out during operation: cannot acquire state change lock

# virsh blockjob rhel7f vda --abort
error: Timed out during operation: cannot acquire state change lock

# virsh destroy rhel7f
Domain rhel7f destroyed

after destroy the guest, could see error show in the first terminal
# time virsh blockcopy rhel7f vda /var/lib/libvirt/images/e.img --wait --verbose
^C^C^Cerror: Unable to read from monitor: Connection reset by peer


real        10m40.363s
user        0m0.016s
sys        0m0.017s


3.Downgrade the qemu packet to qemu-kvm-rhev-2.1.2-5.el7.x86_64, then re-do blockcopy
with the raw disk, could do it successfully
# virsh blockcopy rhel7f vda /var/lib/libvirt/images/f.img --wait --verbose
Block Copy: [100 %]
Now in mirroring phase

# virsh dumpxml rhel7f |grep disk -a6
 <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/rhel7.raw'/>
      <backingStore/>
      <mirror type='file' file='/var/lib/libvirt/images/f.img' format='raw' job='copy' ready='yes'>
        <format type='raw'/>
        <source file='/var/lib/libvirt/images/f.img'/>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>




Actual results:
Fail to do mirror with the raw disk

Expected results:
should do it successfully

Comment 2 zhenfeng wang 2015-01-23 07:02:32 UTC
The qemu colleague try it with the pure qemu command and they didn't hit the issue, the following steps were got from them and they try it with the qemu-kvm-rhev-2.1.2-20.el7.x86_64. Hope this info is helpfull for you

/usr/libexec/qemu-kvm -enable-kvm -M pc-i440fx-rhel7.0.0 -smp 4 -m 4G -name rhel6.3-64 -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=/home/RHEL-Server-7.1-64-virtio.raw,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=raw,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0,bootindex=1 -boot order=cd -monitor stdio -readconfig nfs/ich9-ehci-uhci.cfg -device usb-tablet,id=input0 -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 -monitor tcp::1235,server,nowait -vga qxl -global qxl-vga.revision=3 -spice port=5930,disable-ticketing -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :10 -qmp tcp:0:5555,server,nowait -sandbox on -cpu host -netdev tap,script=/etc/qemu-ifup,id=netdev0 -device virtio-net-pci,netdev=netdev0,id=device-net0,mac=aa:54:00:11:22:33

{ 'execute' : 'qmp_capabilities' }
{"return": {}}
{"timestamp": {"seconds": 1421989798, "microseconds": 674653}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "device-net0", "path": "/machine/peripheral/device-net0/virtio-backend"}}


{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-disk0", "target": "/home/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "true", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
{"error": {"class": "GenericError", "desc": "Invalid parameter 'true'"}}

{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-disk0", "target": "/home/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
{"return": {}}

{"timestamp": {"seconds": 1421990163, "microseconds": 676036}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5910", "host": "0.0.0.0"}, "client": {"family": "ipv4", "service": "39823", "host": "10.66.115.154"}}}
{"timestamp": {"seconds": 1421990163, "microseconds": 748665}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5910", "host": "0.0.0.0"}, "client": {"family": "ipv4", "service": "39823", "host": "10.66.115.154"}}}
{"timestamp": {"seconds": 1421990391, "microseconds": 198984}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 1000000000, "type": "mirror"}}

{"execute": "block-job-complete", "arguments": { "device": "drive-virtio-disk0"} }
{"return": {}}
{"timestamp": {"seconds": 1421991719, "microseconds": 70199}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 1000000000, "type": "mirror"}}

{"timestamp": {"seconds": 1421991754, "microseconds": 416189}, "event": "RESET"}
{"timestamp": {"seconds": 1421991754, "microseconds": 455172}, "event": "RESET"}
{"timestamp": {"seconds": 1421991776, "microseconds": 303532}, "event": "SHUTDOWN"}
{"timestamp": {"seconds": 1421991776, "microseconds": 325294}, "event": "DEVICE_TRAY_MOVED", "data": {"device": "ide1-cd0", "tray-open": true}}
{"timestamp": {"seconds": 1421991776, "microseconds": 325447}, "event": "DEVICE_TRAY_MOVED", "data": {"device": "floppy0", "tray-open": true}}

Comment 3 Shaolong Hu 2015-01-23 07:37:00 UTC
Hi,

I test qemu side, does have problem, has file a bug:

Bug 1185178 - QMP drive_mirror command with "format":"raw" hang

Comment 4 Jiri Denemark 2015-01-23 08:17:13 UTC
Could you provide the output of "virsh domcontrol $DOMAIN" while qemu hangs (i.e., when virsh blockjob times out)?

Comment 5 Shaolong Hu 2015-01-23 08:56:01 UTC
(In reply to Shaolong Hu from comment #3)
> Hi,
> 
> I test qemu side, does have problem, has file a bug:
> 
> Bug 1185178 - QMP drive_mirror command with "format":"raw" hang

A negtive one, so far no evidence qemu's problem.

Comment 6 zhenfeng wang 2015-01-23 08:57:55 UTC
Terminal1# virsh blockcopy rhel7f vda /var/lib/libvirt/images/f.img --wait --verbose
 ==> hang here


Terminal2# virsh domcontrol rhel7f
occupied (3.300s)

Terminal2# virsh blockjob rhel7f vda
error: Timed out during operation: cannot acquire state change lock
Terminal2# virsh domcontrol rhel7f
occupied (40.461s)

Comment 7 Jiri Denemark 2015-01-23 09:06:18 UTC
Which means libvirt is stuck waiting for a response from QEMU. Can you attach libvirtd debug logs so that we can look at what commands we sent to QEMU and which one caused it to hang?

Comment 8 zhenfeng wang 2015-01-23 09:19:50 UTC
Created attachment 983230 [details]
the log about libvirt

Comment 9 zhenfeng wang 2015-01-23 09:20:35 UTC
Created attachment 983231 [details]
the qemu log

Comment 10 Jiri Denemark 2015-01-23 09:55:12 UTC
So the last QMP command libvirt sent to QEMU is:

    2015-01-23 09:13:34.199+0000: 14530: debug : qemuMonitorIOWrite:507 : QEMU_MONITOR_IO_WRITE: mon=0x7fba8c00d750 buf=
    {
        "execute":"drive-mirror",
        "arguments":{
            "device":"drive-virtio-disk0",
            "target":"/var/lib/libvirt/images/f1.img",
            "speed":0,
            "sync":"full",
            "mode":"absolute-paths",
            "format":"raw"
        },
        "id":"libvirt-96"
    }
    len=195 ret=195 errno=0

and we we got no answer from QEMU for 4 minutes.

The only thing in qemu log is

    Formatting '/var/lib/libvirt/images/f1.img', fmt=raw size=2147483648

Comment 11 Shaolong Hu 2015-01-23 10:38:10 UTC
Btw, this bug can not be reproduced by qemu-kvm QMP command line directly so far, no matter target format is raw or qcow2.

Comment 15 zhenfeng wang 2015-09-23 07:35:38 UTC
Hi Jeff,

While try it with the latest qemu-kvm-rhev-2.3.0-24.el7.x86_64 , virsh blockcopy command won't hang there, if we only excute it alone. However, the blockcopy command will quit automatically , if we do the blockcopy in one terminal meanwhile do blockjob several times in another terminal. BTW, The blockcopy job will continue, even the blockcopy command quit , however, it would excute very slowly, I wait for more than 20 minutes, but it still didn't finish 

pkginfo
libvirt-1.2.17-9.el7.x86_64
kernel-3.10.0-316.el7.x86_64
qemu-kvm-rhev-2.3.0-24.el7.x86_64


steps
1.Create a guest

# virsh create rhel7raw.xml 
Domain rhel7raw created from rhel7raw.xml

2.Open the first terminal, Do blockcopy with the guest while the guest start completely in the first terminal

# time virsh blockcopy rhel7raw vda /var/lib/libvirt/images/f.img --wait --verbose


3.open the second terminal, constantly check the blockjob info, it will report the following error sometimes
# virsh blockjob rhel7raw vda
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockJobInfo)


# time virsh blockjob rhel7raw vda
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockJobInfo)


real	0m30.010s
user	0m0.003s
sys	0m0.008s

# time virsh blockjob rhel7raw vda
Block Copy: [  4 %]


real	0m9.508s
user	0m0.007s
sys	0m0.003s

# time virsh blockjob rhel7raw vda  ===>wait long time here


4.Re-check the first terminal, The virsh blockcopy command has quit unexpectly

# time virsh blockcopy rhel7raw vda /var/lib/libvirt/images/f.img --wait --verbose
Block Copy: [  23 %]error: failed to query job for disk vda
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetBlockJobInfo)


real	3m52.704s
user	0m0.008s
sys	0m0.012s



5.Go back the second terminal , find the blockjob command in step 3 has return the blockjob info, but it wait too long, go on check the blockcopy job, the job was continuing, but excuted very slowly

# time virsh blockjob rhel7raw vda
Block Copy: [ 26 %]


real	7m34.682s
user	0m0.015s
sys	0m0.013s

# time virsh blockjob rhel7raw vda
Block Copy: [ 30 %]


real	4m21.903s
user	0m0.010s
sys	0m0.014s

--
--

Comment 20 Qianqian Zhu 2017-04-24 06:29:30 UTC
KVM QE can not reproduce this issue on qemu-kvm-rhev-2.9.0-1.el7.x86_64.

Steps:
1. Launch guest with raw disk:
/usr/libexec/qemu-kvm -smp 1 -m 1024 -name linux -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.raw,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=raw,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0,bootindex=1 -monitor stdio -qmp tcp::1234,server,nowait
2. Block mirror:
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-disk0", "target": "/home/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full"}}
{"return": {}}
{"execute":"query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive-virtio-disk0", "busy": true, "len": 21475491840, "offset": 3991928832, "paused": false, "speed": 0, "ready": false, "type": "mirror"}]}
{"timestamp": {"seconds": 1493014995, "microseconds": 370734}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 21486698496, "offset": 21486698496, "speed": 0, "type": "mirror"}}
{"execute": "block-job-complete", "arguments": { "device": "drive-virtio-disk0"} }
{"return": {}}

Result:
Block mirror goes well, no hang.

Comment 23 Qianqian Zhu 2017-05-02 08:07:11 UTC
Per Comment #20, moving to VERIFIED

Comment 25 errata-xmlrpc 2017-08-01 23:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 26 errata-xmlrpc 2017-08-02 01:04:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 27 errata-xmlrpc 2017-08-02 01:56:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 28 errata-xmlrpc 2017-08-02 02:37:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 29 errata-xmlrpc 2017-08-02 03:02:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 30 errata-xmlrpc 2017-08-02 03:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392