Bug 798857 - pkill qemu-kvm causes block I/O error after live snapshot of multiple VMs in parallel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Kevin Wolf
QA Contact: Virtualization Bugs
Duplicates: 794691 805893
Depends On:
Blocks:
 
Reported: 2012-03-01 00:41 EST by Sibiao Luo
Modified: 2013-01-09 19:44 EST (History)
CC: 22 users

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.275.el6
Doc Type: Bug Fix
Doc Text:
No documentation needed
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 07:44:04 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Sibiao Luo 2012-03-01 00:41:14 EST
Description of problem:
Create live snapshots for multiple VMs in parallel, run qemu-img check on the snapshots, then pkill the qemu-kvm processes; block I/O errors appear.
 
I have written a script to create live snapshots for 3 VMs in parallel:
# cat create_live_snapshot_parallelly.sh 
#!/bin/bash
for i in 1 2 3
do 
   /usr/libexec/qemu-kvm -smp 4 -m 4G -usbdevice tablet -name RHEL-Server-6.3-64 -drive file=/home/$i,if=none,id=$i-drive-virtio-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=$i-drive-virtio-disk,id=virtio,bootindex=1 -netdev tap,id=$i-hostnet0,vhost=on -device virtio-net-pci,netdev=$i-hostnet0,id=$i-net,mac=00:1a:4a:12:0b:1$i,bus=pci.0 -vnc :$i -monitor unix:/tmp/monitor$i,server,nowait &
done
sleep 30
for i in 1 2 3
do
   echo "snapshot_blkdev $i-drive-virtio-disk /home/snapshot$i qcow2" | nc -U /tmp/monitor$i  &
done
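As an optional sanity check (not part of the script above), the result of each snapshot_blkdev command can be confirmed through the same monitor sockets. This is only a sketch, assuming the /tmp/monitor$i sockets created by the script are still open:
# Sketch: confirm that each drive switched to its snapshot file.
# "info block" lists every drive with the image it is currently using; after a
# successful snapshot_blkdev the drive should point at /home/snapshot$i.
for i in 1 2 3
do
   echo "info block" | nc -U /tmp/monitor$i
done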

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-235.el6.x86_64
qemu-kvm-0.12.1.2-2.231.el6.x86_64
guest info:
# uname -r
2.6.32-235.el6.x86_64
# rpm -q iozone
iozone-3-4.el5.x86_64

How reproducible:
6/10

Steps to Reproduce:
1. Start 3 guests and run iozone in each guest as soon as the VMs are up (an example invocation is sketched after these steps).
2. Create a live snapshot for each VM in parallel.
3. Run qemu-img check on the snapshots.
[root@localhost home]# qemu-img check snapshot1
No errors were found on the image.
[root@localhost home]# qemu-img check snapshot2
No errors were found on the image.
[root@localhost home]# qemu-img check snapshot3
No errors were found on the image.
4. pkill the qemu-kvm processes.
# pkill qemu-kvm
5. Run qemu-img check on the original images.
[root@localhost home]# qemu-img check 1
No errors were found on the image.
[root@localhost home]# qemu-img check 2
No errors were found on the image.
[root@localhost home]# qemu-img check 3
No errors were found on the image.
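For reference, the iozone load in step 1 can be any run that keeps write I/O going to the virtio disk while the snapshot is taken. A minimal sketch with illustrative parameters, since the exact invocation is not recorded in this report:
# Run inside each guest (illustrative parameters, not the original command):
# -a  automatic mode, cycling through record and file sizes
# -I  use O_DIRECT so the I/O reaches the virtio disk instead of the page cache
# -f  temporary test file on the guest's virtio disk
iozone -a -I -f /root/iozone.tmp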
 
Actual results:
After step 4, the output was as follows:
[root@localhost home]# pkill qemu-kvm
qemu: terminating on signal 15 from pid 16502
qemu: terminating on signal 15 from pid 16502
[root@localhost home]# block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
/etc/qemu-ifdown: could not launch network script
qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed.

Expected results:
pkill terminates all the qemu-kvm processes successfully, and no "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed" error appears.

Additional info:
If a step is added before step 3 to stop the VMs in the monitor (see the sketch below), or if qemu-img check is not run on the snapshots before step 4, no error is produced. The issue is also easier to reproduce when the images are freshly installed for each test run.
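For completeness, the workaround of pausing the guests before checking the snapshots can be driven through the same monitor sockets. A minimal sketch, assuming the sockets from the reproduction script; not the exact commands used here:
# Pause the guest CPUs so no further I/O is issued, then check the snapshots.
for i in 1 2 3
do
   echo "stop" | nc -U /tmp/monitor$i
done
for i in 1 2 3
do
   qemu-img check /home/snapshot$i
done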
Comment 2 juzhang 2012-03-01 01:01:30 EST
From ""qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion
`c->entries[i].ref == 0' failed" error.", this issue seems duplicate Bug 798499 - Guest aborted sometimes when quit it after a savevm. however,this bug has extra info "block I/O error in device '2-drive-virtio-disk':
Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)".anyway,mark qa_ack+ first and cc kwolf.
Comment 3 Sibiao Luo 2012-03-01 01:22:00 EST
(In reply to comment #2)
> Based on the "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion
> `c->entries[i].ref == 0' failed" error, this issue looks like a duplicate of
> Bug 798499 - Guest aborted sometimes when quit it after a savevm. However, this
> bug has the extra messages "block I/O error in device '2-drive-virtio-disk':
> Operation not permitted (1)". Anyway, marking qa_ack+ first and CCing kwolf.

I have discussed this issue with kwolf, and he said "Some I/O requests seem to go wrong for some reason, and I am not quite sure where they would come from".
According to my test results, the "block I/O error" does not always appear together with the "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed" assertion. Sometimes only the abort is produced, and sometimes the two appear together.
Comment 4 Kevin Wolf 2012-04-10 11:10:17 EDT
I noticed that bdrv_close() doesn't call qemu_aio_flush(), so there can be dangling pointers. Not sure if it's related to the root cause of this bug.
Comment 5 Kevin Wolf 2012-04-11 06:45:59 EDT
Sent a patch upstream, along with a qemu-iotests case:
[PATCH 0/3] block: Drain requests in bdrv_close

Sibiao has successfully tested a RHEL 6.3 backport, and I'm going to send it to rhvirt-patches now.
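For anyone retesting against a fixed build, the qemu-iotests suite can be run for the qcow2 format from a qemu source tree to exercise the new case. This is a sketch only, since the test number added by the series is not listed in this bug:
# From a qemu source tree with the series applied (sketch; the specific test
# number added by "block: Drain requests in bdrv_close" is not recorded here).
cd tests/qemu-iotests
./check -qcow2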
Comment 6 Kevin Wolf 2012-04-12 06:24:02 EDT
*** Bug 794691 has been marked as a duplicate of this bug. ***
Comment 7 Kevin Wolf 2012-04-12 06:29:01 EDT
*** Bug 805893 has been marked as a duplicate of this bug. ***
Comment 10 Huang Wenlong 2012-04-18 07:18:16 EDT
Hi,

Version:
qemu-kvm-rhev-0.12.1.2-2.277

I found another issue in the qemu-kvm-rhev spec: libguestfs requires "qemu-kvm >= 2:0.12.1.0", but qemu-kvm-rhev provides "qemu-kvm = 0.12.1.2-2.275.el6", which is missing the "2:" epoch. When updating libguestfs, an error message is shown even though qemu-kvm-rhev is installed:

qemu-kvm >= 2:0.12.1.0 is needed by libguestfs-1:1.16.15-1.el6.x86_64

Do I need to file a new bug to track this issue?

Wenlong
Comment 11 Huang Wenlong 2012-04-18 07:21:28 EDT
Sorry! Please ignore Comment 10, it is my mistake :(
Comment 12 Sibiao Luo 2012-04-18 23:02:22 EDT
Verified this issue with the same steps.

Test environment and results are as follows:
host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.278.el6.x86_64
guest info:
# uname -r
2.6.32-262.el6.x86_64

Actual results:
After step 4, pkill terminated all the qemu-kvm processes successfully.
# qemu: terminating on signal 15qemu: terminating on signal 15 from pid 4936
 from pid 4936
qemu: terminating on signal 15 from pid 4936
qemu: terminating on signal 15 from pid 4936

Based on the above, this issue has been fixed.
Comment 14 Dor Laor 2012-04-22 07:30:16 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No documentation needed
Comment 15 errata-xmlrpc 2012-06-20 07:44:04 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0746.html
