Bug 798857 - pkill qemu-kvm causes block I/O error after live snapshot of multiple VMs in parallel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.3
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Kevin Wolf
QA Contact: Virtualization Bugs
Duplicates: 794691 805893
Depends On:
Blocks:
 
Reported: 2012-03-01 00:41 EST by Sibiao Luo
Modified: 2013-01-09 19:44 EST (History)
CC: 22 users

See Also:
Fixed In Version: qemu-kvm-0.12.1.2-2.275.el6
Doc Type: Bug Fix
Doc Text:
No documentation needed
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 07:44:04 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Sibiao Luo 2012-03-01 00:41:14 EST
Description of problem:
Create live snapshots for multiple VMs in parallel, run qemu-img check on the snapshots, then pkill the qemu-kvm processes; block I/O errors appear.
 
I have written a script to create live snapshots for 3 VMs in parallel:
# cat create_live_snapshot_parallelly.sh 
#!/bin/bash
for i in 1 2 3
do 
   /usr/libexec/qemu-kvm -smp 4 -m 4G -usbdevice tablet -name RHEL-Server-6.3-64 -drive file=/home/$i,if=none,id=$i-drive-virtio-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=$i-drive-virtio-disk,id=virtio,bootindex=1 -netdev tap,id=$i-hostnet0,vhost=on -device virtio-net-pci,netdev=$i-hostnet0,id=$i-net,mac=00:1a:4a:12:0b:1$i,bus=pci.0 -vnc :$i -monitor unix:/tmp/monitor$i,server,nowait &
done
sleep 30
for i in 1 2 3
do
   echo "snapshot_blkdev $i-drive-virtio-disk /home/snapshot$i qcow2" | nc -U /tmp/monitor$i  &
done
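As an optional sanity check (not part of the script above), the result of each snapshot_blkdev command can be confirmed through the same monitor sockets. This is only a sketch, assuming the /tmp/monitor$i sockets created by the script are still open:
# Sketch: confirm that each drive switched to its snapshot file.
# "info block" lists every drive with the image it is currently using; after a
# successful snapshot_blkdev the drive should point at /home/snapshot$i.
for i in 1 2 3
do
   echo "info block" | nc -U /tmp/monitor$i
done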

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-235.el6.x86_64
qemu-kvm-0.12.1.2-2.231.el6.x86_64
guest info:
# uname -r
2.6.32-235.el6.x86_64
# rpm -q iozone
iozone-3-4.el5.x86_64

How reproducible:
6/10

Steps to Reproduce:
1. Start 3 guests and run iozone in each guest as soon as the VMs are up (an example invocation is sketched after these steps).
2. Create a live snapshot for each VM in parallel.
3. Run qemu-img check on the snapshots.
[root@localhost home]# qemu-img check snapshot1
No errors were found on the image.
[root@localhost home]# qemu-img check snapshot2
No errors were found on the image.
[root@localhost home]# qemu-img check snapshot3
No errors were found on the image.
4. pkill the qemu-kvm processes.
# pkill qemu-kvm
5. Run qemu-img check on the original images.
[root@localhost home]# qemu-img check 1
No errors were found on the image.
[root@localhost home]# qemu-img check 2
No errors were found on the image.
[root@localhost home]# qemu-img check 3
No errors were found on the image.
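For reference, the iozone load in step 1 can be any run that keeps write I/O going to the virtio disk while the snapshot is taken. A minimal sketch with illustrative parameters, since the exact invocation is not recorded in this report:
# Run inside each guest (illustrative parameters, not the original command):
# -a  automatic mode, cycling through record and file sizes
# -I  use O_DIRECT so the I/O reaches the virtio disk instead of the page cache
# -f  temporary test file on the guest's virtio disk
iozone -a -I -f /root/iozone.tmp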
 
Actual results:
After step 4, the output was as follows:
[root@localhost home]# pkill qemu-kvm
qemu: terminating on signal 15 from pid 16502
qemu: terminating on signal 15 from pid 16502
[root@localhost home]# block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)
/etc/qemu-ifdown: could not launch network script
qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed.

Expected results:
pkill terminates all the qemu-kvm processes successfully, and no "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed" error appears.

Additional info:
If a step is added before step 3 to stop the VMs in the monitor (see the sketch below), or if qemu-img check is not run on the snapshots before step 4, no error is produced. The issue is also easier to reproduce when the images are freshly installed for each test run.
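For completeness, the workaround of pausing the guests before checking the snapshots can be driven through the same monitor sockets. A minimal sketch, assuming the sockets from the reproduction script; not the exact commands used here:
# Pause the guest CPUs so no further I/O is issued, then check the snapshots.
for i in 1 2 3
do
   echo "stop" | nc -U /tmp/monitor$i
done
for i in 1 2 3
do
   qemu-img check /home/snapshot$i
done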
Comment 2 juzhang 2012-03-01 01:01:30 EST
From ""qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion
`c->entries[i].ref == 0' failed" error.", this issue seems duplicate Bug 798499 - Guest aborted sometimes when quit it after a savevm. however,this bug has extra info "block I/O error in device '2-drive-virtio-disk':
Operation not permitted (1)
block I/O error in device '2-drive-virtio-disk': Operation not permitted (1)".anyway,mark qa_ack+ first and cc kwolf.
Comment 3 Sibiao Luo 2012-03-01 01:22:00 EST
(In reply to comment #2)
> Based on the "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion
> `c->entries[i].ref == 0' failed" error, this issue looks like a duplicate of
> Bug 798499 - Guest aborted sometimes when quit it after a savevm. However, this
> bug has the extra messages "block I/O error in device '2-drive-virtio-disk':
> Operation not permitted (1)". Anyway, marking qa_ack+ first and CCing kwolf.

I have discussed this issue with kwolf, and he said "Some I/O requests seem to go wrong for some reason, and I am not quite sure where they would come from".
According to my test results, the "block I/O error" does not always appear together with the "qemu-kvm: block/qcow2-cache.c:69: qcow2_cache_destroy: Assertion `c->entries[i].ref == 0' failed" assertion. Sometimes only the abort is produced, and sometimes the two appear together.
Comment 4 Kevin Wolf 2012-04-10 11:10:17 EDT
I noticed that bdrv_close() doesn't call qemu_aio_flush(), so there can be dangling pointers. Not sure if it's related to the root cause of this bug.
Comment 5 Kevin Wolf 2012-04-11 06:45:59 EDT
Sent a patch upstream, along with a qemu-iotests case:
[PATCH 0/3] block: Drain requests in bdrv_close

Sibiao has successfully tested a RHEL 6.3 backport, and I'm going to send it to rhvirt-patches now.
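For anyone retesting against a fixed build, the qemu-iotests suite can be run for the qcow2 format from a qemu source tree to exercise the new case. This is a sketch only, since the test number added by the series is not listed in this bug:
# From a qemu source tree with the series applied (sketch; the specific test
# number added by "block: Drain requests in bdrv_close" is not recorded here).
cd tests/qemu-iotests
./check -qcow2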
Comment 6 Kevin Wolf 2012-04-12 06:24:02 EDT
*** Bug 794691 has been marked as a duplicate of this bug. ***
Comment 7 Kevin Wolf 2012-04-12 06:29:01 EDT
*** Bug 805893 has been marked as a duplicate of this bug. ***
Comment 10 Huang Wenlong 2012-04-18 07:18:16 EDT
Hi,

Version:
qemu-kvm-rhev-0.12.1.2-2.277

I found another issue in the qemu-kvm-rhev spec: libguestfs requires "qemu-kvm >= 2:0.12.1.0", but qemu-kvm-rhev provides "qemu-kvm = 0.12.1.2-2.275.el6", which is missing the "2:" epoch. When updating libguestfs, an error message is shown even though qemu-kvm-rhev is installed:

qemu-kvm >= 2:0.12.1.0 is needed by libguestfs-1:1.16.15-1.el6.x86_64

Do I need to file a new bug to track this issue?

Wenlong
Comment 11 Huang Wenlong 2012-04-18 07:21:28 EDT
Sorry! Please ignore Comment 10, it is my mistake :(
Comment 12 Sibiao Luo 2012-04-18 23:02:22 EDT
Verified this issue with the same steps.

Test environment and results are as follows:
host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-262.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.278.el6.x86_64
guest info:
# uname -r
2.6.32-262.el6.x86_64

Actual results:
After step 4, pkill terminated all the qemu-kvm processes successfully.
# qemu: terminating on signal 15qemu: terminating on signal 15 from pid 4936
 from pid 4936
qemu: terminating on signal 15 from pid 4936
qemu: terminating on signal 15 from pid 4936

Based on the above, this issue has been fixed.
Comment 14 Dor Laor 2012-04-22 07:30:16 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No documentation needed
Comment 15 errata-xmlrpc 2012-06-20 07:44:04 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0746.html
