Bug 812705

Summary: Installing guest with cluster_size=4096, failed
Product: Red Hat Enterprise Linux 6
Reporter: daiwei <wdai>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.3
CC: acathrow, areis, asias, bsarathy, dyasny, flang, juzhang, kwolf, michen, minovotn, mkenneth, pbonzini, shuang, sluo, syeghiay, tburke, virt-maint, xwei
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: qemu-kvm-0.12.1.2-2.288.el6
Doc Type: Bug Fix
Doc Text: NEEDINFO
Story Points: ---
Last Closed: 2012-06-20 11:46:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description daiwei 2012-04-16 02:49:31 UTC
Description of problem:

Installing a guest on a qcow2 image created with cluster_size=4096 and attached through the virtio-scsi interface fails: qemu-kvm aborts when the installer formats the disk.

Version-Release number of selected component (if applicable):

# uname -r;rpm -q qemu-kvm
2.6.32-259.el6.x86_64
qemu-kvm-0.12.1.2-2.269.el6.scsifixes.x86_64


How reproducible:
3/3

Steps to Reproduce:
1. Create a qcow2 image with cluster_size=4096, e.g.:
# qemu-img create -f qcow2 sysdisk.qcow2 20G -o cluster_size=4096
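To confirm that the image really was created with 4 KiB clusters, the cluster size can be read straight from the qcow2 header. The sketch below is only an illustration: the helper name `read_qcow2_geometry` is made up for this note, and it assumes the standard big-endian qcow2 header layout (magic, version, backing file offset, backing file size, cluster_bits, size). It is run here against a synthetic header rather than a real image.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian header prefix: magic(4s) version(u32) backing_file_offset(u64)
# backing_file_size(u32) cluster_bits(u32) size(u64) = 32 bytes
HEADER_PREFIX = ">4sIQIIQ"


def read_qcow2_geometry(header: bytes) -> dict:
    """Hypothetical helper: decode version, cluster size and virtual size."""
    if len(header) < 32:
        raise ValueError("need at least 32 header bytes")
    magic, version, _bf_off, _bf_size, cluster_bits, size = struct.unpack(
        HEADER_PREFIX, header[:32]
    )
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {
        "version": version,
        "cluster_size": 1 << cluster_bits,  # cluster_bits=12 means 4096 bytes
        "virtual_size": size,
    }


# Synthetic header mimicking a 20G image with cluster_size=4096 (assumed values).
sample = struct.pack(HEADER_PREFIX, QCOW2_MAGIC, 2, 0, 0, 12, 20 * 1024**3)
info = read_qcow2_geometry(sample)
print(info["cluster_size"])
```

For a real image, the first 32 bytes of the file would be passed in instead, for example `read_qcow2_geometry(open("sysdisk.qcow2", "rb").read(32))`.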

2. Install the guest:

/usr/libexec/qemu-kvm -cpu cpu64-rhel6 -rtc base=localtime,clock=host,driftfix=slew -M rhel6.3.0 -enable-kvm -name rhel6.3-64 -smp 4,cores=2,threads=1,sockets=2 -m 4G -uuid c944829b-9aa0-46a2-b3d0-493c135da24d -boot menu=on -drive file=/home/sysdisk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native,media=disk,werror=stop,rerror=stop -device virtio-scsi-pci,id=bus1 -device scsi-hd,bus=bus1.0,drive=drive-virtio-disk0,id=virtio-scsi-pci0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup-switch -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:5E:A3:F7 -spice port=9000,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -monitor stdio -usb -device usb-tablet,id=input1 -drive file=/home/RHEL6.3-20120329.0-Server-x86_64-DVD1.iso,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native,media=cdrom -device virtio-scsi-pci,id=bus2 -device scsi-cd,bus=bus2.0,drive=drive-virtio-disk1,id=virtio-scsi-pci1,bootindex=1

3. Continue the installation until the installer formats the disk.

Actual results:
When formatting the disk, qemu-kvm aborts.

Program received signal SIGABRT, Aborted.
0x00007ffff57788a5 in raise () from /lib64/libc.so.6

(gdb) bt
#0  0x00007ffff57788a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff577a085 in abort () from /lib64/libc.so.6
#2  0x00007ffff7e3c42e in qcow2_cache_find_entry_to_replace (bs=0x7ffff8854c30, c=0x7ffff87027d0, offset=3279413248,
    table=0x7fffd75764b8, read_from_disk=false) at block/qcow2-cache.c:209
#3  qcow2_cache_do_get (bs=0x7ffff8854c30, c=0x7ffff87027d0, offset=3279413248, table=0x7fffd75764b8, read_from_disk=false)
    at block/qcow2-cache.c:229
#4  0x00007ffff7e3a299 in l2_allocate (bs=0x7ffff8854c30, offset=6463520768, new_l2_table=0x7fffd7576548, new_l2_offset=0x7fffd7576550,
    new_l2_index=0x7fffd757655c) at block/qcow2-cluster.c:180
#5  get_cluster_table (bs=0x7ffff8854c30, offset=6463520768, new_l2_table=0x7fffd7576548, new_l2_offset=0x7fffd7576550,
    new_l2_index=0x7fffd757655c) at block/qcow2-cluster.c:512
#6  0x00007ffff7e3a6e6 in qcow2_alloc_cluster_offset (bs=0x7ffff8854c30, offset=6463520768, n_start=0, n_end=1008, num=0x7fffd757666c,
    m=0x7fffd7576600) at block/qcow2-cluster.c:714
#7  0x00007ffff7e363bf in qcow2_co_writev (bs=0x7ffff8854c30, sector_num=<value optimized out>, remaining_sectors=1008,
    qiov=0x7fffd855e838) at block/qcow2.c:555
#8  0x00007ffff7e215fa in bdrv_co_do_writev (bs=0x7ffff8854c30, sector_num=12624064, nb_sectors=1008, qiov=0x7fffd855e838,
    flags=<value optimized out>) at block.c:1734
#9  0x00007ffff7e216a1 in bdrv_co_do_rw (opaque=0x7fffd855e890) at block.c:3032
#10 0x00007ffff7e26b6b in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#11 0x00007ffff5789630 in ?? () from /lib64/libc.so.6
#12 0x00007fffedd5a530 in ?? ()
#13 0x0000000000000000 in ?? ()
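
The numbers in the backtrace can be cross-checked with a little arithmetic. This is only a reading aid, assuming 512-byte sectors, the 4096-byte cluster size from step 1, and 8-byte qcow2 L2 entries; the variable names are chosen for this note and are not QEMU identifiers.

```python
SECTOR_SIZE = 512      # assumed sector size used by the block layer
CLUSTER_SIZE = 4096    # from cluster_size=4096 in step 1
L2_ENTRY_SIZE = 8      # qcow2 L2 entries are 64-bit

sector_num = 12624064  # frame #8, bdrv_co_do_writev
nb_sectors = 1008      # frame #8
offset = 6463520768    # frames #4 to #6

# The guest offset in the qcow2 frames is the sector number in bytes.
byte_offset = sector_num * SECTOR_SIZE

# Size of the write and how many clusters it has to allocate.
request_bytes = nb_sectors * SECTOR_SIZE
clusters_touched = request_bytes // CLUSTER_SIZE

# Which L2 table (L1 index) and which slot in it the write starts at.
l2_entries = CLUSTER_SIZE // L2_ENTRY_SIZE
bytes_per_l2 = l2_entries * CLUSTER_SIZE
l1_index = offset // bytes_per_l2
l2_index = (offset % bytes_per_l2) // CLUSTER_SIZE

print(byte_offset == offset, request_bytes, clusters_touched, l1_index, l2_index)
```

Under these assumptions the failing request is a 504 KiB write covering 126 clusters, all of which fall inside a single L2 table.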

Expected results:
Install guest successfully.

Additional info:
Formatting a data disk with cluster_size=4096 from inside the guest triggers the same issue.
The issue does not occur with the virtio-blk interface.

Comment 2 Xiaoqing Wei 2012-04-16 04:55:33 UTC
reproduced on:
2.6.32-262.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64

using virtio-blk

Comment 3 daiwei 2012-04-16 12:35:41 UTC
Sorry for reporting this bug against a private tree. I can reproduce it on the latest qemu-kvm using a virtio-scsi disk:

# uname -r;rpm -q qemu-kvm
2.6.32-262.el6.x86_64
qemu-kvm-0.12.1.2-2.275.el6.x86_64

Comment 4 Dor Laor 2012-04-17 12:19:52 UTC
Kevin, can you check whether it's a symptom of a critical issue?

Comment 5 Kevin Wolf 2012-04-24 14:59:16 UTC
The problem could in theory occur even with the default cluster size, even though it's rather unlikely. It happens when allocating requests to more than 16 different L2 tables are queued because they depend on other requests.
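
A rough calculation shows why a small cluster size makes this so much easier to hit. It assumes 8-byte L2 entries, one cluster per L2 table, and the usual qcow2 default cluster size of 64 KiB; the limit of 16 is taken from the paragraph above, and the helper name `l2_coverage` is made up for this note.

```python
L2_ENTRY_SIZE = 8          # bytes per qcow2 L2 entry
L2_TABLES_IN_FLIGHT = 16   # limit quoted in the comment above
MiB = 1024 * 1024


def l2_coverage(cluster_size):
    """Guest bytes mapped by one L2 table (a table occupies one cluster)."""
    entries = cluster_size // L2_ENTRY_SIZE
    return entries * cluster_size


small = l2_coverage(4096)      # cluster_size=4096 from this report
default = l2_coverage(65536)   # assumed default cluster size

print(small // MiB, default // MiB)
print(L2_TABLES_IN_FLIGHT * small // MiB, L2_TABLES_IN_FLIGHT * default // MiB)
```

With 4 KiB clusters one L2 table maps only 2 MiB of guest data, so 16 tables cover 32 MiB. With 64 KiB clusters one table maps 512 MiB and 16 tables cover 8 GiB. Writes spread over more than 16 distinct L2 tables are therefore routine when formatting a disk with 4 KiB clusters and rare with the default.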

It causes a qemu abort(), but image consistency is not harmed. Critical enough to be fixed in 6.3, I'd say.
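
The failure mode can be pictured with a toy model. This is not the QEMU code: the class and exception names are invented for illustration, and the model assumes that each queued request keeps its L2 table referenced while it waits, so a referenced slot can never be chosen for replacement.

```python
class CacheExhausted(RuntimeError):
    """Stands in for the abort() seen in the backtrace."""


class ToyTableCache:
    """Fixed number of slots; only unreferenced tables may be evicted."""

    def __init__(self, slots=16):
        self.slots = slots
        self.entries = {}  # table offset -> reference count

    def get(self, offset):
        if offset in self.entries:
            self.entries[offset] += 1
            return offset
        if len(self.entries) >= self.slots:
            # Look for a slot nobody holds a reference to.
            victim = next((o for o, ref in self.entries.items() if ref == 0), None)
            if victim is None:
                raise CacheExhausted("all %d slots are in use" % self.slots)
            del self.entries[victim]
        self.entries[offset] = 1
        return offset

    def put(self, offset):
        # Drop one reference; the slot becomes evictable at zero.
        self.entries[offset] -= 1


cache = ToyTableCache(slots=16)
for i in range(16):
    cache.get(i * 4096)      # 16 queued requests, each pinning its own table

try:
    cache.get(16 * 4096)     # a 17th distinct table finds no free slot
    exhausted = False
except CacheExhausted:
    exhausted = True
print(exhausted)
```

In this model the lookup starts succeeding again as soon as any one holder calls `put()`. Nothing in the cache itself is damaged by the failure, which fits the observation that image consistency is not harmed.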

I found a reproducer, added it to qemu-iotests and sent upstream patches. RHEL code is different, so I'm working on a different fix there.

Comment 9 langfang 2012-05-03 05:51:53 UTC
Reproduced this issue with the following steps and environment:
# uname -r
2.6.32-269.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.287.el6.x86_64

steps:
1) # qemu-img create -f qcow2 sysdisk.qcow2 20G -o cluster_size=4096
2) Boot the guest from the cluster_size=4096 image using the virtio-scsi interface:
 /usr/libexec/qemu-kvm -cpu cpu64-rhel6 -rtc base=localtime,clock=host,driftfix=slew -M rhel6.3.0 -enable-kvm -name rhel6.3 -smp 4,cores=2,threads=1,sockets=2 -m 4G -uuid a3d13230-f1c1-4dc9-95de-bb92b2017674 -boot menu=on -drive file=/home/sysdisk.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native,media=disk,werror=stop,rerror=stop -device virtio-scsi-pci,id=bus1 -device scsi-hd,bus=bus1.0,drive=drive-virtio-disk0,id=virtio-scsi-pci0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=44:37:E6:97:58:89 -spice port=9000,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -monitor stdio -usb -device usb-tablet,id=input1 -drive file=/home/RHEL6.3-20120426.2-Server-x86_64-DVD1.iso,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native,media=cdrom -device virtio-scsi-pci,id=bus2 -device scsi-cd,bus=bus2.0,drive=drive-virtio-disk1,id=virtio-scsi-pci1,bootindex=1

Results: when formatting the disk, qemu-kvm aborts.
bt
#0  0x00007ffff57798a5 in raise () from /lib64/libc.so.6
#1  0x00007ffff577b085 in abort () from /lib64/libc.so.6
#2  0x00007ffff7e3d7ae in qcow2_cache_find_entry_to_replace (bs=0x7ffff86ef010, c=0x7ffff86d8df0, offset=70295552, table=0x7fffda5a4108, 
    read_from_disk=false) at block/qcow2-cache.c:209
#3  qcow2_cache_do_get (bs=0x7ffff86ef010, c=0x7ffff86d8df0, offset=70295552, table=0x7fffda5a4108, read_from_disk=false)
    at block/qcow2-cache.c:229
#4  0x00007ffff7e3b619 in l2_allocate (bs=0x7ffff86ef010, offset=2693144576, new_l2_table=0x7fffda5a4198, new_l2_offset=0x7fffda5a41a0, 
    new_l2_index=0x7fffda5a41ac) at block/qcow2-cluster.c:180
#5  get_cluster_table (bs=0x7ffff86ef010, offset=2693144576, new_l2_table=0x7fffda5a4198, new_l2_offset=0x7fffda5a41a0, 
    new_l2_index=0x7fffda5a41ac) at block/qcow2-cluster.c:512
#6  0x00007ffff7e3ba66 in qcow2_alloc_cluster_offset (bs=0x7ffff86ef010, offset=2693144576, n_start=0, n_end=1008, num=0x7fffda5a42bc, 
    m=0x7fffda5a4250) at block/qcow2-cluster.c:714
#7  0x00007ffff7e3773f in qcow2_co_writev (bs=0x7ffff86ef010, sector_num=<value optimized out>, remaining_sectors=1008, 
    qiov=0x7fffda4a4088) at block/qcow2.c:555
#8  0x00007ffff7e2293a in bdrv_co_do_writev (bs=0x7ffff86ef010, sector_num=5260048, nb_sectors=1008, qiov=0x7fffda4a4088, 
    flags=<value optimized out>) at block.c:1741
#9  0x00007ffff7e229e1 in bdrv_co_do_rw (opaque=0x7fffda4a4290) at block.c:3039
#10 0x00007ffff7e27eeb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#11 0x00007ffff578a630 in ?? () from /lib64/libc.so.6
#12 0x00007fffed148530 in ?? ()
#13 0x0000000000000000 in ?? ()


Verified this issue with the following steps and environment:
Version:
# uname -r
2.6.32-262.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.290.el6.x86_64
 
The steps are the same as for reproducing.

Results:

qemu-kvm works well and does not abort when formatting the disk, so this issue has been fixed.

Comment 11 Michal Novotny 2012-05-04 13:29:43 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No Documentation Needed

Comment 12 Paolo Bonzini 2012-05-04 13:45:55 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-No Documentation Needed
+NEEDINFO

Comment 13 errata-xmlrpc 2012-06-20 11:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0746.html