Bug 1012244

Summary: Live snapshot of a QED disk with qcow2 format specified causes guest hang and host call trace
Product: Red Hat Enterprise Linux 6
Reporter: Sibiao Luo <sluo>
Component: qemu-kvm
Assignee: Jeff Cody <jcody>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.5
CC: bsarathy, chayang, juli, juzhang, kwolf, michen, mkenneth, qzhang, rbalakri, shyu, sluo, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.425.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-14 06:51:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Sibiao Luo 2013-09-26 05:23:41 UTC
Description of problem:
Create a QED data disk and boot a guest with this disk attached via the scsi-hd interface, then take a live snapshot with qcow2 format specified. Running 'fdisk -l' in the guest afterwards causes the guest and the HMP monitor to hang; after a while QEMU is killed and the host logs a call trace.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm-rhev && rpm -q seabios
2.6.32-420.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64
seabios-0.6.1.2-28.el6.x86_64
guest info:
2.6.32-420.el6.x86_64

How reproducible:
4/4

Steps to Reproduce:
1. Create a QED data disk.
# qemu-img create -f qed my-data-disk.qed 5G
Formatting 'my-data-disk.qed', fmt=qed size=5368709120 cluster_size=0 table_size=0
2. Boot a guest with this data disk attached via the scsi-hd interface.
e.g.: ... -drive file=/home/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,bus=pci.0,addr=0x7,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk
3. Run 'fdisk -l' in the guest.
guest]# fdisk -l
4. Take a live snapshot with qcow2 format specified.
(qemu) info status 
VM status: running
(qemu) info block
drive-virtio-disk: removable=0 io-status=ok file=/home/RHEL6.5_20130924.2_x86_64.qed ro=0 drv=qed encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
drive-data-disk: removable=0 io-status=ok file=/home/my-data-disk.qed ro=0 drv=qed encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
ide1-cd0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]
floppy0: removable=1 locked=0 tray-open=0 [not inserted]
sd0: removable=1 locked=0 tray-open=0 [not inserted]
(qemu) snapshot_blkdev drive-data-disk /home/snapshot-file qcow2
Formatting '/home/snapshot-file', fmt=qcow2 size=5368709120 backing_file='/home/my-data-disk.qed' backing_fmt='qed' encryption=off cluster_size=65536
5. Run 'fdisk -l' in the guest again.
guest]# fdisk -l

Actual results:
After step 3, 'fdisk -l' executes successfully and lists the disk device info correctly.
After step 5, 'fdisk -l' hangs with no output, and the guest and HMP monitor hang as well. After about 3 minutes QEMU is killed and the host logs a call trace.

Expected results:
'fdisk -l' should execute successfully and list the disk device info correctly.

Additional info:
# /usr/libexec/qemu-kvm -M pc -S -cpu host -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name sluo -uuid f22105cf-c6a4-4e35-95d9-db1b2748f26a -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port2 -drive file=/home/RHEL6.5_20130924.2_x86_64.qed,if=none,id=drive-virtio-disk,format=qed,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,vectors=0,bus=pci.0,addr=0x4,scsi=off,drive=drive-virtio-disk,id=virtio-disk,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=10:A1:1C:18:AB:11,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -drive file=/home/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,bus=pci.0,addr=0x7,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :1 -spice disable-ticketing,port=5931 -monitor stdio
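
When scripting this reproducer, the 'info block' output shown in step 4 can be parsed to confirm which format driver backs each drive before the snapshot is taken. The following is only a minimal sketch: the function name parse_info_block is hypothetical, and it assumes the one-line-per-device "name: key=value ..." layout seen in this report, which may differ in other QEMU versions.

```python
def parse_info_block(text):
    """Parse HMP 'info block' output into {device_name: {key: value}}.

    Assumes the layout shown in this report: one device per line,
    'name: key=value key=value ...'. Tokens without '=' (for example
    '[not inserted]') are ignored.
    """
    drives = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip prompt lines such as '(qemu) info block'
        props = {}
        for token in rest.split():
            key, eq, value = token.partition("=")
            if eq:
                props[key] = value
        drives[name.strip()] = props
    return drives


# Two lines copied from the step 4 output above.
SAMPLE = (
    "(qemu) info block\n"
    "drive-data-disk: removable=0 io-status=ok file=/home/my-data-disk.qed "
    "ro=0 drv=qed encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0\n"
    "floppy0: removable=1 locked=0 tray-open=0 [not inserted]\n"
)

if __name__ == "__main__":
    info = parse_info_block(SAMPLE)
    print(info["drive-data-disk"]["drv"], info["drive-data-disk"]["file"])
```

A script could use the drv value to skip or flag drives in qed format on builds older than the fixed version.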

Comment 1 Sibiao Luo 2013-09-26 05:24:47 UTC
host]# dmesg 
device tap0 entered promiscuous mode
switch: port 2(tap0) entering forwarding state
tap0: no IPv6 routers present
kvm: 2524: cpu0 unhandled rdmsr: 0x345
kvm: 2524: cpu0 unhandled wrmsr: 0x680 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c0 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x681 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c1 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x682 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c2 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x683 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c3 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x684 data 0
switch: port 2(tap0) entering forwarding state
__ratelimit: 57 callbacks suppressed
qemu-kvm invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
qemu-kvm cpuset=/ mems_allowed=0
Pid: 2535, comm: qemu-kvm Not tainted 2.6.32-420.el6.x86_64 #1
Call Trace:
 [<ffffffff810d0831>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81122b20>] ? dump_header+0x90/0x1b0
 [<ffffffff81122fa2>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff81122ee1>] ? select_bad_process+0xe1/0x120
 [<ffffffff811233e0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff8112fcfc>] ? __alloc_pages_nodemask+0x8ac/0x8d0
 [<ffffffff81167c4a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff8104f09b>] ? pte_alloc_one+0x1b/0x50
 [<ffffffff811465d2>] ? __pte_alloc+0x32/0x160
 [<ffffffff81183702>] ? do_huge_pmd_anonymous_page+0x322/0x3b0
 [<ffffffff8114b510>] ? handle_mm_fault+0x2f0/0x300
 [<ffffffff8104aad8>] ? __do_page_fault+0x138/0x480
 [<ffffffff81289235>] ? rwsem_wake+0x75/0x170
 [<ffffffff8128e898>] ? call_rwsem_wake+0x18/0x30
 [<ffffffff8152d65e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152aa15>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
CPU    4: hi:    0, btch:   1 usd:   0
CPU    5: hi:    0, btch:   1 usd:   0
CPU    6: hi:    0, btch:   1 usd:   0
CPU    7: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  39
CPU    1: hi:  186, btch:  31 usd:  98
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
CPU    4: hi:  186, btch:  31 usd:   0
CPU    5: hi:  186, btch:  31 usd: 167
CPU    6: hi:  186, btch:  31 usd:   0
CPU    7: hi:  186, btch:  31 usd:   0
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:  57
CPU    1: hi:  186, btch:  31 usd: 168
CPU    2: hi:  186, btch:  31 usd:   1
CPU    3: hi:  186, btch:  31 usd:   1
CPU    4: hi:  186, btch:  31 usd:   0
CPU    5: hi:  186, btch:  31 usd: 178
CPU    6: hi:  186, btch:  31 usd:   1
CPU    7: hi:  186, btch:  31 usd:  18
active_anon:907479 inactive_anon:226796 isolated_anon:0
 active_file:86 inactive_file:281 isolated_file:0
 unevictable:0 dirty:0 writeback:33 unstable:0
 free:25494 slab_reclaimable:2626 slab_unreclaimable:19102
 mapped:6 shmem:2 pagetables:798209 bounce:0
Node 0 DMA free:15712kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15308kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3246 8012 8012
Node 0 DMA32 free:46348kB min:27332kB low:34164kB high:40996kB active_anon:1682608kB inactive_anon:420672kB active_file:0kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3324648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:3720kB kernel_stack:0kB pagetables:895948kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:42 all_unreclaimable? yes
lowmem_reserve[]: 0 0 4765 4765
Node 0 Normal free:39916kB min:40124kB low:50152kB high:60184kB active_anon:1947308kB inactive_anon:486512kB active_file:344kB inactive_file:1116kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4880320kB mlocked:0kB dirty:0kB writeback:132kB mapped:24kB shmem:8kB slab_reclaimable:10504kB slab_unreclaimable:72688kB kernel_stack:1864kB pagetables:2296888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1402 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15712kB
Node 0 DMA32: 1*4kB 1*8kB 2*16kB 1*32kB 1*64kB 7*128kB 71*256kB 41*512kB 0*1024kB 1*2048kB 1*4096kB = 46348kB
Node 0 Normal: 361*4kB 145*8kB 48*16kB 86*32kB 398*64kB 33*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 39916kB
2010 total pagecache pages
1705 pages in swap cache
Swap cache stats: add 2045794, delete 2044089, find 965/1611
Free swap  = 0kB
Total swap = 8159224kB
2088959 pages RAM
84846 pages reserved
197 pages shared
1973569 pages non-shared
[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  582]     0   582     2923        1   3     -17         -1000 udevd
[ 1756]     0  1756     2280        1   1       0             0 dhclient
[ 1809]     0  1809    62274        1   0       0             0 rsyslogd
[ 1861]     0  1861     2723       28   1       0             0 irqbalance
[ 1875]    32  1875     4744       15   0       0             0 rpcbind
[ 1985]    81  1985     7981        1   0       0             0 dbus-daemon
[ 1996]     0  1996    22565        1   0       0             0 NetworkManager
[ 2001]     0  2001    14518        1   2       0             0 modem-manager
[ 2015]    29  2015     5837        1   0       0             0 rpc.statd
[ 2046]     0  2046    47332        1   0       0             0 cupsd
[ 2047]     0  2047    11242        1   6       0             0 wpa_supplicant
[ 2072]     0  2072     1020        0   4       0             0 acpid
[ 2081]    68  2081     9749      137   0       0             0 hald
[ 2082]     0  2082     5082        1   4       0             0 hald-runner
[ 2126]     0  2126     5612        2   2       0             0 hald-addon-inpu
[ 2127]    68  2127     4484        2   0       0             0 hald-addon-acpi
[ 2149]     0  2149    96432       31   4       0             0 automount
[ 2174]     0  2174    16651        0   0     -17         -1000 sshd
[ 2250]     0  2250    20318       23   4       0             0 master
[ 2256]    89  2256    20338       16   0       0             0 pickup
[ 2257]    89  2257    20355        1   0       0             0 qmgr
[ 2274]     0  2274    27580        1   0       0             0 abrtd
[ 2288]     0  2288    27052       39   2       0             0 ksmtuned
[ 2297]     0  2297    29325        5   0       0             0 crond
[ 2308]     0  2308     5385        0   0       0             0 atd
[ 2321]     0  2321    26005        1   0       0             0 rhsmcertd
[ 2334]     0  2334    15582       13   1       0             0 certmonger
[ 2356]     0  2356     1016        1   2       0             0 mingetty
[ 2358]     0  2358     1016        1   0       0             0 mingetty
[ 2360]     0  2360     1016        1   7       0             0 mingetty
[ 2362]     0  2362     1016        1   2       0             0 mingetty
[ 2364]     0  2364     1016        1   6       0             0 mingetty
[ 2366]     0  2366     1016        1   2       0             0 mingetty
[ 2372]     0  2372     3120        1   0     -17         -1000 udevd
[ 2373]     0  2373     3120        1   5     -17         -1000 udevd
[ 2391]     0  2391     6910        1   0     -17         -1000 auditd
[ 2416]     0  2416    25087        1   1       0             0 sshd
[ 2420]     0  2420    27085        1   4       0             0 bash
[ 2441]     0  2441    25663        1   1       0             0 sshd
[ 2445]     0  2445    27085        1   4       0             0 bash
[ 2463]     0  2463    25087        1   1       0             0 sshd
[ 2467]     0  2467    27085        1   5       0             0 bash
[ 2485]     0  2485    25089        1   1       0             0 sshd
[ 2489]     0  2489    27085        1   0       0             0 bash
[ 2524]     0  2524 408488758  1132474   4       0             0 qemu-kvm
[ 2538]     0  2538    92383       17   1       0             0 remote-viewer
[ 2561]     0  2561     1887        1   7       0             0 nc
[ 2575]     0  2575    25089        1   5       0             0 sshd
[ 2579]     0  2579    27085        1   4       0             0 bash
[ 2597]     0  2597    25224        1   6       0             0 tailf
[ 2636]     0  2636    25227       18   7       0             0 sleep
Out of memory: Kill process 2524 (qemu-kvm) score 951 or sacrifice child
Killed process 2524, UID 0, (qemu-kvm) total-vm:1633955032kB, anon-rss:4529760kB, file-rss:136kB
Kill process 2533 (vhost-2524) sharing same memory
switch: port 2(tap0) entering disabled state
device tap0 left promiscuous mode
switch: port 2(tap0) entering disabled state
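
The numbers in the OOM report above can be cross-checked to see the scale of the runaway allocation. The sketch below only redoes the arithmetic on values copied from the log; the 4 kB page size is an assumption (it is not stated in the log, but it is the x86_64 default and it makes the figures agree).

```python
PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default

# Values copied from the dmesg output above.
total_vm_pages = 408488758        # total_vm column for pid 2524 (qemu-kvm)
total_vm_kb = 1633955032          # "total-vm" in the "Killed process" line
anon_rss_kb = 4529760             # "anon-rss" in the "Killed process" line
pagetable_pages = 798209          # "pagetables:" in the Mem-Info summary
pagetable_kb_dma32 = 895948       # "pagetables:" for Node 0 DMA32
pagetable_kb_normal = 2296888     # "pagetables:" for Node 0 Normal
ram_pages = 2088959               # "pages RAM"


def kb_to_gib(kb):
    """Convert a kB count (1024-byte units, as the kernel reports) to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    # The per-process table and the kill line describe the same mapping size.
    print(total_vm_pages * PAGE_KB == total_vm_kb)
    # The global page-table count matches the sum of the per-zone values.
    print(pagetable_pages * PAGE_KB == pagetable_kb_dma32 + pagetable_kb_normal)
    print("virtual size (GiB):", round(kb_to_gib(total_vm_kb)))
    print("page tables  (GiB):", round(kb_to_gib(pagetable_pages * PAGE_KB), 1))
    print("anon rss     (GiB):", round(kb_to_gib(anon_rss_kb), 1))
    print("host RAM     (GiB):", round(kb_to_gib(ram_pages * PAGE_KB), 1))
```

Under that assumption, qemu-kvm had mapped well over 1 TiB of virtual memory for a guest started with -m 2048, and page tables alone took a large share of the host RAM, which is consistent with "Free swap = 0kB" and the oom-killer being invoked.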

Comment 2 Sibiao Luo 2013-09-26 05:25:35 UTC
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               1600.000
BogoMIPS:              6784.57
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

Comment 3 Sibiao Luo 2013-09-26 05:35:28 UTC
Tried the same test as comment #0 with a qcow2 data disk and did not hit this issue: after the live snapshot, 'fdisk -l' executes successfully and lists the disk device info correctly.

# qemu-img create -f qcow2 my-data-disk.qcow2 5G
Formatting 'my-data-disk.qcow2', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536 

e.g.: ... -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,bus=pci.0,addr=0x7,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk
(qemu) snapshot_blkdev drive-data-disk /home/snapshot 
Formatting '/home/snapshot', fmt=qcow2 size=5368709120 backing_file='/home/my-data-disk.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 

guest]# fdisk -l

Disk /dev/vda: 10.7 GB, 10737418240 bytes
16 heads, 63 sectors/track, 20805 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000cd5bf
...

Best Regards,
sluo

Comment 7 Jeff Cody 2014-04-14 21:24:23 UTC
QED's internal data structure holds BDS (BlockDriverState) pointers that become incorrect once bdrv_append() is performed as part of a live snapshot.

Upstream commit e023b2e2 'block: fix snapshot on QED' fixes this.
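
To illustrate the class of problem described above, here is a toy model in Python. It is not QEMU code: the names BlockState, QedDriverState, append_overlay and rebind are all hypothetical, and it assumes an append operation that swaps the contents of two state objects so that the object the guest device already references becomes the new top of the chain. A driver state that keeps a back-pointer to "its" object is then left pointing at the overlay.

```python
class QedDriverState:
    """Toy stand-in for format driver state that keeps a back-pointer."""

    def __init__(self):
        self.owner = None  # back-pointer to the BlockState using this driver


class BlockState:
    """Toy stand-in for a block device state object (not QEMU's BDS)."""

    def __init__(self, name, driver_state=None, backing=None):
        self.name = name
        self.driver_state = driver_state
        self.backing = backing
        if driver_state is not None:
            driver_state.owner = self


def append_overlay(overlay, top):
    """Swap the two objects' contents so 'top' keeps its identity but now
    holds the overlay's fields, then chain the old contents underneath."""
    overlay.__dict__, top.__dict__ = top.__dict__, overlay.__dict__
    top.backing = overlay


def rebind(node):
    """One possible repair in this toy model: re-point the back-pointer."""
    if node.driver_state is not None:
        node.driver_state.owner = node


def build_chain():
    qed = QedDriverState()
    top = BlockState("my-data-disk.qed", driver_state=qed)
    overlay = BlockState("snapshot-file")
    append_overlay(overlay, top)
    return qed, top, overlay


if __name__ == "__main__":
    qed, top, overlay = build_chain()
    # The back-pointer still refers to the object that is now the overlay,
    # whose backing node is the QED layer itself: a cycle.
    print(qed.owner.name)
    print(qed.owner.backing.driver_state is qed)
    rebind(overlay)
    print(qed.owner.name)
```

In this model, any I/O the QED layer issues through its stale back-pointer would land on the overlay, which forwards it back to the QED layer. Whether the upstream commit repairs the pointer in exactly this way is not stated in this report.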

Comment 9 Miroslav Rezanina 2014-04-29 06:02:28 UTC
Fix included in qemu-kvm-0.12.1.2-2.425.el6

Comment 11 Jun Li 2014-06-19 05:58:12 UTC
Reproduce:
Version of the components:
qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64
Guest kernel:
3.10.0-123.el7.x86_64
Host kernel:
2.6.32-483.el6.x86_64

Steps are the same as in comment #0; the command line is as follows:
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 8G -smp 2,sockets=2,cores=1,threads=1 -name juli -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa61 -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/RHEL-Server-7.0-z-64.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0,id=scsi0-0-0 -device virtio-balloon-pci,id=ballooning,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=24:be:05:0c:12:11,addr=0x7,bootindex=2 -k en-us -boot menu=on -qmp tcp:0:8888,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vga qxl -vnc :1 -spice port=5931,disable-ticketing  -monitor stdio \
-drive file=/home/juli/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop \
-device virtio-scsi-pci,bus=pci.0,id=scsi0 \
-device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk

After step 5, 'fdisk -l' hangs with no output, and the guest and HMP monitor hang as well. After about 5 minutes QEMU is killed, but the host keeps working.

Based on the above test, this bug has been reproduced.

=======================

Verify:
Version of the components:
qemu-img-rhev-0.12.1.2-2.428.el6.x86_64
Guest kernel:
3.10.0-123.el7.x86_64
Host kernel:
2.6.32-483.el6.x86_64

Steps are the same as in comment #0; the command line is as follows:
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 8G -smp 2,sockets=2,cores=1,threads=1 -name juli -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa61 -rtc base=localtime,clock=host,driftfix=slew -drive file=/home/RHEL-Server-7.0-z-64.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0,id=scsi0-0-0 -device virtio-balloon-pci,id=ballooning,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=24:be:05:0c:12:11,addr=0x7,bootindex=2 -k en-us -boot menu=on -qmp tcp:0:8888,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vga qxl -vnc :1 -spice port=5931,disable-ticketing  -monitor stdio \
-drive file=/home/juli/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop \
-device virtio-scsi-pci,bus=pci.0,id=scsi0 \
-device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk

After step 5, the guest, qemu-kvm, and the host all work well. After waiting about 10 minutes, guest and host dmesg show no errors. Running "info status" via HMP gives the following:
(qemu) info status 
VM status: running

Based on the above testing, this bug has been verified.

Comment 13 errata-xmlrpc 2014-10-14 06:51:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html