| Summary: | make live snapshot with QED disk specified qcow2 format will cause guest hang and host call trace | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Sibiao Luo <sluo> |
| Component: | qemu-kvm | Assignee: | Jeff Cody <jcody> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.5 | CC: | bsarathy, chayang, juli, juzhang, kwolf, michen, mkenneth, qzhang, rbalakri, shyu, sluo, virt-maint, xfu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-0.12.1.2-2.425.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-14 06:51:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Sibiao Luo
2013-09-26 05:23:41 UTC
```
host]# dmesg
device tap0 entered promiscuous mode
switch: port 2(tap0) entering forwarding state
tap0: no IPv6 routers present
kvm: 2524: cpu0 unhandled rdmsr: 0x345
kvm: 2524: cpu0 unhandled wrmsr: 0x680 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c0 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x681 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c1 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x682 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c2 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x683 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x6c3 data 0
kvm: 2524: cpu0 unhandled wrmsr: 0x684 data 0
switch: port 2(tap0) entering forwarding state
__ratelimit: 57 callbacks suppressed
qemu-kvm invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
qemu-kvm cpuset=/ mems_allowed=0
Pid: 2535, comm: qemu-kvm Not tainted 2.6.32-420.el6.x86_64 #1
Call Trace:
 [<ffffffff810d0831>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff81122b20>] ? dump_header+0x90/0x1b0
 [<ffffffff81122fa2>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff81122ee1>] ? select_bad_process+0xe1/0x120
 [<ffffffff811233e0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff8112fcfc>] ? __alloc_pages_nodemask+0x8ac/0x8d0
 [<ffffffff81167c4a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff8104f09b>] ? pte_alloc_one+0x1b/0x50
 [<ffffffff811465d2>] ? __pte_alloc+0x32/0x160
 [<ffffffff81183702>] ? do_huge_pmd_anonymous_page+0x322/0x3b0
 [<ffffffff8114b510>] ? handle_mm_fault+0x2f0/0x300
 [<ffffffff8104aad8>] ? __do_page_fault+0x138/0x480
 [<ffffffff81289235>] ? rwsem_wake+0x75/0x170
 [<ffffffff8128e898>] ? call_rwsem_wake+0x18/0x30
 [<ffffffff8152d65e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8152aa15>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
CPU 4: hi: 0, btch: 1 usd: 0
CPU 5: hi: 0, btch: 1 usd: 0
CPU 6: hi: 0, btch: 1 usd: 0
CPU 7: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 39
CPU 1: hi: 186, btch: 31 usd: 98
CPU 2: hi: 186, btch: 31 usd: 0
CPU 3: hi: 186, btch: 31 usd: 0
CPU 4: hi: 186, btch: 31 usd: 0
CPU 5: hi: 186, btch: 31 usd: 167
CPU 6: hi: 186, btch: 31 usd: 0
CPU 7: hi: 186, btch: 31 usd: 0
Node 0 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 57
CPU 1: hi: 186, btch: 31 usd: 168
CPU 2: hi: 186, btch: 31 usd: 1
CPU 3: hi: 186, btch: 31 usd: 1
CPU 4: hi: 186, btch: 31 usd: 0
CPU 5: hi: 186, btch: 31 usd: 178
CPU 6: hi: 186, btch: 31 usd: 1
CPU 7: hi: 186, btch: 31 usd: 18
active_anon:907479 inactive_anon:226796 isolated_anon:0
 active_file:86 inactive_file:281 isolated_file:0
 unevictable:0 dirty:0 writeback:33 unstable:0
 free:25494 slab_reclaimable:2626 slab_unreclaimable:19102
 mapped:6 shmem:2 pagetables:798209 bounce:0
Node 0 DMA free:15712kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15308kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3246 8012 8012
Node 0 DMA32 free:46348kB min:27332kB low:34164kB high:40996kB active_anon:1682608kB inactive_anon:420672kB active_file:0kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3324648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:3720kB kernel_stack:0kB pagetables:895948kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:42 all_unreclaimable? yes
lowmem_reserve[]: 0 0 4765 4765
Node 0 Normal free:39916kB min:40124kB low:50152kB high:60184kB active_anon:1947308kB inactive_anon:486512kB active_file:344kB inactive_file:1116kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4880320kB mlocked:0kB dirty:0kB writeback:132kB mapped:24kB shmem:8kB slab_reclaimable:10504kB slab_unreclaimable:72688kB kernel_stack:1864kB pagetables:2296888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1402 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15712kB
Node 0 DMA32: 1*4kB 1*8kB 2*16kB 1*32kB 1*64kB 7*128kB 71*256kB 41*512kB 0*1024kB 1*2048kB 1*4096kB = 46348kB
Node 0 Normal: 361*4kB 145*8kB 48*16kB 86*32kB 398*64kB 33*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 39916kB
2010 total pagecache pages
1705 pages in swap cache
Swap cache stats: add 2045794, delete 2044089, find 965/1611
Free swap = 0kB
Total swap = 8159224kB
2088959 pages RAM
84846 pages reserved
197 pages shared
1973569 pages non-shared
[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 582] 0 582 2923 1 3 -17 -1000 udevd
[ 1756] 0 1756 2280 1 1 0 0 dhclient
[ 1809] 0 1809 62274 1 0 0 0 rsyslogd
[ 1861] 0 1861 2723 28 1 0 0 irqbalance
[ 1875] 32 1875 4744 15 0 0 0 rpcbind
[ 1985] 81 1985 7981 1 0 0 0 dbus-daemon
[ 1996] 0 1996 22565 1 0 0 0 NetworkManager
[ 2001] 0 2001 14518 1 2 0 0 modem-manager
[ 2015] 29 2015 5837 1 0 0 0 rpc.statd
[ 2046] 0 2046 47332 1 0 0 0 cupsd
[ 2047] 0 2047 11242 1 6 0 0 wpa_supplicant
[ 2072] 0 2072 1020 0 4 0 0 acpid
[ 2081] 68 2081 9749 137 0 0 0 hald
[ 2082] 0 2082 5082 1 4 0 0 hald-runner
[ 2126] 0 2126 5612 2 2 0 0 hald-addon-inpu
[ 2127] 68 2127 4484 2 0 0 0 hald-addon-acpi
[ 2149] 0 2149 96432 31 4 0 0 automount
[ 2174] 0 2174 16651 0 0 -17 -1000 sshd
[ 2250] 0 2250 20318 23 4 0 0 master
[ 2256] 89 2256 20338 16 0 0 0 pickup
[ 2257] 89 2257 20355 1 0 0 0 qmgr
[ 2274] 0 2274 27580 1 0 0 0 abrtd
[ 2288] 0 2288 27052 39 2 0 0 ksmtuned
[ 2297] 0 2297 29325 5 0 0 0 crond
[ 2308] 0 2308 5385 0 0 0 0 atd
[ 2321] 0 2321 26005 1 0 0 0 rhsmcertd
[ 2334] 0 2334 15582 13 1 0 0 certmonger
[ 2356] 0 2356 1016 1 2 0 0 mingetty
[ 2358] 0 2358 1016 1 0 0 0 mingetty
[ 2360] 0 2360 1016 1 7 0 0 mingetty
[ 2362] 0 2362 1016 1 2 0 0 mingetty
[ 2364] 0 2364 1016 1 6 0 0 mingetty
[ 2366] 0 2366 1016 1 2 0 0 mingetty
[ 2372] 0 2372 3120 1 0 -17 -1000 udevd
[ 2373] 0 2373 3120 1 5 -17 -1000 udevd
[ 2391] 0 2391 6910 1 0 -17 -1000 auditd
[ 2416] 0 2416 25087 1 1 0 0 sshd
[ 2420] 0 2420 27085 1 4 0 0 bash
[ 2441] 0 2441 25663 1 1 0 0 sshd
[ 2445] 0 2445 27085 1 4 0 0 bash
[ 2463] 0 2463 25087 1 1 0 0 sshd
[ 2467] 0 2467 27085 1 5 0 0 bash
[ 2485] 0 2485 25089 1 1 0 0 sshd
[ 2489] 0 2489 27085 1 0 0 0 bash
[ 2524] 0 2524 408488758 1132474 4 0 0 qemu-kvm
[ 2538] 0 2538 92383 17 1 0 0 remote-viewer
[ 2561] 0 2561 1887 1 7 0 0 nc
[ 2575] 0 2575 25089 1 5 0 0 sshd
[ 2579] 0 2579 27085 1 4 0 0 bash
[ 2597] 0 2597 25224 1 6 0 0 tailf
[ 2636] 0 2636 25227 18 7 0 0 sleep
Out of memory: Kill process 2524 (qemu-kvm) score 951 or sacrifice child
Killed process 2524, UID 0, (qemu-kvm) total-vm:1633955032kB, anon-rss:4529760kB, file-rss:136kB
Kill process 2533 (vhost-2524) sharing same memory
switch: port 2(tap0) entering disabled state
device tap0 left promiscuous mode
switch: port 2(tap0) entering disabled state
```

```
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Stepping: 7
CPU MHz: 1600.000
BogoMIPS: 6784.57
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
```

I tried a qcow2 data disk with the same test as in comment #0 and did not hit this issue.
After the live snapshot completes successfully, running `fdisk -l` in the guest lists the disk devices correctly.

```
# qemu-img create -f qcow2 my-data-disk.qcow2 5G
Formatting 'my-data-disk.qcow2', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536
```

e.g.:

```
...-drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,bus=pci.0,addr=0x7,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk
```

```
(qemu) snapshot_blkdev drive-data-disk /home/snapshot
Formatting '/home/snapshot', fmt=qcow2 size=5368709120 backing_file='/home/my-data-disk.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
```

```
guest]# fdisk -l

Disk /dev/vda: 10.7 GB, 10737418240 bytes
16 heads, 63 sectors/track, 20805 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000cd5bf
...
```

Best Regards,
sluo

QED's internal data structure holds BlockDriverState (BDS) pointers that become incorrect once bdrv_append() is performed, as happens during a live snapshot. Upstream commit e023b2e2 "block: fix snapshot on QED" fixes this.
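To make this failure mode concrete, here is a minimal C sketch of how a driver-private back-pointer goes stale when an append-style snapshot swaps the contents of two BDS structures. It is not QEMU code: the struct layouts and the names `append_overlay()`, `rebind()` and `qed_back_pointer_valid_after_snapshot()` are hypothetical, and the "swap contents, then link the old image as backing file" behaviour is an assumption about how bdrv_append() worked in this code base. The `rebind()` hook shows one way to refresh the pointer after the swap; it is not a copy of commit e023b2e2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for QEMU's BlockDriverState
 * (BDS) and QED's private driver state. Field names are illustrative
 * only and do not match the real QEMU definitions. */
typedef struct BlockDriverState BlockDriverState;

typedef struct {
    BlockDriverState *bs;          /* back-pointer to the owning BDS */
} QedState;

struct BlockDriverState {
    const char *format;            /* e.g. "qed" or "qcow2" */
    QedState *opaque;              /* driver-private state, if any */
    BlockDriverState *backing;     /* backing image, if any */
};

/* Hypothetical rebind hook: after a BDS's contents have moved to a new
 * address, point the driver's back-pointer at that new address. */
static void rebind(BlockDriverState *bs)
{
    if (bs->opaque != NULL) {
        bs->opaque->bs = bs;
    }
}

/* Hypothetical append-style snapshot: swap the contents of 'top' (the BDS
 * the guest device is attached to) and 'overlay', so the device keeps the
 * same BDS address but now sees the new overlay, whose backing image is
 * the old top image, now stored at 'overlay'. */
static void append_overlay(BlockDriverState *top, BlockDriverState *overlay,
                           int do_rebind)
{
    BlockDriverState tmp;

    assert(top != NULL && overlay != NULL && top != overlay);
    tmp = *top;
    *top = *overlay;
    *overlay = tmp;
    top->backing = overlay;

    if (do_rebind) {
        rebind(top);
        rebind(overlay);
    }
}

/* Builds a QED top image plus a qcow2 overlay, takes the snapshot, and
 * returns 1 if QED's back-pointer refers to the BDS that actually holds
 * the QED state afterwards, or 0 if the pointer has gone stale. */
static int qed_back_pointer_valid_after_snapshot(int do_rebind)
{
    QedState qed;
    BlockDriverState top = { "qed", &qed, NULL };
    BlockDriverState overlay = { "qcow2", NULL, NULL };

    qed.bs = &top;                 /* correct before the snapshot */
    append_overlay(&top, &overlay, do_rebind);

    /* After the swap, the QED state belongs to the BDS at 'overlay'. */
    assert(overlay.opaque == &qed);
    return qed.bs == &overlay;
}
```

With these definitions, `qed_back_pointer_valid_after_snapshot(0)` returns 0, because QED's pointer still refers to the address that now holds the qcow2 overlay, while `qed_back_pointer_valid_after_snapshot(1)` returns 1.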
Fix included in qemu-kvm-0.12.1.2-2.425.el6.

Reproduce:

Version of the components: qemu-kvm-rhev-0.12.1.2-2.406.el6.x86_64, guest kernel 3.10.0-123.el7.x86_64, host kernel 2.6.32-483.el6.x86_64.

Steps are the same as in comment #0, with the following command line:

```
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 8G \
    -smp 2,sockets=2,cores=1,threads=1 -name juli \
    -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa61 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -drive file=/home/RHEL-Server-7.0-z-64.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0,werror=stop,rerror=stop,cache=none,aio=native \
    -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
    -device virtio-balloon-pci,id=ballooning,addr=0x6 \
    -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
    -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
    -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=24:be:05:0c:12:11,addr=0x7,bootindex=2 \
    -k en-us -boot menu=on \
    -qmp tcp:0:8888,server,nowait \
    -serial unix:/tmp/ttyS0,server,nowait \
    -vga qxl -vnc :1 -spice port=5931,disable-ticketing \
    -monitor stdio \
    -drive file=/home/juli/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop \
    -device virtio-scsi-pci,bus=pci.0,id=scsi0 \
    -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk
```

After step 5, `fdisk -l` hangs without printing anything, and both the guest and the HMP monitor hang. After about 5 minutes QEMU is killed, but the host keeps working. Based on the above test, this bug has been reproduced.

=======================

Verify:

Version of the components: qemu-img-rhev-0.12.1.2-2.428.el6.x86_64, guest kernel 3.10.0-123.el7.x86_64, host kernel 2.6.32-483.el6.x86_64.

Steps are the same as in comment #0, with the following command line:

```
# /usr/libexec/qemu-kvm -M pc -cpu SandyBridge -enable-kvm -m 8G \
    -smp 2,sockets=2,cores=1,threads=1 -name juli \
    -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa61 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -drive file=/home/RHEL-Server-7.0-z-64.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0,werror=stop,rerror=stop,cache=none,aio=native \
    -device virtio-blk-pci,bus=pci.0,drive=drive-scsi0-0-0,id=scsi0-0-0 \
    -device virtio-balloon-pci,id=ballooning,addr=0x6 \
    -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
    -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
    -device e1000,netdev=hostnet0,id=virtio-net-pci0,mac=24:be:05:0c:12:11,addr=0x7,bootindex=2 \
    -k en-us -boot menu=on \
    -qmp tcp:0:8888,server,nowait \
    -serial unix:/tmp/ttyS0,server,nowait \
    -vga qxl -vnc :1 -spice port=5931,disable-ticketing \
    -monitor stdio \
    -drive file=/home/juli/my-data-disk.qed,if=none,id=drive-data-disk,format=qed,cache=none,werror=stop,rerror=stop \
    -device virtio-scsi-pci,bus=pci.0,id=scsi0 \
    -device scsi-hd,bus=scsi0.0,drive=drive-data-disk,id=data-disk
```

After step 5, the guest, qemu-kvm and the host all work well. After waiting about 10 minutes, checking the guest and host dmesg showed no errors. Running "info status" via HMP gives:

```
(qemu) info status
VM status: running
```

Based on the above testing, this bug has been verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html