Description of problem:
Snapshot chain: SN B -> SN A -> base image
QEMU aborted after booting from snapshot B.

Version-Release number of selected component (if applicable):
qemu-system-ppc-2.0.0-2.1.pkvm2_1_1.20.40.ppc64

How reproducible:
1/10

Steps to Reproduce:
1. Launch a VM, copy a file from host to guest, and then shut down the guest.

# /bin/qemu-kvm ... \
    -device spapr-vscsi,id=spapr_vscsi0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1

(host)# dd if=/dev/urandom of=/tmp/test.img bs=4k count=250
(host)# scp -v -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PreferredAuthentications=password -r -P 22 /tmp/test.img root@$GUEST_IP_ADDR:/tmp/test.img

2. Create snapshot A for the base image, and repeat step 1.

# /bin/qemu-img create -f qcow2 -b /home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 -F qcow2 /home/xuhan/autotest/client/tests/virt/shared/data/images/snA.qcow2 20G

3. Create snapshot B for snapshot A, and repeat step 1.

# /bin/qemu-img create -f qcow2 -b /home/xuhan/autotest/client/tests/virt/shared/data/images/snA.qcow2 -F qcow2 /home/xuhan/autotest/client/tests/virt/shared/data/images/snB.qcow2 20G

Actual results:
After booting from snapshot B, QEMU aborted:

*** Error in `/usr/bin/qemu-system-ppc64': free(): invalid pointer: 0x00003fffa00afdd0 ***
...
/tmp/aexpect/Bamw4MGY/aexpect-2lz1LQ.sh: line 1: 63106 Aborted (core dumped) ...

Expected results:
No crash.
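For reference, the steps above can be condensed into one dry-run sketch that only prints the commands it would run; the image directory and file names are those from the report, the guest boot/copy/shutdown cycle is abbreviated, and the `qemu-kvm` line keeps the report's elided `...` options:

```shell
#!/bin/sh
# Dry-run sketch of the snapshot-chain reproducer (steps 1-3 above).
# Commands are only printed, not executed; IMGDIR matches the report.
IMGDIR=/home/xuhan/autotest/client/tests/virt/shared/data/images

gen_chain() {
    backing=$IMGDIR/rhel71-ppc64-spapr_vscsi.qcow2
    for sn in snA snB; do
        # Create the next qcow2 overlay (chain ends up snB -> snA -> base).
        echo "qemu-img create -f qcow2 -b $backing -F qcow2 $IMGDIR/$sn.qcow2 20G"
        # Repeat step 1, booting from the new overlay instead of the base image.
        echo "qemu-kvm ... -drive id=drive_image1,if=none,cache=none,aio=native,file=$IMGDIR/$sn.qcow2 -device scsi-hd,drive=drive_image1"
        backing=$IMGDIR/$sn.qcow2
    done
}

gen_chain
```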
Additional info:

QEMU command line:
/bin/qemu-kvm \
    -S \
    -name 'virt-tests-vm1' \
    -sandbox off \
    -machine pseries,accel=kvm \
    -nodefaults \
    -device VGA,id=video0 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20150117-152008-8Js3ZmgJ,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -serial unix:'/tmp/serial-serial0-20150117-152008-8Js3ZmgJ',server,nowait \
    -device pci-ohci,id=usb1,bus=pci.0,addr=03 \
    -device spapr-vscsi,id=spapr_vscsi0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device spapr-vlan,mac=9a:ca:cb:cc:cd:ce,id=idHkSasf,netdev=ideWRoyw \
    -netdev tap,id=ideWRoyw,fd=22 \
    -m 8192 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'POWER8' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device usb-kbd,id=input0 \
    -device usb-mouse,id=input1 \
    -vnc :1 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off

Backtrace:
#0  0x00003fffb146d7d0 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00003fffb146f7c4 in __GI_abort () at abort.c:90
#2  0x00003fffb14b3634 in __libc_message (do_abort=<optimized out>, fmt=0x3fffb15a93c0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00003fffb14beab4 in malloc_printerr (ptr=<optimized out>, str=0x3fffb15a9448 "free(): invalid pointer", action=3) at malloc.c:4937
#4  _int_free (av=0x3fffb15f04e0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:3789
#5  0x000000003bd0b0f4 in free_and_trace (mem=<optimized out>) at vl.c:2878
#6  0x00003fffb2781dc8 in g_free (mem=<optimized out>) at gmem.c:252
#7  0x00003fffb27a1394 in g_slice_free1 (mem_size=0, mem_block=0x3fffa00afdd0) at gslice.c:1111
#8  0x000000003bb2c91c in qemu_aio_release (p=<optimized out>) at block.c:4676
#9  0x000000003bb3c44c in qemu_laio_process_completion (s=0x1000162df90, laiocb=0x3fffa00afdd0) at block/linux-aio.c:74
#10 qemu_laio_completion_cb (e=0x1000162df98) at block/linux-aio.c:96
#11 0x000000003bb1085c in aio_dispatch (ctx=ctx@entry=0x10001498230) at aio-posix.c:144
#12 0x000000003bb10c88 in aio_poll (ctx=0x10001498230, blocking=false) at aio-posix.c:193
#13 0x000000003bb110b8 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:205
#14 0x00003fffb2779b40 in g_main_dispatch (context=0x1000149ffc0) at gmain.c:3054
#15 g_main_context_dispatch (context=0x1000149ffc0) at gmain.c:3630
#16 0x000000003bc751bc in glib_pollfds_poll () at main-loop.c:190
#17 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:235
#18 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:484
#19 0x000000003bb06120 in main_loop () at vl.c:2070
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4535
Created attachment 982169 [details] qemu.log
Created attachment 982170 [details] full backtrace
------- Comment From fnovak.com 2015-01-21 13:13 EDT------- reverse mirror of RHBZ 1184325 - [PowerKVM] QEMU aborted after booting from snapshot
This is clearly a host bug in PowerKVM, so I don't think it's a blocker for the RHEL 7.1 LE release.
Xu Han, one clarification. In your steps to reproduce, where you say "repeat step 1", do you mean repeat step 1 exactly as originally executed, or repeat step 1 with the original disk replaced with the new snapshot?
(In reply to David Gibson from comment #6)
> Xu Han, one clarification.
>
> In your steps to reproduce, where you say "repeat step 1" do you mean repeat
> step 1 exactly as originaly executed, or repeat step 1 with the original
> disk replaced with the new snapshot?

Uh, sorry, I forgot to clarify: "repeat step 1" means booting the guest with the new snapshot (snA.qcow2 for step 2, snB.qcow2 for step 3), then copying a new file from host to guest using a new filename (snA, snB), and finally shutting down the guest.
Xu Han,

Thanks for the clarification. I'll get this mirrored to IBM, and investigate when I can.

If you have a chance to see whether this can be reproduced under a RHEL host, that would be useful.
(In reply to David Gibson from comment #8)
> If you have a chance to see if this can be reproduced under a RHEL host,
> that would be useful.

I have not been able to reproduce this bug after testing 50 times on a RHEL host.

Thanks,
Ok, probably an upstream bug in 2.0. IBM, I'll leave this one to you.
------- Comment From fnovak.com 2015-02-12 03:23 EDT-------
Making another comment visible from an internal look as well:

I tried this on Build 35 and was not able to reproduce the problem. BTW, I did not run the qemu command line directly; I went through virsh, but basically followed the same steps: boot the guest, copy a file, create a snapshot, use the snapshot to boot the guest, copy a file, then repeat with the next snapshot, etc.

------------------------------------

I sent a separate email out to a number of folks on the RHEV side that there is a PowerKVM update (coming up for release at the time of that note; just now released, 11 Feb). Build 35 above was the all-but-GA version, so hopefully the PowerKVM 2.1.1.1 version fixes this, as well as a slew of CVEs and other bugs. We still need to close on how this moves to the RHEV for Power stream. Once this makes it over, RH, please verify.
Has this been checked against the PowerKVM 2.1.1.1 update? Is this still a problem, or can this be closed out?
Xu Han, It looks like this was never a problem on the RHEL host, and comment 11 suggests it has now been fixed in IBM PowerKVM. Can we close this as CURRENTRELEASE?
Hi Frank,

Currently we cannot fetch the updates because the yum repo cannot be accessed. Could you please provide those packages for us? Otherwise, we suggest closing this bug.

Thanks,
Xu
Xu Han -- 2.1.1.1 was released to Red Hat; I'm actually expecting that RH has released the update.
------- Comment From fnovak.com 2015-05-28 01:07 EDT-------
(In reply to comment #18)
> Xu Han -- 2.1.1.1 was released to Red Hat..
> actually expecting that RH released update..

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1197197 and https://rhn.redhat.com/errata/RHEA-2015-1028.html, closing as fixed.
Frank, What's the specific qemu package version in which this is fixed in PowerKVM? I know it's PowerKVM 2.1.1.1 overall, but we've got a customer hitting this and the situation is a bit unclear. We can more easily and precisely check the exact qemu package version than the overall hypervisor version.
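As an illustration of the kind of precise check meant here, the version-release can be read off an installed package with `rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' <package>`, or extracted from an NVR string like the one in this report's header. The helper below is a hypothetical sketch, and the package name `qemu-system-ppc` is an assumption about how PowerKVM ships qemu:

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing arch suffix and the leading
# package name from an RPM NVR string, leaving the version-release for
# comparison against a known-fixed build.
nvr_to_vr() {
    vr=${1%.*}                                   # drop the .ppc64 arch suffix
    echo "$vr" | sed 's/^[a-z][a-z-]*-\([0-9]\)/\1/'
}

nvr_to_vr "qemu-system-ppc-2.0.0-2.1.pkvm2_1_1.20.40.ppc64"
# -> 2.0.0-2.1.pkvm2_1_1.20.40
```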
May I ask what's the current status here? Could we now close this issue as CURRENTRELEASE as it has been suggested earlier?
According to the earlier comments on this BZ, I think we can consider this bug fixed in the current version of RHEV, so I am closing it now as CURRENTRELEASE. If anybody disagrees, please feel free to re-open it.