Description of problem:
Snapshot chain: SN B -> SN A -> base image
QEMU aborted after booting from snapshot B.

Version-Release number of selected component (if applicable):
qemu-system-ppc-2.0.0-2.1.pkvm2_1_1.20.40.ppc64

How reproducible:
1/10

Steps to Reproduce:
1. Launch a VM, copy a file from host to guest, and then shut down the guest.

# /bin/qemu-kvm ... \
    -device spapr-vscsi,id=spapr_vscsi0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1

(host)# dd if=/dev/urandom of=/tmp/test.img bs=4k count=250
(host)# scp -v -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PreferredAuthentications=password -r -P 22 /tmp/test.img root@$GUEST_IP_ADDR:/tmp/test.img

2. Create snapshot A for the base image, and repeat step 1.

# /bin/qemu-img create -f qcow2 -b /home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 -F qcow2 /home/xuhan/autotest/client/tests/virt/shared/data/images/snA.qcow2 20G

3. Create snapshot B for snapshot A, and repeat step 1.

# /bin/qemu-img create -f qcow2 -b /home/xuhan/autotest/client/tests/virt/shared/data/images/snA.qcow2 -F qcow2 /home/xuhan/autotest/client/tests/virt/shared/data/images/snB.qcow2 20G

Actual results:
After booting from snapshot B, QEMU aborted:

*** Error in `/usr/bin/qemu-system-ppc64': free(): invalid pointer: 0x00003fffa00afdd0 ***
...
/tmp/aexpect/Bamw4MGY/aexpect-2lz1LQ.sh: line 1: 63106 Aborted (core dumped) ...

Expected results:
No crash.
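For reference, the steps above can be condensed into one dry-run sketch that only prints the commands it would run; the image directory and file names are those from the report, the guest boot/copy/shutdown cycle is abbreviated, and the `qemu-kvm` line keeps the report's elided `...` options:

```shell
#!/bin/sh
# Dry-run sketch of the snapshot-chain reproducer (steps 1-3 above).
# Commands are only printed, not executed; IMGDIR matches the report.
IMGDIR=/home/xuhan/autotest/client/tests/virt/shared/data/images

gen_chain() {
    backing=$IMGDIR/rhel71-ppc64-spapr_vscsi.qcow2
    for sn in snA snB; do
        # Create the next qcow2 overlay (chain ends up snB -> snA -> base).
        echo "qemu-img create -f qcow2 -b $backing -F qcow2 $IMGDIR/$sn.qcow2 20G"
        # Repeat step 1, booting from the new overlay instead of the base image.
        echo "qemu-kvm ... -drive id=drive_image1,if=none,cache=none,aio=native,file=$IMGDIR/$sn.qcow2 -device scsi-hd,drive=drive_image1"
        backing=$IMGDIR/$sn.qcow2
    done
}

gen_chain
```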
Additional info:

QEMU command line:
/bin/qemu-kvm \
    -S \
    -name 'virt-tests-vm1' \
    -sandbox off \
    -machine pseries,accel=kvm \
    -nodefaults \
    -device VGA,id=video0 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20150117-152008-8Js3ZmgJ,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -serial unix:'/tmp/serial-serial0-20150117-152008-8Js3ZmgJ',server,nowait \
    -device pci-ohci,id=usb1,bus=pci.0,addr=03 \
    -device spapr-vscsi,id=spapr_vscsi0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/home/xuhan/autotest/client/tests/virt/shared/data/images/rhel71-ppc64-spapr_vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device spapr-vlan,mac=9a:ca:cb:cc:cd:ce,id=idHkSasf,netdev=ideWRoyw \
    -netdev tap,id=ideWRoyw,fd=22 \
    -m 8192 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'POWER8' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device usb-kbd,id=input0 \
    -device usb-mouse,id=input1 \
    -vnc :1 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off

Backtrace:
#0  0x00003fffb146d7d0 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00003fffb146f7c4 in __GI_abort () at abort.c:90
#2  0x00003fffb14b3634 in __libc_message (do_abort=<optimized out>, fmt=0x3fffb15a93c0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00003fffb14beab4 in malloc_printerr (ptr=<optimized out>, str=0x3fffb15a9448 "free(): invalid pointer", action=3) at malloc.c:4937
#4  _int_free (av=0x3fffb15f04e0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at malloc.c:3789
#5  0x000000003bd0b0f4 in free_and_trace (mem=<optimized out>) at vl.c:2878
#6  0x00003fffb2781dc8 in g_free (mem=<optimized out>) at gmem.c:252
#7  0x00003fffb27a1394 in g_slice_free1 (mem_size=0, mem_block=0x3fffa00afdd0) at gslice.c:1111
#8  0x000000003bb2c91c in qemu_aio_release (p=<optimized out>) at block.c:4676
#9  0x000000003bb3c44c in qemu_laio_process_completion (s=0x1000162df90, laiocb=0x3fffa00afdd0) at block/linux-aio.c:74
#10 qemu_laio_completion_cb (e=0x1000162df98) at block/linux-aio.c:96
#11 0x000000003bb1085c in aio_dispatch (ctx=ctx@entry=0x10001498230) at aio-posix.c:144
#12 0x000000003bb10c88 in aio_poll (ctx=0x10001498230, blocking=false) at aio-posix.c:193
#13 0x000000003bb110b8 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:205
#14 0x00003fffb2779b40 in g_main_dispatch (context=0x1000149ffc0) at gmain.c:3054
#15 g_main_context_dispatch (context=0x1000149ffc0) at gmain.c:3630
#16 0x000000003bc751bc in glib_pollfds_poll () at main-loop.c:190
#17 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:235
#18 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:484
#19 0x000000003bb06120 in main_loop () at vl.c:2070
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4535
Created attachment 982169 [details] qemu.log
Created attachment 982170 [details] full backtrace
------- Comment From fnovak.com 2015-01-21 13:13 EDT------- reverse mirror of RHBZ 1184325 - [PowerKVM] QEMU aborted after booting from snapshot
This is clearly a host bug in PowerKVM, so I don't think it's a blocker for the RHEL 7.1 LE release.
Xu Han, one clarification. In your steps to reproduce, where you say "repeat step 1", do you mean repeat step 1 exactly as originally executed, or repeat step 1 with the original disk replaced with the new snapshot?
(In reply to David Gibson from comment #6)
> Xu Han, one clarification.
>
> In your steps to reproduce, where you say "repeat step 1" do you mean repeat
> step 1 exactly as originaly executed, or repeat step 1 with the original
> disk replaced with the new snapshot?

Uh, sorry, I forgot to clarify: "repeat step 1" means booting the guest with the new snapshot (snA.qcow2 for step 2, snB.qcow2 for step 3), then copying a new file from host to guest using a new filename (snA, snB), and finally shutting down the guest.
Xu Han,

Thanks for the clarification. I'll get this mirrored to IBM, and investigate when I can.

If you have a chance to see whether this can be reproduced under a RHEL host, that would be useful.
(In reply to David Gibson from comment #8)
> If you have a chance to see if this can be reproduced under a RHEL host,
> that would be useful.

I have not been able to reproduce this bug after testing 50 times on a RHEL host.

Thanks,
Ok, probably an upstream bug in 2.0. IBM, I'll leave this one to you.
------- Comment From fnovak.com 2015-02-12 03:23 EDT-------
Making another comment visible from an internal look as well:

I tried this on Build 35 and was not able to reproduce the problem. BTW, I did not run the qemu command line directly; I went through virsh, but basically followed the same steps: boot the guest, copy a file, create a snapshot, use the snapshot to boot the guest, copy a file, then repeat with the next snapshot, etc.

------------------------------------

I sent a separate email out to a number of folks on the RHEV side that there is a PowerKVM update (coming up for release at the time of that note; just now released, 11 Feb). Build 35 above was the all-but-GA version, so hopefully the PowerKVM 2.1.1.1 version fixes this, as well as a slew of CVEs and other bugs. We still need to close on how this moves to the RHEV for Power stream. Once this makes it over, RH, please verify.
Has this been checked against the PowerKVM 2.1.1.1 update? Is this still a problem, or can this be closed out?
Xu Han, It looks like this was never a problem on the RHEL host, and comment 11 suggests it has now been fixed in IBM PowerKVM. Can we close this as CURRENTRELEASE?
Hi Frank,

Currently we cannot fetch the updates because the yum repo cannot be accessed. Could you please provide those packages for us? Otherwise, we suggest closing this bug.

Thanks,
Xu
Xu Han -- 2.1.1.1 was released to Red Hat; I'm actually expecting that RH has released the update.
------- Comment From fnovak.com 2015-05-28 01:07 EDT-------
(In reply to comment #18)
> Xu Han -- 2.1.1.1 was released to Red Hat..
> actually expecting that RH released update..

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1197197 and https://rhn.redhat.com/errata/RHEA-2015-1028.html, closing as fixed.
Frank, What's the specific qemu package version in which this is fixed in PowerKVM? I know it's PowerKVM 2.1.1.1 overall, but we've got a customer hitting this and the situation is a bit unclear. We can more easily and precisely check the exact qemu package version than the overall hypervisor version.
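As an illustration of the kind of precise check meant here, the version-release can be read off an installed package with `rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' <package>`, or extracted from an NVR string like the one in this report's header. The helper below is a hypothetical sketch, and the package name `qemu-system-ppc` is an assumption about how PowerKVM ships qemu:

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing arch suffix and the leading
# package name from an RPM NVR string, leaving the version-release for
# comparison against a known-fixed build.
nvr_to_vr() {
    vr=${1%.*}                                   # drop the .ppc64 arch suffix
    echo "$vr" | sed 's/^[a-z][a-z-]*-\([0-9]\)/\1/'
}

nvr_to_vr "qemu-system-ppc-2.0.0-2.1.pkvm2_1_1.20.40.ppc64"
# -> 2.0.0-2.1.pkvm2_1_1.20.40
```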
May I ask what's the current status here? Could we now close this issue as CURRENTRELEASE as it has been suggested earlier?
According to the earlier comments on this BZ, I think we can consider this bug fixed in the current version of RHEV, so I am closing it now as CURRENTRELEASE. If anybody disagrees, please feel free to re-open it.