Bug 750682 - qemu crashes (spice) on src host after migrating the guest to dst host (guest running iozone)
Summary: qemu crashes (spice) on src host after migrating the guest to dst host (guest running iozone)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Alon Levy
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 798682
 
Reported: 2011-11-02 02:47 UTC by Qunfang Zhang
Modified: 2013-01-10 00:29 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-20 15:42:17 UTC
Target Upstream Version:


Attachments
script that fails to reproduce (local migration ping pong, nfs mounted image) (3.20 KB, text/plain)
2012-03-05 12:42 UTC, Alon Levy

Description Qunfang Zhang 2011-11-02 02:47:56 UTC
Description of problem:
When migrating a guest that is running iozone from host A to host B, the migration succeeds but qemu crashes on the source host. Instead, it should be in paused status after the migration finishes.

Version-Release number of selected component (if applicable):
install tree: RHEL6.2-20111026.2
kernel-2.6.32-214.el6.x86_64
qemu-kvm-0.12.1.2-2.207.el6.x86_64

How reproducible:
1/10

Steps to Reproduce:
1. Boot a guest on source host A:
/usr/libexec/qemu-kvm -M rhel6.2.0 -cpu cpu64-rhel6,+x2apic  -enable-kvm -m 2048 -smp 2 -name RHEL6 -uuid 821af33f-9b98-4580-bd96-1f82f96280a4 -monitor stdio -rtc base=localtime -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/media/rhel6u2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=off,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:10:20:3a,bus=pci.0,addr=0x6 -chardev socket,id=charchannel0,path=/tmp/foo,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet -spice port=5931,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864

2. Run iozone inside the guest: # iozone -a

3. Boot the guest on host B in listening mode with "-incoming tcp:0:5800"

4. Ping-pong migrate the guest between host A and host B (one round is sketched below).
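
For reference, one round of the ping-pong amounts to the following (a sketch; the full qemu-kvm command line from step 1 is abbreviated to "...", and $host_B_ip is a placeholder):

# On host B (destination): same command line as step 1, plus listening mode
/usr/libexec/qemu-kvm ... -incoming tcp:0:5800

# On host A (source), in the qemu monitor:
(qemu) migrate -d tcp:$host_B_ip:5800
(qemu) info migrate        (repeat until "Migration status: completed")

# Then swap roles: quit the now-paused qemu on host A, restart it with
# -incoming tcp:0:5800, and migrate back from host B.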
  
Actual results:
QEMU crashes on the src host after the migration finishes, though the guest works well on the dst host.


Expected results:
QEMU should not crash; it should remain in paused status.

Additional info:

gdb logs on src host after crash:

Program received signal SIGPIPE, Broken pipe.
0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
(gdb) 
(gdb) bt
#0  0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
#1  0x00000031b6a16811 in outgoing_write (m=<value optimized out>, state=0x1935ed0)
    at reds.c:885
#2  marshaller_outgoing_write (m=<value optimized out>, state=0x1935ed0) at reds.c:2285
#3  0x00000031b6a1b7c8 in inputs_handle_input (opaque=0x1935ed0, size=<value optimized out>, 
    type=<value optimized out>, message=0xfe29f0) at reds.c:2345
#4  0x00000031b6a15eaf in handle_incoming (stream=0xfd8750, handler=0x1935ee0) at reds.c:825
#5  0x00000031b6a189ed in inputs_event (fd=<value optimized out>, event=1, data=0x1935ed0)
    at reds.c:2478
#6  0x000000000040c3ef in main_loop_wait (timeout=1000)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4024
#7  0x000000000042aeaa in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2225
#8  0x000000000040de35 in main_loop (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#9  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6470
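
For background: SIGPIPE is raised when write() targets a socket or pipe whose reading end has been closed (here presumably the spice client connection going away around the migration switch-over), and the default disposition terminates the process, which is what turns a routine client disconnect into a crash. A minimal standalone C illustration of that behavior (not qemu/spice code):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return 1;

    /* Without this line, the write() below raises SIGPIPE and the
       process is terminated, the same failure mode as above. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                  /* simulate the peer closing */

    if (write(fds[1], "x", 1) < 0)  /* now fails with EPIPE instead */
        printf("write: %s\n", strerror(errno));
    return 0;
}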

Comment 1 Qunfang Zhang 2011-11-02 02:59:45 UTC
Hit the crash again, but the log is not the same as in the bug description.
This time I did the migration with the unix protocol on the same host:

(1) Boot a VM with the command line in the bug description.
(2) Boot the VM on the same host in listening mode:

 <commandLine> -incoming unix:/tmp/qzhang

(3) Start migration:
(qemu) migrate -d unix:/tmp/qzhang

Result: qemu crashes on the src side.

(qemu) info migrate 
Migration status: active
transferred ram: 860724 kbytes
remaining ram: 322112 kbytes
total ram: 1180040 kbytes
(qemu) 
(qemu) 
(qemu) info migrate handle_dev_input: stop
spice_server_migrate_end: completed=1
spice_server_migrate_switch: 
reds_mig_switch: 

Program received signal SIGPIPE, Broken pipe.
0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
(gdb) 
(gdb) 
(gdb) bt
#0  0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
#1  0x00000031b6a169a3 in sync_write (stream=0x10ebd40, in_buf=<value optimized out>, n=178)
    at reds.c:1930
#2  0x00000031b6a17388 in reds_send_link_ack (opaque=0x17f60b0) at reds.c:2048
#3  reds_handle_read_link_done (opaque=0x17f60b0) at reds.c:3557
#4  0x000000000040c3ef in main_loop_wait (timeout=1000)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4024
#5  0x000000000042aeaa in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2225
#6  0x000000000040de35 in main_loop (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#7  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6470


I'm not sure whether it is the same issue as the bug description; just updating the result here first.

Comment 2 Alon Levy 2011-11-02 16:54:06 UTC
(In reply to comment #1)
> Hit the crash again, but the log is not the same as in the bug description.
[snip]

You say crash, but I just see a SIGPIPE, which seems normal. Can you:
 1. provide the full bt (all threads) at the time of the crash
 2. elaborate on reproducing the crash: specifically, how do you run the ping-pong migration? Is it via autotest? Is there some howto I can quickly use? (I could do it myself, but I'm lazy. Also, it's better for me to learn how you guys are doing it.)

Alon
p.s. who is Dag Wieers?

Comment 3 Qunfang Zhang 2011-11-03 07:17:32 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Hit the crash again, but the log is not the same as in the bug description.
> [snip]
> 
> You say crash, but I just see a SIGPIPE, which seems normal. Can you:
>  1. provide the full bt (all threads) at the time of the crash
ok, but this issue is not always reproducible. I will keep trying, and once I get the log again I will provide the full bt.

>  2. elaborate on reproducing the crash: specifically, how do you run the
> ping-pong migration? Is it via autotest? Is there some howto I can quickly
> use? (I could do it myself, but I'm lazy. Also, it's better for me to learn
> how you guys are doing it.)
I do the migration manually, migrating the guest between host A and host B.
(1) Set up an NFS server on host A, and put the guest image on the NFS export.
Host A:
# cat /etc/exports 
/home *(rw,no_root_squash)
# service nfs start

(2) Host A: #mount $nfs_server_ip:/home /media
    Host B: #mount $nfs_server_ip:/home /media

(3) Boot the guest on host A with the command line in the bug description.

(4) Boot the guest on host B in listening mode: "-incoming tcp:0:5800"

(5) Run the iozone test inside the guest:
#iozone -a

(6) Migrate the guest from host A to host B:
Host A:
(qemu) migrate -d tcp:$host_B_ip:5800

(7) If the migration finishes but does not hit the issue, migrate the guest back from host B to host A:
At this moment qemu-kvm is in paused status on host A because it has just finished migrating, so quit qemu-kvm on host A and boot it up again in listening mode with "-incoming tcp:0:5800"

(8) On host B:
(qemu) migrate -d tcp:$host_A_IP:5800

(9) Repeat steps 6-8 (or 1-8) if you cannot hit the issue; a scripted sketch of this loop follows at the end of this comment.
> 
> Alon
> p.s. who is Dag Wieers?
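
For reference, a scripted version of steps 6-8 could look like the untested sketch below. It assumes both qemu-kvm processes were started with a unix-socket monitor (e.g. "-monitor unix:/tmp/mon-a,server,nowait" instead of "-monitor stdio") and a netcat with unix-socket support; a robust version would need more care draining the monitor output.

#!/bin/sh
# Hypothetical ping-pong driver; $host_B_ip, the socket path, and the
# restart of the source qemu are placeholders.
MON_A=/tmp/mon-a
DST=tcp:$host_B_ip:5800

for i in $(seq 1 20); do
    # Step 6: start the migration from the source monitor.
    echo "migrate -d $DST" | nc -w 1 -U "$MON_A"
    # Poll until the migration completes (or qemu crashes).
    until echo "info migrate" | nc -w 1 -U "$MON_A" | grep -q completed; do
        sleep 2
    done
    # Steps 7-8: quit the paused source, restart it with -incoming,
    # and migrate back in the opposite direction (omitted here).
done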

Comment 4 Alon Levy 2012-03-01 13:14:58 UTC
There is no spice client during this procedure, correct?

Comment 5 Alon Levy 2012-03-01 15:52:20 UTC
Hi,

 I couldn't reproduce this on a local machine with almost the same command line, the only difference being that I set up the tap devices myself and then used script=no,downscript=no,ifname=tap0.

 I tried a little more than 10 migrations with no failure.

 Not using NFS seems to be the main difference; I'll try to add that on Sunday.

 Second difference is that I had a spice client connected for most of the time.

 I think you could give me a little more information if you did the following:
 when attaching with gdb prior to the fault, please give it the commands:
  handle SIGPIPE noprint nostop pass
  c
 
 Then it should not stop at a SIGPIPE, and hopefully when it crashes you will get the correct stack trace with "thread apply all bt".
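
 In full, the session would look roughly like this (the pidof lookup is just illustrative; pick the right qemu-kvm pid if several are running):

  gdb -p $(pidof qemu-kvm)
  (gdb) handle SIGPIPE noprint nostop pass
  (gdb) c
  ... reproduce the crash ...
  (gdb) thread apply all bt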

Thanks,
Alon

Comment 6 Qunfang Zhang 2012-03-02 11:48:27 UTC
Hi Alon
I used the spice client to connect to the guest during the testing. This bug is very hard to reproduce, and I have only hit it twice. Sorry, I am focusing on other stuff these days; I will re-test it next Monday and will follow your advice to gather more information.

Comment 7 Alon Levy 2012-03-05 12:42:06 UTC
Created attachment 567600 [details]
script that fails to reproduce (local migration ping pong, nfs mounted image)

Comment 8 Alon Levy 2012-03-05 12:42:37 UTC
Could not reproduce with NFS locally.

For reference, I was using a local script attached together with spice_migrate.py from spice-tests at:
 https://git.gitorious.org/spice-space/spice-tests.git
 (https://gitorious.org/spice-space)
 (cgit.freedesktop.org is having some problems; it contains an older copy without the recent patches developed to reproduce this bz)

to launch VMs with the same parameters as you gave above, minus the taps, and with both guests on the same computer.

Having them on the same computer probably changed the migration timing, which could be a reason for the lack of reproduction, but Qunfang Zhang mentions it is hard to reproduce anyway.

Alon

Comment 9 David Blechter 2012-03-06 12:53:02 UTC
Moving to 6.4 based on comments 6 & 8. Devel will work with QE in order to reproduce it.

Comment 11 RHEL Program Management 2012-07-10 07:00:52 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 12 RHEL Program Management 2012-07-11 02:01:24 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 13 Alon Levy 2012-08-13 13:00:50 UTC
(In reply to comment #6)
> Hi Alon
> I used the spice client to connect to the guest during the testing. This bug
> is very hard to reproduce, and I have only hit it twice. Sorry, I am focusing
> on other stuff these days; I will re-test it next Monday and will follow your
> advice to gather more information.

Hi Qunfang,

 Is it possible for you to rerun the tests and silence the SIGPIPE errors, so we can have a stack trace of the actual crash? I cannot reproduce it locally, so this is the best way to continue imo.

Thanks,
Alon

Comment 14 Qunfang Zhang 2012-08-14 04:59:22 UTC
Hi, Alon
Ok, I will dive into it this week and try comment 5.

Comment 15 David Blechter 2012-08-14 12:19:27 UTC
devel_ack+ and will wait for more info from Qunfang.

Comment 17 Ademar Reis 2012-08-20 15:42:17 UTC
Since Qunfang can't reproduce it anymore and there's an autotest, I'm closing this bug.

