Bug 750682 - qemu crashes (spice) on src host after migrating the guest to dst host (guest running iozone)
Summary: qemu crashes (spice) on src host after migrating the guest to dst host (guest running iozone)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Alon Levy
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 798682
 
Reported: 2011-11-02 02:47 UTC by Qunfang Zhang
Modified: 2013-01-10 00:29 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-20 15:42:17 UTC
Target Upstream Version:


Attachments
script that fails to reproduce (local migration ping pong, nfs mounted image) (3.20 KB, text/plain)
2012-03-05 12:42 UTC, Alon Levy

Description Qunfang Zhang 2011-11-02 02:47:56 UTC
Description of problem:
When migrating a guest that is running iozone from host A to host B, the migration succeeds but qemu crashes on the source host. Instead, it should be in paused status after the migration finishes.

Version-Release number of selected component (if applicable):
install tree: RHEL6.2-20111026.2
kernel-2.6.32-214.el6.x86_64
qemu-kvm-0.12.1.2-2.207.el6.x86_64

How reproducible:
1/10

Steps to Reproduce:
1. Boot a guest on source host A:
/usr/libexec/qemu-kvm -M rhel6.2.0 -cpu cpu64-rhel6,+x2apic  -enable-kvm -m 2048 -smp 2 -name RHEL6 -uuid 821af33f-9b98-4580-bd96-1f82f96280a4 -monitor stdio -rtc base=localtime -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/media/rhel6u2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=off,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:10:20:3a,bus=pci.0,addr=0x6 -chardev socket,id=charchannel0,path=/tmp/foo,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet -spice port=5931,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864

2. Run iozone inside the guest: # iozone -a

3. Boot the guest on host B in listening mode with "-incoming tcp:0:5800"

4. Ping-pong migrate the guest between host A and host B (one round is sketched below).
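
For reference, one round of the ping-pong amounts to the following (a sketch; the full qemu-kvm command line from step 1 is abbreviated to "...", and $host_B_ip is a placeholder):

# On host B (destination): same command line as step 1, plus listening mode
/usr/libexec/qemu-kvm ... -incoming tcp:0:5800

# On host A (source), in the qemu monitor:
(qemu) migrate -d tcp:$host_B_ip:5800
(qemu) info migrate        (repeat until "Migration status: completed")

# Then swap roles: quit the now-paused qemu on host A, restart it with
# -incoming tcp:0:5800, and migrate back from host B.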
  
Actual results:
QEMU crashes on the src host after the migration finishes, though the guest works well on the dst host.


Expected results:
QEMU should not crash; it should remain in paused status.

Additional info:

gdb logs on src host after crash:

Program received signal SIGPIPE, Broken pipe.
0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
(gdb) 
(gdb) bt
#0  0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
#1  0x00000031b6a16811 in outgoing_write (m=<value optimized out>, state=0x1935ed0)
    at reds.c:885
#2  marshaller_outgoing_write (m=<value optimized out>, state=0x1935ed0) at reds.c:2285
#3  0x00000031b6a1b7c8 in inputs_handle_input (opaque=0x1935ed0, size=<value optimized out>, 
    type=<value optimized out>, message=0xfe29f0) at reds.c:2345
#4  0x00000031b6a15eaf in handle_incoming (stream=0xfd8750, handler=0x1935ee0) at reds.c:825
#5  0x00000031b6a189ed in inputs_event (fd=<value optimized out>, event=1, data=0x1935ed0)
    at reds.c:2478
#6  0x000000000040c3ef in main_loop_wait (timeout=1000)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4024
#7  0x000000000042aeaa in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2225
#8  0x000000000040de35 in main_loop (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#9  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6470
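
For background: SIGPIPE is raised when write() targets a socket or pipe whose reading end has been closed (here presumably the spice client connection going away around the migration switch-over), and the default disposition terminates the process, which is what turns a routine client disconnect into a crash. A minimal standalone C illustration of that behavior (not qemu/spice code):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return 1;

    /* Without this line, the write() below raises SIGPIPE and the
       process is terminated, the same failure mode as above. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                  /* simulate the peer closing */

    if (write(fds[1], "x", 1) < 0)  /* now fails with EPIPE instead */
        printf("write: %s\n", strerror(errno));
    return 0;
}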

Comment 1 Qunfang Zhang 2011-11-02 02:59:45 UTC
Hit the crash again, but the log is not the same as in the bug description.
This time I did the migration with the unix protocol on the same host:

(1) Boot a VM with the command line in the bug description.
(2) Boot the VM on the same host in listening mode:

 <commandLine> -incoming unix:/tmp/qzhang

(3) Start migration:
(qemu) migrate -d unix:/tmp/qzhang

Result: qemu crashes on the src side.

(qemu) info migrate 
Migration status: active
transferred ram: 860724 kbytes
remaining ram: 322112 kbytes
total ram: 1180040 kbytes
(qemu) 
(qemu) 
(qemu) info migrate handle_dev_input: stop
spice_server_migrate_end: completed=1
spice_server_migrate_switch: 
reds_mig_switch: 

Program received signal SIGPIPE, Broken pipe.
0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
(gdb) 
(gdb) 
(gdb) bt
#0  0x00000031b4a0e48d in write () from /lib64/libpthread.so.0
#1  0x00000031b6a169a3 in sync_write (stream=0x10ebd40, in_buf=<value optimized out>, n=178)
    at reds.c:1930
#2  0x00000031b6a17388 in reds_send_link_ack (opaque=0x17f60b0) at reds.c:2048
#3  reds_handle_read_link_done (opaque=0x17f60b0) at reds.c:3557
#4  0x000000000040c3ef in main_loop_wait (timeout=1000)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4024
#5  0x000000000042aeaa in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2225
#6  0x000000000040de35 in main_loop (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#7  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6470


I'm not sure whether it is the same issue as the bug description; just updating the result here first.

Comment 2 Alon Levy 2011-11-02 16:54:06 UTC
(In reply to comment #1)
> Hit the crash again, but the log is not the same as in the bug description.
[snip]

You say crash, but I just see a SIGPIPE, which seems normal. Can you:
 1. provide the full bt (all threads) at the time of the crash
 2. elaborate on reproducing the crash: specifically, how do you run the ping-pong migration? Is it via autotest? Is there some howto I can quickly use? (I could do it myself, but I'm lazy. Also, it's better for me to learn how you guys are doing it.)

Alon
p.s. who is Dag Wieers?

Comment 3 Qunfang Zhang 2011-11-03 07:17:32 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Hit the crash again, but the log is not the same as in the bug description.
> [snip]
> 
> You say crash, but I just see a SIGPIPE, which seems normal. Can you:
>  1. provide the full bt (all threads) at the time of the crash
ok, but this issue is not always reproducible. I will keep trying, and once I get the log again I will provide the full bt.

>  2. elaborate on reproducing the crash: specifically, how do you run the
> ping-pong migration? Is it via autotest? Is there some howto I can quickly
> use? (I could do it myself, but I'm lazy. Also, it's better for me to learn
> how you guys are doing it.)
I do the migration manually, migrating the guest between host A and host B.
(1) Set up an NFS server on host A, and put the guest image on the NFS export.
Host A:
# cat /etc/exports 
/home *(rw,no_root_squash)
# service nfs start

(2) Host A: #mount $nfs_server_ip:/home /media
    Host B: #mount $nfs_server_ip:/home /media

(3) Boot the guest on host A with the command line in the bug description.

(4) Boot the guest on host B in listening mode: "-incoming tcp:0:5800"

(5) Run the iozone test inside the guest:
#iozone -a

(6) Migrate the guest from host A to host B:
Host A:
(qemu) migrate -d tcp:$host_B_ip:5800

(7) If the migration finishes but does not hit the issue, migrate the guest back from host B to host A:
At this moment qemu-kvm is in paused status on host A because it has just finished migrating, so quit qemu-kvm on host A and boot it up again in listening mode with "-incoming tcp:0:5800"

(8) On host B:
(qemu) migrate -d tcp:$host_A_IP:5800

(9) Repeat steps 6-8 (or 1-8) if you cannot hit the issue; a scripted sketch of this loop follows at the end of this comment.
> 
> Alon
> p.s. who is Dag Wieers?
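
For reference, a scripted version of steps 6-8 could look like the untested sketch below. It assumes both qemu-kvm processes were started with a unix-socket monitor (e.g. "-monitor unix:/tmp/mon-a,server,nowait" instead of "-monitor stdio") and a netcat with unix-socket support; a robust version would need more care draining the monitor output.

#!/bin/sh
# Hypothetical ping-pong driver; $host_B_ip, the socket path, and the
# restart of the source qemu are placeholders.
MON_A=/tmp/mon-a
DST=tcp:$host_B_ip:5800

for i in $(seq 1 20); do
    # Step 6: start the migration from the source monitor.
    echo "migrate -d $DST" | nc -w 1 -U "$MON_A"
    # Poll until the migration completes (or qemu crashes).
    until echo "info migrate" | nc -w 1 -U "$MON_A" | grep -q completed; do
        sleep 2
    done
    # Steps 7-8: quit the paused source, restart it with -incoming,
    # and migrate back in the opposite direction (omitted here).
done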

Comment 4 Alon Levy 2012-03-01 13:14:58 UTC
There is no spice client during this procedure, correct?

Comment 5 Alon Levy 2012-03-01 15:52:20 UTC
Hi,

 I couldn't reproduce this on a local machine with almost the same command line, the only difference being that I set up the tap devices myself and then used script=no,downscript=no,ifname=tap0.

 I tried a little more than 10 migrations with no failure.

 Not using NFS seems to be the main difference; I'll try to add that on Sunday.

 Second difference is that I had a spice client connected for most of the time.

 I think you could give me a little more information if you did the following:
 when attaching with gdb prior to the fault, please give it the commands:
  handle SIGPIPE noprint nostop pass
  c
 
 Then it should not stop at a SIGPIPE, and hopefully when it crashes you will get the correct stack trace with "thread apply all bt".
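
 In full, the session would look roughly like this (the pidof lookup is just illustrative; pick the right qemu-kvm pid if several are running):

  gdb -p $(pidof qemu-kvm)
  (gdb) handle SIGPIPE noprint nostop pass
  (gdb) c
  ... reproduce the crash ...
  (gdb) thread apply all bt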

Thanks,
Alon

Comment 6 Qunfang Zhang 2012-03-02 11:48:27 UTC
Hi Alon
I used the spice client to connect to the guest during the testing. This bug is very hard to reproduce, and I have only hit it twice. Sorry, I am focusing on other stuff these days; I will re-test it next Monday and will follow your advice to gather more information.

Comment 7 Alon Levy 2012-03-05 12:42:06 UTC
Created attachment 567600 [details]
script that fails to reproduce (local migration ping pong, nfs mounted image)

Comment 8 Alon Levy 2012-03-05 12:42:37 UTC
Could not reproduce with NFS locally.

For reference, I was using a local script attached together with spice_migrate.py from spice-tests at:
 https://git.gitorious.org/spice-space/spice-tests.git
 (https://gitorious.org/spice-space)
 (cgit.freedesktop.org is having some problems; it contains an older copy without the recent patches developed to reproduce this bz)

to launch VMs with the same parameters as you gave above, minus the taps, and with both guests on the same computer.

Having them on the same computer probably changed the migration timing, which could be a reason for the lack of reproduction, but Qunfang Zhang mentions it is hard to reproduce anyway.

Alon

Comment 9 David Blechter 2012-03-06 12:53:02 UTC
Moving to 6.4 based on comments 6 & 8. Devel will work with QE in order to reproduce it.

Comment 11 RHEL Program Management 2012-07-10 07:00:52 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 12 RHEL Program Management 2012-07-11 02:01:24 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 13 Alon Levy 2012-08-13 13:00:50 UTC
(In reply to comment #6)
> Hi Alon
> I used the spice client to connect to the guest during the testing. This bug
> is very hard to reproduce, and I have only hit it twice. Sorry, I am focusing
> on other stuff these days; I will re-test it next Monday and will follow your
> advice to gather more information.

Hi Qunfang,

 Is it possible for you to rerun the tests and silence the SIGPIPE errors, so we can have a stack trace of the actual crash? I cannot reproduce it locally, so this is the best way to continue imo.

Thanks,
Alon

Comment 14 Qunfang Zhang 2012-08-14 04:59:22 UTC
Hi, Alon
Ok, I will dive into it this week and try comment 5.

Comment 15 David Blechter 2012-08-14 12:19:27 UTC
devel_ack+ and will wait for more info from Qunfang.

Comment 17 Ademar Reis 2012-08-20 15:42:17 UTC
Since Qunfang can't reproduce it anymore and there's an autotest, I'm closing this bug.

