Bug 1238212 - Migration fails if a spice client is used to connect to the guest
Summary: Migration fails if a spice client is used to connect to the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: spice
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Christophe Fergeau
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-01 12:15 UTC by David Jaša
Modified: 2015-07-13 19:18 UTC (History)
2 users

Fixed In Version: spice-0.12.5-6.fc22
Clone Of: 1035184
Environment:
Last Closed: 2015-07-13 19:18:50 UTC
Type: Bug
Embargoed:


Attachments

Description David Jaša 2015-07-01 12:15:30 UTC
The patch for this bug wasn't included in F22:
version: spice-server-0.12.5-5.fc22.x86_64
patch: http://lists.freedesktop.org/archives/spice-devel/2014-February/016217.html

Current status - the patch above is not applied:

# grep -A 15 '^    state->read_filter.discard_all = FALSE;' /usr/src/debug/spice-0.12.5/server/reds.c ; echo "--" ; \
    grep -A 8 '^                spice_warning("failed to add client to agent");' /usr/src/debug/spice-0.12.5/server/reds.c ; echo "--" ; \
    grep -A 6 '^        spice_warning("failed to create device state for %s", char_device->subtype);' /usr/src/debug/spice-0.12.5/server/reds.c
    state->read_filter.discard_all = FALSE;
    reds->agent_state.plug_generation++;

    if (reds->agent_state.mig_data) {
        spice_assert(reds->agent_state.plug_generation == 1);
        reds_agent_state_restore(reds->agent_state.mig_data);
        free(reds->agent_state.mig_data);
        reds->agent_state.mig_data = NULL;
    } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
        /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
         * in response to MSGC_AGENT_START */
        main_channel_push_agent_connected(reds->main_channel);
    } else {
       spice_debug("waiting for migration data");
        if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
            int client_added;
--
                spice_warning("failed to add client to agent");
                reds_disconnect();
            }

        }
    }
    return state->base;
}

--
        spice_warning("failed to create device state for %s", char_device->subtype);
        return -1;
    }
    return 0;
}

static void spice_server_char_device_remove_interface(SpiceBaseInstance *sin)

+++ This bug was initially created as a clone of Bug #1035184 +++

Description of problem:
Migration fails if a spice client is used to connect to the guest

Version-Release number of selected component (if applicable):
libvirt-1.1.1-13.el7.x86_64
qemu-kvm-rhev-1.5.3-19.el7.x86_64
spice-gtk3-0.20-6.el7.x86_64
spice-glib-0.20-6.el7.x86_64
spice-server-0.12.4-3.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Prepare two hosts.
2. Define a guest with spice on hostA and start it.
3. Use virt-viewer to open the guest console:
# virt-viewer $guest
4. Pre-create storage on hostB.
5. Do a live storage migration:
# virsh migrate --live rhel6 qemu+ssh://10.66.106.25/system --verbose --copy-storage-all
root@10.66.106.25's password:
Migration: [ 99 %]error: Unable to read from monitor: Connection reset by peer

If the spice console is not opened, the migration succeeds.

Actual results:


Expected results:


Additional info:
The issue also occurs with the plain qemu-kvm package, so it seems to be a spice migration issue.

--- Additional comment from zhe peng on 2013-11-27 10:13:06 CET ---



--- Additional comment from zhe peng on 2013-11-27 10:13:46 CET ---



--- Additional comment from zhe peng on 2013-11-27 10:14:19 CET ---



--- Additional comment from CongDong on 2013-11-27 10:41:49 CET ---

I got the same problem when I do migration with:

# rpm -qa libvirt qemu-kvm virt-viewer 
libvirt-1.1.1-13.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64

I do the migration with an nfs server.

1. Prepare an nfs server and a spice guest image on it.
2. Mount the nfs server on host A and host B at the same path:
# mount $nfs_ip:$share_path  /var/lib/libvirt/images/NFS
3. Define a guest with spice on hostA and start it.
4. remote-viewer spice://$ip:$spice_port
5. # virsh migrate --live newwin qemu+ssh://<ip>/system --verbose 
root.6.211's password: 
Migration: [ 99 %]error: Unable to read from monitor: Connection reset by peer

--- Additional comment from Jiri Denemark on 2013-11-27 15:33:03 CET ---

The error suggests qemu on destination crashed at the end of migration. Could you attach /var/log/libvirt/qemu/rhel6.log from the destination host?

--- Additional comment from zhe peng on 2013-11-28 03:20:36 CET ---



--- Additional comment from Jiri Denemark on 2013-11-28 10:12:10 CET ---

Thanks, the log from comment #7 shows qemu died as a result of spice error.

---

--- Additional comment from Christophe Fergeau on 2013-12-03 15:57:59 CET ---



--- Additional comment from Christophe Fergeau on 2013-12-03 16:00:55 CET ---

Bug #1037455 has a backtrace with symbols:

Program received signal SIGABRT, Aborted.
0x00007ffff30db979 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff30db979 in raise () from /lib64/libc.so.6
#1  0x00007ffff30dd088 in abort () from /lib64/libc.so.6
#2  0x00007ffff3e9051c in spice_logv (log_domain=0x7ffff3f061c6 "Spice", log_level=SPICE_LOG_LEVEL_ERROR, 
    strloc=0x7ffff3f06569 "char_device.c:937", 
    function=0x7ffff3f06c60 <__FUNCTION__.21514> "spice_char_device_state_restore", 
    format=0x7ffff3f0619e "assertion `%s' failed", args=args@entry=0x7fffffffd188) at log.c:109
#3  0x00007ffff3e90668 in spice_log (log_domain=log_domain@entry=0x7ffff3f061c6 "Spice", 
    log_level=log_level@entry=SPICE_LOG_LEVEL_ERROR, strloc=strloc@entry=0x7ffff3f06569 "char_device.c:937", 
    function=function@entry=0x7ffff3f06c60 <__FUNCTION__.21514> "spice_char_device_state_restore", 
    format=format@entry=0x7ffff3f0619e "assertion `%s' failed") at log.c:123
#4  0x00007ffff3e384da in spice_char_device_state_restore (dev=0x5555569b9db0, mig_data=0x5555569b9320)
    at char_device.c:937
#5  0x00007ffff3e7217b in reds_agent_state_restore (mig_data=<optimized out>) at reds.c:1364
#6  0x00007ffff3e75835 in attach_to_red_agent (sin=0x555556502bf8) at reds.c:3594
#7  spice_server_char_device_add_interface (sin=0x555556502bf8, s=0x555556504810) at reds.c:3698
#8  spice_server_add_interface (s=0x555556504810, sin=sin@entry=0x555556502bf8) at reds.c:3836
#9  0x0000555555749995 in qemu_spice_add_interface (sin=sin@entry=0x555556502bf8) at ui/spice-core.c:831
#10 0x0000555555740ebb in vmc_register_interface (scd=0x555556502bf0) at spice-qemu-char.c:124
#11 0x000055555578b7fe in virtio_serial_post_load_timer_cb (opaque=<optimized out>)
    at /usr/src/debug/qemu-1.5.3/hw/char/virtio-serial-bus.c:602
#12 0x0000555555728156 in qemu_run_timers (clock=0x5555564f9f30) at qemu-timer.c:394
#13 0x00005555557283f5 in qemu_run_timers (clock=<optimized out>) at qemu-timer.c:459
#14 qemu_run_all_timers () at qemu-timer.c:452
#15 0x00005555556f608e in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:470
#16 0x00005555555ff3f8 in main_loop () at vl.c:1986
#17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4343
(gdb)

--- Additional comment from Christophe Fergeau on 2014-02-03 19:26:54 CET ---

I haven't been able to reproduce the issue following the steps in comment #5 with

qemu-kvm-1.5.3-21.el7.x86_64
spice-server-0.12.4-3.el7.x86_64 on the destination

and 

qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
spice-server-0.12.4-7.el6.1.x86_64 on the source
using a fedora 20 guest sitting at the gdm prompt

--- Additional comment from Christophe Fergeau on 2014-02-04 15:03:20 CET ---

Actually I've hit that once with qemu-kvm-1.5.3-22.el7.x86_64, but now I cannot reproduce it.

--- Additional comment from Christophe Fergeau on 2014-02-04 16:02:04 CET ---

OK, I've reproduced it twice in a row; the missing step seems to be

(In reply to CongDong from comment #5)
> I got the same problem when I do migration with:
> 
> # rpm -qa libvirt qemu-kvm virt-viewer 
> libvirt-1.1.1-13.el7.x86_64
> qemu-kvm-1.5.3-19.el7.x86_64
> 
> I do the migration with a nfs server.
> 
> 1. Prepare nfs server and a spice guest image on it.
> 2. mount the nfs server on host A and host B with the same path.
> # mount $nfs_ip/$share_path  /var/lib/libvirt/images/NFS
> 2. define a guest with spice on hostA and start
> 3. remote-viewer spice://$ip:$spice_port
> 4. # virsh migrate --live newwin qemu+ssh://<ip>/system --verbose 
> root.6.211's password: 

5. copy&paste some data from the VM to the client during migration

--- Additional comment from David Jaša on 2014-03-03 17:52:44 CET ---



--- Additional comment from David Jaša on 2014-03-03 18:10:26 CET ---

The reproducer for me is just:
0) on RHEL 7 hosts, configure libvirt to listen on tcp, and have a rhel6 VM installed on shared storage
1) run the VM with the domain xml as in attachment 870038 [details] (the path to storage will be different for you)
2) connect with a client
3) migrate the domain:
virsh -c qemu+tcp://source_host/system migrate --live --p2p --verbose DOMAIN_NAME qemu+tcp://dest_host/system

When the client is rhel7:
  * destination spice-server spits this assertion and exits:
> ((null):22946): Spice-Info **: reds.c:3691:spice_server_char_device_add_interface: CHAR_DEVICE vdagent
> ((null):22946): Spice-Debug **: char_device.c:661:spice_char_device_state_create: sin 0x7f676565a5b8 dev_state 0x7f67657f6450
> ((null):22946): Spice-Debug **: reds.c:1359:reds_agent_state_restore: to agent filter: discard all 0, wait_msg 0, msg_filter_result 0
> ((null):22946): Spice-Debug **: reds.c:1363:reds_agent_state_restore: from agent filter: discard all 0, wait_msg 0, msg_filter_result 0
> ((null):22946): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
  * client says this among other (the same as Zhe Peng's client):
> (virt-viewer:22860): GSpice-DEBUG: spice-session.c:1722 connecting 0x7f0921fff890...
> (virt-viewer:22860): GSpice-DEBUG: spice-session.c:1791 open host djasa-ws.spice.brq.redhat.com:5900
> 
> (virt-viewer:22860): GSpice-CRITICAL **: spice_channel_iterate_read: assertion `c->state != SPICE_CHANNEL_STATE_MIGRATING' failed
> (virt-viewer:22860): GSpice-DEBUG: spice-session.c:1708 connect ready
> (virt-viewer:22860): GSpice-DEBUG: spice-channel.c:1163 inputs-3:0: channel type 3 id 0 num common caps 1 num caps 0
> (virt-viewer:22860): GSpice-DEBUG: spice-session.c:1708 connect ready
  * VM continues running on source host


When the client is F20 (spice-gtk-0.23-1.fc20.x86_64):
  * the VM successfully migrates to the destination
  * client disconnects at the process saying this:
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:1189 record-6:0: Peer version: 2:2
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:1690 record-6:0: spice_channel_recv_link_msg: 2 caps
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:1700 record-6:0: got common caps 0:0xB
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:1706 record-6:0: got channel caps 0:0x3
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2612 test cap 0 in 0xB: yes
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2612 test cap 2 in 0xB: no
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2612 test cap 1 in 0xB: yes
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2612 test cap 3 in 0xB: yes
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:1735 record-6:0: use mini header: 1
> (virt-viewer:6212): GSpice-DEBUG: channel-main.c:2130 migration: channel opened chan:0x230d830, left 1
> (virt-viewer:6212): GSpice-DEBUG: channel-main.c:2130 migration: channel opened chan:0x22c6020, left 0
> (virt-viewer:6212): GSpice-DEBUG: channel-main.c:2278 migration (seamless): connections all ok
> (virt-viewer:6212): GSpice-DEBUG: spice-session.c:1368 migration channels left:10 (in migration:10)
> (virt-viewer:6212): GSpice-DEBUG: channel-main.c:2421 migrate_cancel
> (virt-viewer:6212): GSpice-DEBUG: spice-session.c:1576 session: disconnecting 0
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2016 usbredir-9:3: channel destroy
> (virt-viewer:6212): GSpice-DEBUG: spice-channel.c:2578 usbredir-9:3: channel disconnect 0
  * destination's spice-server says:
> ((null):29850): SpiceWorker-Info **: red_worker.c:9423:display_channel_client_on_disconnect: snd_receive: Connection reset by peer
> ...
> red_peer_receive: Connection reset by peer
  * the only log on source server that doesn't look normal is:
> ((null):12442): Spice-Warning **: red_channel.c:1405:red_channel_handle_migrate_data: rcc 0x7f6308273c40 type 2 id 0: unexpected
red_channel_client_disconnect: rcc=0x7f6308273c40 (channel=0x7f630821f360 type=2 id=0)

---

--- Additional comment from Marc-Andre Lureau on 2014-03-03 18:43:22 CET ---

David, can you reproduce without using libvirt/virsh? How do you connect the client? The lower the level, the better, for finding the location of the problem. Thanks

--- Additional comment from Christophe Fergeau on 2014-03-04 08:50:03 CET ---

http://lists.freedesktop.org/archives/spice-devel/2014-February/016217.html

--- Additional comment from David Jaša on 2014-03-04 13:28:28 CET ---

(In reply to Marc-Andre Lureau from comment #18)
> David, can you reproduce without using libvirt/virsh?

I could, but is it necessary given that the root cause is known and the patches fix it?

> How do you connect the client?

manually, with --spice-debug

> The lower level, always the better, to find out the location of the pb. Thanks

I'm looking at the debug outputs; when they match what I can see using a direct qemu invocation, it shouldn't make a difference to use libvirt instead of qemu/monitor directly.
Specifically, I prefer libvirt VMs because they simplify storage and device configuration a great deal for me, and because some bugs in qemu/libvirt interaction went unnoticed in the past when all the testing was done using plain qemu.

--- Additional comment from David Jaša on 2014-03-04 13:32:54 CET ---

(In reply to Christophe Fergeau from comment #19)
> http://lists.freedesktop.org/archives/spice-devel/2014-February/016217.html

The bug doesn't occur when the destination server has this fix applied.

--- ...

--- Additional comment from Tomas Jamrisko on 2014-11-06 11:41:34 CET ---

It's better. Now it works most of the time, but when it fails, the VM gets shut down completely.

--- Additional comment from Tomas Jamrisko on 2014-11-06 11:42:09 CET ---



--- Additional comment from Christophe Fergeau on 2014-11-06 11:52:34 CET ---

Looks like a segfault on one of the sides (the destination, I think?): "qemu: terminating on signal 15 from pid 1432"
The libvirt log for the VM would be useful (even better, a backtrace with debug symbols from gdb --pid $(pgrep -f qemu-kvm) on the destination)

--- Additional comment from Christophe Fergeau on 2014-11-06 13:30:50 CET ---

(In reply to Christophe Fergeau from comment #50)
> Looks like a segfault on one of the sides (destination I think?): "qemu:
> terminating on signal 15 from pid 1432"
> Libvirt log for the VM would be useful (even better, backtrace with debug
> symbols with gdb --pid $(pgrep -f qemu-kvm) on the destination)

Forget that, the attachments must be the libvirt logs already, and signal 15 is SIGTERM, not SIGSEGV (signal 11).

--- Additional comment from Uri Lublin on 2014-11-06 15:13:26 CET ---

(In reply to Tomas Jamrisko from comment #48)
> It's better. Now it works most of the time, but when it fails, the VM gets
> shutdown completely.

The original problem ("assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed") is now fixed.
In the log files (attachment 954389 [details] and attachment 954390 [details]), the assertion line does not show up on 2014-11-06.

The new problem is that when a migration fails, the VM on the source host gets killed, right? (Of course the VM on the destination has to be killed.)
This may or may not be caused by Spice.

I did not see where the migration failed in the attached qemu-kvm log files.

Anything suspicious in the libvirtd.log files (on src/dest hosts) ?

--- Additional comment from Tomas Jamrisko on 2014-11-06 15:50:56 CET ---

Adding vdsm logs. The migration that finally failed is at about the 14:29 mark.

--- Additional comment from Tomas Jamrisko on 2014-11-07 12:05:31 CET ---

(In reply to Uri Lublin from comment #52)
> (In reply to Tomas Jamrisko from comment #48)
> > It's better. Now it works most of the time, but when it fails, the VM gets
> > shutdown completely.
> 
> The original problem ("assertion `dev->num_clients == 1 &&
> dev->wait_for_migrate_data' failed") is now fixed.
> In the ( attachment 954389 [details] and attachment 954390 [details] ) log
> files the
> assertion line does not show up on 2014-11-06.
> 
> The new problem is that when a migration fails, the VM on the source host
> gets  killed, right ? (Of course the VM on the destination has to be killed).
> This may or may not be caused by Spice.
> 
> I did not see where the migration failed in the attached qemu-kvm log files.
> 
> Anything suspicious in the libvirtd.log files (on src/dest hosts) ?

Well, I've looked at the libvirtd logs and abrt reports, and it seems you're right: there is usually a libvirtd segfault during the failed migration.

--- ...

--- Additional comment from errata-xmlrpc on 2015-03-05 08:55:52 CET ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0335.html

Comment 1 Fedora Update System 2015-07-02 12:39:53 UTC
spice-0.12.5-6.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/spice-0.12.5-6.fc22

Comment 2 Fedora Update System 2015-07-03 18:33:55 UTC
Package spice-0.12.5-6.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing spice-0.12.5-6.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-11040/spice-0.12.5-6.fc22
then log in and leave karma (feedback).

Comment 3 Fedora Update System 2015-07-13 19:18:50 UTC
spice-0.12.5-6.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

