Bug 731759

Summary: SPICE: migration fails with warning: error while loading state section id 4
Product: Red Hat Enterprise Linux 6
Reporter: Lubos Kocman <lkocman>
Component: qemu-kvm
Assignee: David Blechter <dblechte>
Status: CLOSED ERRATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Priority: unspecified
Version: 6.2
CC: cmeadors, juzhang, kraxel, lyarwood, mkenneth, mkrcmari, quintela, shuang, tburke, virt-maint
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-12-06 15:56:21 UTC

Description Lubos Kocman 2011-08-18 14:53:15 UTC
Description of problem:

In roughly one of five live migrations while SPICE is in use, the guest exits with the following messages:

source node:
Guest moved used index from 4741 to 53262011-08-18 16:33:08.986: shutting down

dest node:
warning: error while loading state section id 4

Version-Release number of selected component (if applicable):

libvirt-0.9.4-0rc1.2.el6.x86_64
qemu-kvm-0.12.1.2-2.182.el6.x86_64
spice-server-0.8.2-3.el6.x86_64
vdsm-4.9-91.el6.x86_64
rhevm-backend-3.0.0_0001-22.el6.x86_64 (ic136)

Guest was Win7 x64 with ic136 tools.
Client was spicec.exe 0.18 running on Win7 x64.

How reproducible:

Roughly 1 in 5 attempts; repeat the migration 5-10 times.
Steps to Reproduce:
1. Start live migration of a guest (initiated from RHEV Manager; a scripted equivalent is sketched below).
2. Continuously move the mouse while dragging some object in the guest.
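
As a hypothetical scripted equivalent of step 1 (not part of the original report), a loop along these lines could drive repeated live migrations without RHEV Manager. It is a sketch using the libvirt C API; vdsm's migrateToURI call corresponds to virDomainMigrateToURI with VIR_MIGRATE_PEER2PEER, and the host URIs and domain name below are taken from the logs attached to this report:

#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    /* hyper03 and hyper04 from the logs below; roles swap each round. */
    const char *src_uri = "qemu+tls://10.34.58.4/system";
    const char *dst_uri = "qemu+tls://10.34.58.5/system";
    const char *dom_name = "win7x64-lk";
    int i;

    for (i = 0; i < 10; i++) {
        virConnectPtr src = virConnectOpen(src_uri);
        if (src == NULL) {
            fprintf(stderr, "failed to connect to %s\n", src_uri);
            return 1;
        }

        virDomainPtr dom = virDomainLookupByName(src, dom_name);
        if (dom == NULL) {
            virConnectClose(src);
            return 1;
        }

        /* Live peer-to-peer migration, as vdsm requests it. Keep a SPICE
         * client attached and move the mouse while this runs (step 2). */
        if (virDomainMigrateToURI(dom, dst_uri,
                                  VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER,
                                  NULL, 0) < 0) {
            fprintf(stderr, "migration %d failed\n", i);
        }

        virDomainFree(dom);
        virConnectClose(src);

        /* Migrate back the other way on the next iteration. */
        const char *tmp = src_uri;
        src_uri = dst_uri;
        dst_uri = tmp;
    }
    return 0;
}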
  
Actual results:

In roughly 1 of 5 migrations the guest exits with the following message (the client closes and the qemu process appears to exit):

Guest moved used index from 4741 to 53262011-08-18 16:33:08.986: shutting down



Expected results:

The guest should complete the migration without shutting down or crashing.


Additional info:


hyper04 (10.34.58.5)
2011-08-18 16:30:56.233: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Nehalem -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name win7x64-lk -uuid 51dec7f4-9491-425c-abc2-69a746b17914 -smbios type=1,manufacturer=Red Hat,product=RHEL,version=6Server-6.1.0.2.el6_1,serial=33313934-3432-5A43-3230-323437523147_78:E7:D1:E0:29:3A,uuid=51dec7f4-9491-425c-abc2-69a746b17914 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7x64-lk.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2011-08-18T15:30:22 -no-shutdown -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/6d6bee26-b863-11e0-bef1-0016365acdc4/b8fd6843-d687-4a67-aa70-611a6f146117/images/87863edb-6ce2-4f7c-be61-33898d2ef989/5eda97f0-1c9e-492e-93fc-3973a13bb555,if=none,id=drive-virtio-disk0,format=qcow2,serial=7c-be61-33898d2ef989,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:22:3a:2b,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/win7x64-lk.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=inputs -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:0.0.0.0:49157
do_spice_init: starting 0.8.2
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
spice_server_add_interface: SPICE_INTERFACE_QXL
red_worker_main: begin
spice_server_add_interface: SPICE_INTERFACE_PLAYBACK
spice_server_add_interface: SPICE_INTERFACE_RECORD
qemu: warning: error while loading state section id 4
load of migration failed
2011-08-18 16:31:20.146: shutting down

hyper03 (10.34.58.4)
2011-08-18 16:32:05.259: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Nehalem -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name win7x64-lk -uuid 51dec7f4-9491-425c-abc2-69a746b17914 -smbios type=1,manufacturer=Red Hat,product=RHEL,version=6Server-6.1.0.2.el6_1,serial=33313934-3432-5A43-3230-323437523147_78:E7:D1:E0:29:3A,uuid=51dec7f4-9491-425c-abc2-69a746b17914 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7x64-lk.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2011-08-18T15:31:31 -no-shutdown -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/rhev/data-center/6d6bee26-b863-11e0-bef1-0016365acdc4/b8fd6843-d687-4a67-aa70-611a6f146117/images/87863edb-6ce2-4f7c-be61-33898d2ef989/5eda97f0-1c9e-492e-93fc-3973a13bb555,if=none,id=drive-virtio-disk0,format=qcow2,serial=7c-be61-33898d2ef989,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:22:3a:2b,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/win7x64-lk.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -usb -spice port=5902,tls-port=5903,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=inputs -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:0.0.0.0:49210
do_spice_init: starting 0.8.2
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
spice_server_add_interface: SPICE_INTERFACE_QXL
red_worker_main: begin
spice_server_add_interface: SPICE_INTERFACE_PLAYBACK
spice_server_add_interface: SPICE_INTERFACE_RECORD
spice_server_char_device_add_interface: CHAR_DEVICE vdagent
handle_dev_input: start
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_handle_main_link:
reds_show_new_channel: channel 1:0, connected successfully, over Secure link
reds_main_handle_message: agent start
handle_dev_input: mouse mode 2
reds_main_handle_message: net test: latency 17.083000 ms, bitrate 43574468085 bps (41555.851064 Mbps)
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_show_new_channel: channel 6:0, connected successfully, over Non Secure link
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_show_new_channel: channel 5:0, connected successfully, over Non Secure link
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_show_new_channel: channel 4:0, connected successfully, over Non Secure link
red_dispatcher_set_cursor_peer:
handle_dev_input: cursor connect
reds_show_new_channel: channel 3:0, connected successfully, over Secure link
inputs_link:
reds_handle_read_link_done: Peer doesn't support AUTH selection
reds_show_new_channel: channel 2:0, connected successfully, over Non Secure link
red_dispatcher_set_peer:
handle_dev_input: connect
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
handle_dev_input: oom current 53 pipe 0
handle_dev_input: oom current 46 pipe 5
handle_dev_input: oom current 29 pipe 3
Guest moved used index from 4741 to 53262011-08-18 16:33:08.986: shutting down


vdsm:

Thread-142097::DEBUG::2011-08-18 16:32:41,674::clientIF::54::vds::(wrapper) [10.34.58.24]::call migrate with ({'src': '10.34.58.4', 'dst': '10.34.58.5:54321', 'vmId': '51dec7f4-9491-425c-abc2-69a746b17914', 'method': 'online'},) {}
Thread-142097::DEBUG::2011-08-18 16:32:41,674::clientIF::343::vds::(migrate) {'src': '10.34.58.4', 'dst': '10.34.58.5:54321', 'vmId': '51dec7f4-9491-425c-abc2-69a746b17914', 'method': 'online'}
Thread-142098::DEBUG::2011-08-18 16:32:41,682::vm::122::vm.Vm::(_setupVdsConnection) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Destination server is: https://10.34.58.5:54321
Thread-142098::DEBUG::2011-08-18 16:32:41,682::vm::124::vm.Vm::(_setupVdsConnection) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Initiating connection with destination
Thread-142098::DEBUG::2011-08-18 16:32:41,734::libvirtvm::111::vm.Vm::(_getCpuStats) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::CPU stats not available
Thread-142098::DEBUG::2011-08-18 16:32:41,734::libvirtvm::162::vm.Vm::(_getDiskStats) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Disk stats not available
Thread-142098::DEBUG::2011-08-18 16:32:41,736::vm::170::vm.Vm::(_prepareGuest) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::migration Process begins
Thread-142098::DEBUG::2011-08-18 16:32:41,740::vm::216::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::migration semaphore acquired
Thread-142098::DEBUG::2011-08-18 16:32:45,086::libvirtvm::293::vm.Vm::(_startUnderlyingMigration) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::starting migration to qemu+tls://10.34.58.5/system
Thread-142101::DEBUG::2011-08-18 16:32:45,086::libvirtvm::195::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::migration downtime thread started
Thread-142102::DEBUG::2011-08-18 16:32:45,087::libvirtvm::223::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::starting migration monitor thread
Thread-142058::DEBUG::2011-08-18 16:32:47,097::guestIF::148::vm.Vm::(_parseBody) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::['10.34.58.249']
Thread-142108::DEBUG::2011-08-18 16:32:50,651::libvirtvm::162::vm.Vm::(_getDiskStats) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Disk stats not available
Thread-142102::INFO::2011-08-18 16:32:55,090::libvirtvm::255::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Migration Progress: 9 seconds elapsed, 37% of data processed, 37% of mem processed
Thread-142101::DEBUG::2011-08-18 16:33:00,088::libvirtvm::204::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::setting migration downtime to 50
Thread-142116::DEBUG::2011-08-18 16:33:00,942::libvirtvm::162::vm.Vm::(_getDiskStats) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Disk stats not available
Thread-142102::INFO::2011-08-18 16:33:05,092::libvirtvm::255::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Migration Progress: 20 seconds elapsed, 82% of data processed, 82% of mem processed
libvirtEventLoop::DEBUG::2011-08-18 16:33:09,539::libvirtvm::1252::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::event Stopped detail 5 opaque None
libvirtEventLoop::INFO::2011-08-18 16:33:09,545::vm::443::vm.Vm::(_onQemuDeath) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::underlying process disconnected
libvirtEventLoop::INFO::2011-08-18 16:33:09,545::libvirtvm::1156::vm.Vm::(releaseVm) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Release VM resources
Thread-142098::DEBUG::2011-08-18 16:33:09,549::libvirtvm::210::vm.Vm::(cancel) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::canceling migration downtime thread
libvirtEventLoop::DEBUG::2011-08-18 16:33:09,550::utils::314::vm.Vm::(stop) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Stop statistics collection
Thread-142101::DEBUG::2011-08-18 16:33:09,551::libvirtvm::207::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::migration downtime thread exiting
Thread-142098::DEBUG::2011-08-18 16:33:09,550::libvirtvm::260::vm.Vm::(stop) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::stopping migration monitor thread
Thread-142057::DEBUG::2011-08-18 16:33:09,551::utils::345::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Stats thread finished
Thread-142098::ERROR::2011-08-18 16:33:09,552::vm::175::vm.Vm::(_recover) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::operation failed: domain is no longer running
libvirtEventLoop::WARNING::2011-08-18 16:33:09,554::libvirtvm::1167::vm.Vm::(releaseVm) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Traceback (most recent call last):
libvirtError: Domain not found: no domain with matching uuid '51dec7f4-9491-425c-abc2-69a746b17914'
libvirtEventLoop::DEBUG::2011-08-18 16:33:09,611::vm::715::vm.Vm::(_cleanup) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Drive None cleanup
libvirtEventLoop::DEBUG::2011-08-18 16:33:09,611::vm::715::vm.Vm::(_cleanup) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Drive /rhev/data-center/6d6bee26-b863-11e0-bef1-0016365acdc4/b8fd6843-d687-4a67-aa70-611a6f146117/images/87863edb-6ce2-4f7c-be61-33898d2ef989/5eda97f0-1c9e-492e-93fc-3973a13bb555 cleanup
Thread-142098::ERROR::2011-08-18 16:33:11,045::vm::230::vm.Vm::(run) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Traceback (most recent call last):
libvirtEventLoop::DEBUG::2011-08-18 16:33:11,051::vm::739::vm.Vm::(setDownStatus) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Changed state to Down: Lost connection with qemu process
libvirtEventLoop::DEBUG::2011-08-18 16:33:11,052::utils::314::vm.Vm::(stop) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Stop statistics collection
Thread-142125::DEBUG::2011-08-18 16:33:11,314::clientIF::54::vds::(wrapper) [10.34.58.24]::call destroy with ('51dec7f4-9491-425c-abc2-69a746b17914',) {}
Thread-142125::INFO::2011-08-18 16:33:11,315::clientIF::436::vds::(destroy) vmContainerLock acquired by vm 51dec7f4-9491-425c-abc2-69a746b17914
Thread-142125::DEBUG::2011-08-18 16:33:11,315::libvirtvm::1196::vm.Vm::(destroy) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::destroy Called
Thread-142125::DEBUG::2011-08-18 16:33:11,316::libvirtvm::1191::vm.Vm::(deleteVm) vmId=`51dec7f4-9491-425c-abc2-69a746b17914`::Total desktops after destroy of 51dec7f4-9491-425c-abc2-69a746b17914 is 2

Comment 2 Marian Krcmarik 2011-08-24 11:20:04 UTC
I've reproduced this as well, with the same observation as described in the original report:

Guest moved used index from 9278 to 100582011-08-24 12:59:24.116: shutting down

Comment 3 Dor Laor 2011-09-12 06:41:24 UTC
Does it happen without spice?
Was this a standard migration or a loadvm?
Can you please reproduce it again with all the recent components?

Comment 4 Gerd Hoffmann 2011-09-12 07:32:47 UTC
hw/virtio.c:

static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
{
    /* The avail index is a free-running uint16_t incremented by the
     * guest; the subtraction yields the number of pending buffers
     * even across wraparound. */
    uint16_t num_heads = vring_avail_idx(vq) - idx;

    /* Check it isn't doing very strange things with descriptor numbers. */
    if (num_heads > vq->vring.num) {
        /* Note: no trailing '\n' in the format string, which is why the
         * logs above show this message fused with the next timestamp line. */
        fprintf(stderr, "Guest moved used index from %u to %u",
                idx, vring_avail_idx(vq));
        /* Terminates qemu; libvirt then logs "shutting down". */
        exit(1);
    }

    return num_heads;
}

Doesn't look spice related at first glance. 
Might be memory corruption though.
Same question as Dor: does it happen without spice?
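
To illustrate why that check fires, here is a minimal standalone sketch (my illustration, not QEMU code) of the same free-running uint16_t index arithmetic, plugging in the indices from the log; the 256-entry ring size is an assumption, typical for virtio queues:

#include <stdio.h>
#include <stdint.h>

/* Standalone model of the virtqueue_num_heads() guard above. */
static unsigned int pending(uint16_t avail_idx, uint16_t last_seen,
                            unsigned int ring_size)
{
    uint16_t num_heads = avail_idx - last_seen;  /* wraps mod 2^16 */

    /* A well-behaved guest can never have more pending buffers than
     * the ring holds, wraparound or not. */
    if (num_heads > ring_size) {
        printf("Guest moved used index from %u to %u\n",
               last_seen, avail_idx);
        return 0;
    }
    return num_heads;
}

int main(void)
{
    /* Normal wraparound: 65534 -> 2 means 4 pending buffers. */
    printf("%u\n", pending(2, 65534, 256));

    /* The values from this report: 4741 -> 5326 is 585 pending
     * buffers, more than the ring holds, so the guard trips. */
    printf("%u\n", pending(5326, 4741, 256));
    return 0;
}

So the guard points at a corrupted (or mis-migrated) avail index rather than anything SPICE-specific, consistent with the memory-corruption suspicion above.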

Comment 5 Lubos Kocman 2011-09-15 12:18:43 UTC
The issue no longer appears with:

qemu-kvm-0.12.1.2-2.189.el6.alon.729621.x86_64
libvirt-0.9.4-7.el6.x86_64
spice-server-0.8.2-3.el6.x86_64
vdsm-4.9-96.el6.x86_64


Proposing to close as CURRENTRELEASE.

Comment 6 Cameron Meadors 2011-09-22 19:35:38 UTC
Moving directly to VERIFIED.  After talking to dblechte and doing some investigation, I am confident that a code change fixed this bug, but unfortunately it is not possible to identify it under the current time constraints.  The fact that two people could reproduce it reinforces that the bug was probably not environmental.  I think this is better than CLOSED CURRENTRELEASE.

Comment 13 errata-xmlrpc 2011-12-06 15:56:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1531.html