Bug 629453

Summary: Dest VM process terminated with status 1 during live migration
Product: Red Hat Enterprise Linux 6 Reporter: YangFeng <fyang>
Component: qemu-kvmAssignee: Juan Quintela <quintela>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 6.1CC: alex.williamson, michen, mkenneth, pbonzini, tburke, virt-maint
Target Milestone: betaKeywords: Triaged
Target Release: 6.1   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-24 10:31:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580954    

Description YangFeng 2010-09-02 02:26:42 UTC
Description of problem:
Dest VM process terminated with status 1 during live migration. Even source VM report migrate completed.
Following is part of error message got from VM.

(qemu) handle_dev_input: stop
(qemu) spice_server_migrate_end: status success
(qemu) spice_server_migrate_end: no peer connected
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'cpu'
(qemu) load of migration failed
(qemu) (Process terminated with status 1)

Version-Release number of selected component (if applicable):
Host kernel: 2.6.32-70.el6.x86_64
qemu-kvm: qemu-kvm-0.12.1.2-2.113.el6.x86_64

How reproducible:
2% 2 fails, for 100 times.

Steps to Reproduce:
1. Start source VM. e.g.
/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/qemu -name 'vm1' -chardev socket,id=human_monitor_UsRh,path=/tmp/monitor-humanmonitor1-20100901-144206-SphD,server,nowait -mon chardev=human_monitor_UsRh,mode=readline -chardev socket,id=serial_dkmv,path=/tmp/serial-20100901-144206-SphD,server,nowait -device isa-serial,chardev=serial_dkmv -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/images/win7-32-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idc9iuBc,id=ndev00idc9iuBc,mac='02:30:1F:42:12:1e',bus=pci.0,addr=0x3 -netdev tap,id=idc9iuBc,ifname='virtio_0_8000',script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 1 -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/isos/windows/winutils.iso',index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -cpu qemu64,+x2apic -vnc :0 -spice port=8000,disable-ticketing -rtc base=localtime,clock=host,driftfix=none -M rhel6.0.0 -usbdevice tablet
2. Run 'start ping -t localhost' in source VM as background process.
3. Start dest VM. e.g.
/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/qemu -name 'vm1' -chardev socket,id=human_monitor_bFKv,path=/tmp/monitor-humanmonitor1-20100901-144206-SphD,server,nowait -mon chardev=human_monitor_bFKv,mode=readline -chardev socket,id=serial_34va,path=/tmp/serial-20100901-144206-SphD,server,nowait -device isa-serial,chardev=serial_34va -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/images/win7-32-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idCgSIoa,id=ndev00idCgSIoa,mac='02:30:1F:42:12:1e',bus=pci.0,addr=0x3 -netdev tap,id=idCgSIoa,ifname='virtio_0_8001',script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 1 -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/isos/windows/winutils.iso',index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -cpu qemu64,+x2apic -vnc :1 -spice port=8001,disable-ticketing -rtc base=localtime,clock=host,driftfix=none -M rhel6.0.0 -usbdevice tablet -incoming tcp:0:5200
4. Send 'migrate -d tcp:localhost:5200' monitor command to source VM to start migration.

  
Actual results:
Some seconds late we got following error message and dest VM terminated. Even source VM return migrate completed by 'info migrate'.

(qemu) spice_server_migrate_end: no peer connected
(qemu) qemu: warning: error while loading state for instance 0x0 of device 'cpu'
(qemu) load of migration failed
(qemu) (Process terminated with status 1)


Expected results:
Dest VM do not terminate and Migration complete successfully.

Additional info:
More debug info got from autotest script:
09/01 15:30:33 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:36 INFO |kvm_test_u:0056| Trying to log into guest 'vm1', timeout 1200s
09/01 15:30:36 DEBUG| kvm_utils:0741| Trying to login with command 'nc 10.66.83.32 10022'
09/01 15:30:37 DEBUG| kvm_utils:0649| Got shell prompt -- logged in
09/01 15:30:37 INFO |kvm_test_u:0061| Logged into guest 'vm1'
09/01 15:30:37 DEBUG|kvm_subpro:1139| Sending command: tasklist | find /I "ping.exe"
09/01 15:30:38 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:38 DEBUG|kvm_monito:0191| Send command: screendump /dev/shm/scrdump-ChBOfF.ppm
09/01 15:30:38 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: detach
09/01 15:30:38 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: attach
09/01 15:30:38 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:40 DEBUG|    kvm_vm:1065| VM is already down
09/01 15:30:40 DEBUG|    kvm_vm:0571| Getting output of 'qemu -help'
09/01 15:30:40 DEBUG| kvm_utils:0134| generating mac addr 02:30:1F:42:12:1e 
09/01 15:30:41 DEBUG|    kvm_vm:0966| Running qemu command:
/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/qemu -name 'vm1' -chardev socket,id=human_monitor_bFKv,path=/tmp/monitor-humanmonitor1-20100901-144206-SphD,server,nowait -mon chardev=human_monitor_bFKv,mode=readline -chardev socket,id=serial_34va,path=/tmp/serial-20100901-144206-SphD,server,nowait -device isa-serial,chardev=serial_34va -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/images/win7-32-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idCgSIoa,id=ndev00idCgSIoa,mac='02:30:1F:42:12:1e',bus=pci.0,addr=0x3 -netdev tap,id=idCgSIoa,ifname='virtio_0_8001',script='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 1 -drive file='/usr/local/staf/test/RHEV/kvm-new/autotest/client/tests/kvm/isos/windows/winutils.iso',index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -cpu qemu64,+x2apic -vnc :1 -spice port=8001,disable-ticketing -rtc base=localtime,clock=host,driftfix=none -M rhel6.0.0 -usbdevice tablet -incoming tcp:0:5200
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) do_spice_init: starting 0.4.2-15..el6
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) interface_change_notifier: VD_INTERFACE_KEYBOARD
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) interface_change_notifier: VD_INTERFACE_MOUSE
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) interface_change_notifier: VD_INTERFACE_QXL
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) red_worker_main: begin
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: attach
09/01 15:30:41 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:42 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:42 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:42 DEBUG|    kvm_vm:1033| VM appears to be alive with PID 13276
09/01 15:30:42 DEBUG|kvm_monito:0339| Migrating to: tcp:localhost:5200
09/01 15:30:42 DEBUG|kvm_monito:0191| Send command: migrate -d tcp:localhost:5200
09/01 15:30:42 DEBUG|kvm_subpro:0699| (qemu) spice_server_migrate_start:
09/01 15:30:42 DEBUG|kvm_subpro:0699| (qemu) spice_server_migrate_start: not connected to peer
09/01 15:30:43 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:43 DEBUG|kvm_monito:0191| Send command: screendump /dev/shm/scrdump-ChBOfF.ppm
09/01 15:30:43 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: detach
09/01 15:30:43 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: attach
09/01 15:30:43 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:44 DEBUG| kvm_utils:1020| Waiting for migration to finish... (2.002051 secs)
09/01 15:30:44 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:46 DEBUG| kvm_utils:1020| Waiting for migration to finish... (4.015443 secs)
09/01 15:30:46 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:48 DEBUG| kvm_utils:1020| Waiting for migration to finish... (6.028508 secs)
09/01 15:30:48 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:49 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:49 DEBUG|kvm_monito:0191| Send command: screendump /dev/shm/scrdump-ChBOfF.ppm
09/01 15:30:49 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: detach
09/01 15:30:49 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: attach
09/01 15:30:49 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:50 DEBUG| kvm_utils:1020| Waiting for migration to finish... (8.041617 secs)
09/01 15:30:50 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:52 DEBUG| kvm_utils:1020| Waiting for migration to finish... (10.053775 secs)
09/01 15:30:52 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:54 DEBUG|kvm_monito:0191| Send command: help
09/01 15:30:54 DEBUG|kvm_monito:0191| Send command: screendump /dev/shm/scrdump-ChBOfF.ppm
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: detach
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: attach
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) create_cairo_context: using cairo canvas
09/01 15:30:54 DEBUG| kvm_utils:1020| Waiting for migration to finish... (12.066798 secs)
09/01 15:30:54 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) handle_dev_input: stop
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) spice_server_migrate_end: status success
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) spice_server_migrate_end: no peer connected
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) qemu: warning: error while loading state for instance 0x0 of device 'cpu'
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) load of migration failed
09/01 15:30:54 DEBUG|kvm_prepro:0412| (address cache) Adding cache entry: 00:15:17:a5:9a:b2 ---> 10.66.83.184
09/01 15:30:54 DEBUG|kvm_prepro:0412| (address cache) Adding cache entry: 00:15:17:a5:9a:b2 ---> 10.66.83.184
09/01 15:30:54 DEBUG|kvm_prepro:0412| (address cache) Adding cache entry: 00:15:17:a5:9a:b2 ---> 10.66.83.184
09/01 15:30:54 DEBUG|kvm_subpro:0699| (qemu) (Process terminated with status 1)
09/01 15:30:56 DEBUG| kvm_utils:1020| Waiting for migration to finish... (14.078883 secs)
09/01 15:30:56 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:56 DEBUG|kvm_monito:0191| Send command: info migrate
09/01 15:30:56 INFO |kvm_test_u:0218| Migration finished successfully
09/01 15:30:56 DEBUG|kvm_monito:0191| Send command: info status
09/01 15:30:56 DEBUG|    kvm_vm:1065| VM is already down
09/01 15:30:56 ERROR|       kvm:0080| Test failed: MonitorSendError: Could not send monitor command 'info status'
09/01 15:30:56 DEBUG|kvm_prepro:0122| Postprocessing VM 'vm1'...
09/01 15:30:56 DEBUG|kvm_prepro:0125| VM object found in environment

Comment 6 Juan Quintela 2011-03-01 09:07:01 UTC
I think this is fixed by:

commit 67f2c53eac01bc0d1f45ddb0f688f8dc439134c2
Author: Paolo Bonzini <pbonzini>
Date:   Fri Feb 4 13:31:17 2011 -0200

    savevm: fix corruption in vmstate_subsection_load().
    

Could you confirm that you can reproduce it?
If you can reproduce it, please test with/without spice. spice got completely changed on RHEL6.1, so it is a good way to test it.

Comment 7 Dor Laor 2011-03-01 12:40:58 UTC
Juan is that downstream commit? Which version do you expect QE or Alex to test?

Comment 8 Dor Laor 2011-03-24 10:31:29 UTC

*** This bug has been marked as a duplicate of bug 671100 ***