Bug 920205 - RHEVM 3.2, migration - long pause till vm starts running on destination
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64
OS: Windows
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On: 984793
Blocks: 984413
Reported: 2013-03-11 10:54 EDT by Vimal Patel
Modified: 2014-03-19 16:04 EDT
CC List: 23 users

See Also:
Fixed In Version: libvirt-0.10.2-20.el6
Doc Type: Bug Fix
Doc Text:
Cause: During migration, libvirt waits for qemu to finish migrating and also waits for SPICE to finish migrating. The domain was resumed on the destination only after both qemu and SPICE had migrated.
Consequence: Resuming the domain on the destination was postponed until SPICE had migrated as well, even though there is no real need for this. This resulted in a long pause before the domain was resumed on the destination.
Fix: The wait for SPICE migration was moved to the end of the migration process, so the domain is resumed as soon as qemu itself has migrated.
Result: There is no long pause before the domain resumes on the destination.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 03:49:18 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
src qemu log (189.27 KB, text/plain), 2013-03-12 17:44 EDT, Marian Krcmarik
dst qemu log (298.96 KB, text/plain), 2013-03-12 17:45 EDT, Marian Krcmarik
libvirtd_rhevm_log (106.98 KB, text/plain), 2013-07-17 02:45 EDT, Hu Jianwei
vdsm_rhevm_log (182.27 KB, text/plain), 2013-07-17 02:46 EDT, Hu Jianwei

Description Vimal Patel 2013-03-11 10:54:51 EDT
Description of problem:
When a guest is migrated from one host to another while the client is watching a video, the user sees a 5-second pause in the video.

When a guest is migrated from one host to another while the client is recording audio, there was a 15-second gap during which recording stopped, and audio after the migration was not recorded.

*Tested using a Win7 64-bit client and guest with seamless migration on.

Version-Release number of selected component (if applicable):
RHEVM3.2 sf9
RHEL 6.4 host
mingw-virt-viewer: rhevm-spice-client-x64-cab-3.2-2

How reproducible:
100%

Steps to Reproduce:
1.  Set seamless migration on
2.  Migrate the guest while the client is watching a video or recording audio on the guest

Actual results:
Client sees loss of data during migration of the guest.

Expected results:
Client sees almost no loss of data, and is able to continue watching a video or record audio w/o interruption.

Additional info:
Comment 1 Marian Krcmarik 2013-03-12 17:43:45 EDT
I did some more testing:

- The stop in video playback during migration happens even when running qemu directly from the CLI and migrating via the qemu monitor, on the same hosts as are used in RHEVM.
- The network is a 1 Gbit LAN; the client is three hops away from the hosts, and the hosts are next to each other.
- The clocks of the hosts are synced.
- The issue seems to be much more obvious/reproducible for guests with more than one qxl device defined, even though only one screen is enabled.

I will attach the spice server debug logs from dst and src.
Comment 2 Marian Krcmarik 2013-03-12 17:44:27 EDT
Created attachment 709192 [details]
src qemu log
Comment 3 Marian Krcmarik 2013-03-12 17:45:07 EDT
Created attachment 709193 [details]
dst qemu log
Comment 10 mazhang 2013-04-09 03:00:50 EDT
Cannot reproduce this issue with qemu-kvm directly. Here are my command line and steps.

host:RHEL6.4
kernel:2.6.32-358.el6.x86_64
qemu:qemu-kvm-0.12.1.2-2.355.el6.x86_64
guest:win7-64

1 boot up guest from hostA (10.66.83.38)
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 2G  \
-smp 2,sockets=1,cores=2,threads=1 \
-name rhel6u4 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-enable-kvm \
-monitor stdio \
-boot menu=on \
-qmp tcp:0:6666,server,nowait \
-drive file=/mnt/windows/win7.qcow2,if=none,id=drive-ide0-0-0,format=qcow2,cache=none,aio=threads \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-vga qxl \
-spice port=5900,disable-ticketing,seamless-migration=on \
-nodefaults \
-netdev tap,id=hostnet0,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bootindex=3 \
-device intel-hda,id=sound0,bus=pci.0 \
-device hda-duplex \
-device qxl,id=video1,vram_size=33554432,bus=pci.0,addr=0x5 \
-device qxl,id=video2,vram_size=33554432,bus=pci.0,addr=0x6

2 boot up qemu on hostB (10.66.82.123), waiting for the incoming migration.

3 connect to the guest with remote-viewer, play a song and record it with the sound recorder.

4 migration
(qemu) __com.redhat_spice_migrate_info 10.66.82.123 5900
(qemu) migrate -d tcp:10.66.82.123:5800

5 after the migration has finished, stop the recording application and play the recorded file.

result:
no gap or data loss found.
Comment 11 mazhang 2013-04-09 04:13:31 EDT
Tried both playing video and recording audio; cannot reproduce this issue. Could you have a look at my steps? If there is any problem, let me know.
Comment 12 Yonit Halperin 2013-04-09 07:59:57 EDT
(In reply to comment #11)
> Try both play video and record audio, can not reproduce this issue, could
> you have a look at my steps, if any problem let me know.

As I mentioned in comment #8, it was reproducible only via RHEVM, and not by using qemu-kvm directly. Also, use just audio playback to avoid other problems.
Comment 13 Qunfang Zhang 2013-04-09 22:39:25 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > Try both play video and record audio, can not reproduce this issue, could
> > you have a look at my steps, if any problem let me know.
> 
> As I mention in comment #8, it was reproducible only via RHEVM, and not by
> using qemu-kvm directly. Also, use just audio playback to avoid other
> problems.

Thanks for the info and QE will try to reproduce it via rhevm environment.
Comment 14 mazhang 2013-04-10 07:02:25 EDT
Tried to reproduce this bug on RHEVM 3.2 sf13, but the guest can't find a sound device, and there is no sound device selection on the dashboard. Could you tell me how to add one?

Here is the command line that rhev-m boots the guest with:

/usr/libexec/qemu-kvm -name test -S -M rhel6.4.0 -cpu Nehalem -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid 02be3a64-d89b-4dd9-8109-a64560d345ae -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=36363136-3935-4E43-4731-323854484D59,uuid=02be3a64-d89b-4dd9-8109-a64560d345ae -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-04-10T18:16:25,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/348d11fb-040d-41bd-a55a-840456e98339/e7f165f8-c118-4f94-a257-36e173819cab/images/baa983e9-4272-4741-8b09-0f40983360d6/cf871bcf-786d-449c-b795-4a8a900b569b,if=none,id=drive-ide0-0-0,format=raw,serial=baa983e9-4272-4741-8b09-0f40983360d6,cache=none,werror=stop,rerror=stop,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:aa,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/test.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/test.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Comment 15 Marian Krcmarik 2013-04-10 07:41:17 EDT
(In reply to comment #14)

When creating the test VM, did you choose a VM OS type in the Admin portal? If so, which one? If not, don't leave it as "Unassigned".
Comment 16 mazhang 2013-04-11 06:31:05 EDT
1 boot win7_64 guest with two qxl devices and do seamless migration while playing audio/video: playback pauses for less than 1 second, and after the migration has finished audio and recording work well.

2 Version-Release
(mine):
qemu-kvm-rhev-0.12.1.2-2.355
spice-server-0.12.0-12.el6.x86_64
libvirt-0.10.2-18.el6_4.3.x86_64
(mkrcmari)
qemu-kvm-rhev-0.12.1.2-2.348 
spice-server-0.12.0-12 
libvirt-0.10.2-18.el6_4.2

3 To do:
a. wait for the developers to build qemu-kvm-348, then re-test.
b. set up the network as mentioned in comment #1: "client is three hops away from hosts".
Comment 17 mazhang 2013-04-11 07:00:33 EDT
After downgrading to qemu-kvm-348, the spice client alternates between quitting and showing a black screen (sound still works) when the migration finishes.
Comment 18 mazhang 2013-04-12 00:43:01 EDT
Just reproduced this bug. The spice client and the hosts must be in different network segments; audio playback then pauses for about 10s during migration.
Comment 19 juzhang 2013-04-12 00:53:32 EDT
(In reply to comment #18)
> Just reproduce this bug. spice client and host must in different net
> section, audio playback will pause about 10s while migration.
Can anyone explain why "spice client and host must in different net section" is the key factor for reproducing this issue?

Best Regards,
Junyi
Comment 20 mazhang 2013-04-12 06:26:11 EDT
1 CLIENTS
Two hosts, fc17 and rhel6.4, act as clients and connect to the rhev-m dashboard with firefox.

FC17:
10.66.6.51
spice-xpi-2.7-2.fc17.x86_64
spice-gtk3-0.12-5.fc17.x86_64
spice-glib-0.12-5.fc17.x86_64
spice-gtk-python-0.12-5.fc17.x86_64
spice-client-0.10.1-5.fc17.x86_64
spice-gtk-tools-0.12-5.fc17.x86_64
spice-server-0.10.1-5.fc17.x86_64
spice-vdagent-0.10.1-1.fc17.x86_64
spice-gtk-0.12-5.fc17.x86_64
firefox-16.0.2-1.fc17.x86_64

RHEL6.4:
10.66.6.86
spice-gtk-0.14-7.el6.x86_64
spice-client-debuginfo-0.8.2-15.el6.x86_64
spice-xpi-2.7-15.el6.x86_64
spice-server-debuginfo-0.12.0-12.el6.x86_64
spice-client-0.8.2-15.el6.x86_64
spice-vdagent-0.12.0-4.el6_4.1.x86_64
spice-glib-0.14-7.el6.x86_64
spice-server-0.12.0-12.el6.x86_64
firefox-17.0.5-1.el6_4.x86_64

2 HOSTS
Two machines act as rhev-m hosts in the same cluster for guest migration:
rhel6.4 (10.66.104.101) and rhel6.4 (10.66.104.103),
both with the same configuration:
qemu-img-rhev-0.12.1.2-2.348.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.348.el6.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
spice-gtk-0.14-7.el6.x86_64
spice-glib-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-server-0.12.0-12.el6.x86_64

3 NETWORK
From FC17(10.66.6.51) to rhel6.4(10.66.104.101/103)
[root@mazhang redhat]# traceroute 10.66.104.101
traceroute to 10.66.104.101 (10.66.104.101), 30 hops max, 60 byte packets
 1  10.66.7.254 (10.66.7.254)  0.987 ms  1.402 ms  1.815 ms
 2  10.66.126.10 (10.66.126.10)  0.434 ms  0.497 ms  0.498 ms
 3  10.66.126.9 (10.66.126.9)  0.519 ms  0.586 ms  0.611 ms
 4  hp-dl388g7-02.qe.lab.eng.nay.redhat.com (10.66.104.101)  0.232 ms  0.177 ms  0.115 ms

From RHEL6.4(10.66.6.86) to rhel6.4(10.66.104.101/103)
[root@localhost ~]# traceroute 10.66.104.101
traceroute to 10.66.104.101 (10.66.104.101), 30 hops max, 60 byte packets
 1  10.66.7.254 (10.66.7.254)  1.099 ms  1.491 ms  1.963 ms
 2  10.66.126.10 (10.66.126.10)  1.297 ms  1.372 ms  1.422 ms
 3  10.66.126.9 (10.66.126.9)  0.579 ms  0.653 ms  0.719 ms
 4  hp-dl388g7-02.qe.lab.eng.nay.redhat.com (10.66.104.101)  0.210 ms  0.198 ms  0.190 ms

4 RESULT
a) On FC17 (10.66.6.51): connect to the rhev-m dashboard with firefox, open the VM console, play audio, then migrate between 10.66.104.101 and 10.66.104.103. Audio pauses for less than 1 second and the display sometimes flashes black, but after the migration has finished audio plays well.

b) On RHEL6.4 (10.66.6.86): connect to the rhev-m dashboard with firefox, open the VM console, play audio, then migrate between 10.66.104.101 and 10.66.104.103. Audio playback pauses for about 10s during migration.
Comment 21 Yonit Halperin 2013-04-12 10:12:01 EDT
Did you reproduce it directly with qemu-kvm, or via rhevm? If with rhevm, can you now try to reproduce it with qemu-kvm, and a similar network setting?

Thanks,
Yonit.

Comment 22 Yonit Halperin 2013-04-12 10:46:53 EDT
If I understand the libvirt migration code correctly, libvirt is responsible for starting the CPUs on the destination. However, even though the VM can be started immediately after migration has finished, this doesn't happen, because libvirt first waits until spice migration on the src side has completed. This is not necessary: waiting on the src side is essential only so that src-spice is not terminated before it has completed its migration operations.

So if the above is correct, there are 2 issues here:
1) why does spice migration complete more slowly when the client is on a different subnet, even though we are still on a low-latency connection?
2) starting the dest vm should not be synchronized with spice migration.

Michal, does this make sense?

Yonit.
Comment 23 Michal Privoznik 2013-04-12 12:43:34 EDT
Yonit,

I think as we agreed on Bug 836135 libvirt should not kill source before spice migrates its internal data. That is, as soon as qemu tells libvirt "now it's safe to kill me", we send a cookie to destination which translates into 'cont' on monitor command. So yes, there can be a gap where libvirt is waiting on qemu which is just migrating some spice internal state but CPUs are not running. So just to make sure I understand this correctly, following flow is safe:
1) libvirt starts a paused guest on dst
2) libvirt starts migration on src
3) libvirt waits for migration to complete (query-migrate reports "completed" even though "query-spice" reports 'not yet')
4) immediately sends notification to dst so guest can 'cont'
5) waits until "query-spice" reports "completed"
6) kills qemu on src

Currently, steps 4 and 5 are switched.
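
The effect of swapping steps 4 and 5 can be illustrated with a small timing model. This is only a sketch under stated assumptions, not libvirt code: the names `Timeline` and `migration_timeline` and the example durations are hypothetical, and the model only tracks when the destination is told to 'cont' and when the source qemu is killed.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    cont_sent_at: float   # time at which dst is notified so the guest can 'cont'
    src_killed_at: float  # time at which qemu on src is killed


def migration_timeline(qemu_done_at: float, spice_done_at: float,
                       cont_before_spice_wait: bool) -> Timeline:
    """Model steps 3-6 of the flow; all times are seconds since migration start."""
    # Step 3: wait until query-migrate reports "completed".
    now = qemu_done_at
    if cont_before_spice_wait:
        # Proposed order: notify dst (step 4), then wait for SPICE (step 5).
        cont_sent_at = now
        now = max(now, spice_done_at)
    else:
        # Current order: wait for SPICE first, then notify dst.
        now = max(now, spice_done_at)
        cont_sent_at = now
    # Step 6: in both orders src is killed only after SPICE has finished.
    return Timeline(cont_sent_at=cont_sent_at, src_killed_at=now)


# Hypothetical durations: qemu finishes after 2s, SPICE after 12s.
current = migration_timeline(2.0, 12.0, cont_before_spice_wait=False)
proposed = migration_timeline(2.0, 12.0, cont_before_spice_wait=True)
print("extra pause, current order: ", current.cont_sent_at - 2.0)
print("extra pause, proposed order:", proposed.cont_sent_at - 2.0)
```

In this model the source is killed at the same moment in both orders, so the safety property agreed on in Bug 836135 is kept; only the time at which the destination resumes changes.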
Comment 24 Yonit Halperin 2013-04-12 13:33:33 EDT
(In reply to comment #23)
> Yonit,
> 
> I think as we agreed on Bug 836135 libvirt should not kill source before
> spice migrates its internal data. That is, as soon as qemu tells libvirt
> "now it's safe to kill me", we send a cookie to destination which translates
> into 'cont' on monitor command. So yes, there can be a gap where libvirt is
> waiting on qemu which is just migrating some spice internal state but CPUs
> are not running. So just to make sure I understand this correctly, following
> flow is safe:
> 1) libvirt starts a paused guest on dst
> 2) libvirt starts migration on src
> 3) libvirt waits for migration to complete (query-migrate reports
> "completed" even though "query-spice" reports 'not yet')
> 4) immediately sends notification to dst so guest can 'cont'
> 5) waits until "query-spice" reports "completed"
> 6) kills qemu on src
> 
> Currently, steps 4 and 5 are switched.

Yes, this flow should be safe. So, the remaining question is why spice migration is slower on some settings, even if they are low latency ones, and whether it happens when running qemu-kvm directly, or just with rhevm.
Comment 25 mazhang 2013-04-13 03:48:00 EDT
(In reply to comment #21)
> Did you reproduce it directly with qemu-kvm, or via rhevm? If with rhevm,
> can you now try to reproduce it with qemu-kvm, and a similar network setting?
> 
Yes, I also tried to reproduce this problem with qemu-kvm directly in the same network; it didn't happen.
Comment 26 Yonit Halperin 2013-04-15 15:51:40 EDT
I'm moving the bug to libvirt + I opened a new bug for spice-gtk, bug #952375, for finding why the src spice-server doesn't identify spice-gtk disconnection (or alternatively, why spice-gtk doesn't close the sockets to the src).
Comment 27 Huang Wenlong 2013-04-17 01:31:03 EDT
Hi,

I am a libvirt QE. I can reproduce this bug with:
RHEVM:
IP: 10.66.6.60
rhevm-3.2.0-10.18.beta2.el6ev.noarch
spice-gtk-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-xpi-2.7-22.el6.x86_64
spice-vdagent-0.12.0-4.el6_4.1.x86_64
spice-server-0.12.0-12.el6.x86_64
spice-glib-0.14-7.el6.x86_64
rhevm-spice-client-x86-cab-3.2-7.el6ev.noarch
rhevm-spice-client-x64-cab-3.2-7.el6ev.noarch

Web-Client:
IP: 10.66.6.60
spice-gtk-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-xpi-2.7-22.el6.x86_64
spice-vdagent-0.12.0-4.el6_4.1.x86_64
spice-server-0.12.0-12.el6.x86_64
spice-glib-0.14-7.el6.x86_64
rhevm-spice-client-x86-cab-3.2-7.el6ev.noarch
rhevm-spice-client-x64-cab-3.2-7.el6ev.noarch

TWO hosts: 
IP: 10.66.82.249 and 10.66.82.251
libvirt-0.10.2-18.el6_4.4.x86_64
libvirt-client-0.10.2-18.el6_4.4.x86_64
libvirt-devel-0.10.2-18.el6_4.4.x86_64
libvirt-lock-sanlock-0.10.2-18.el6_4.4.x86_64
libvirt-python-0.10.2-18.el6_4.4.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6_4.2.x86_64
spice-client-0.8.2-15.el6.x86_64
spice-glib-0.14-7.el6.x86_64
spice-gtk-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-server-0.12.0-12.el6.x86_64
spice-vdagent-0.12.0-4.el6_4.1.x86_64
vdsm-4.10.2-15.0.el6ev.x86_64
vdsm-cli-4.10.2-15.0.el6ev.noarch
vdsm-python-4.10.2-15.0.el6ev.x86_64
vdsm-xmlrpc-4.10.2-15.0.el6ev.noarch


Steps:
1) add 2 hosts to one cluster via RHEVM
2) install one *Desktop* Win7-64 VM via RHEVM
3) play a video and start a sound recorder
4) do a live migration
5) the sound recorder loses about 10s of data when the migration finishes and the VM starts


Wenlong
Comment 28 Huang Wenlong 2013-04-17 02:04:50 EDT
Sorry, I changed the bug status by mistake!
Comment 29 Michal Privoznik 2013-06-10 10:51:46 EDT
I've just proposed the patch on upstream list:

https://www.redhat.com/archives/libvir-list/2013-June/msg00446.html
Comment 30 Michal Privoznik 2013-06-18 09:09:18 EDT
The patch has been just pushed upstream:

commit 9da7b11bcd3e9732dd881a9e6158a0c98bafd9fe
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Mon Jun 10 15:35:03 2013 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Tue Jun 18 14:32:52 2013 +0200

    qemu_migration: Move waiting for SPICE migration
    
    Currently, we wait for SPICE to migrate in the very same loop where we
    wait for qemu to migrate. This has a disadvantage of slowing seamless
    migration down. One one hand, we should not kill the domain until all
    SPICE data has been migrated.  On the other hand, there is no need to
    wait in the very same loop and hence slowing down 'cont' on the
    destination. For instance, if users are watching a movie, they can
    experience the movie to be stopped for a couple of seconds, as
    processors are not running nor on src nor on dst as libvirt waits for
    SPICE to migrate. We should move the waiting phase to migration CONFIRM
    phase.

v1.0.6-90-g9da7b11
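
The idea in the commit message, splitting one wait loop into two, might be sketched as follows. This is not the libvirt implementation: `wait_for`, `perform_phase`, `confirm_phase` and `fake_monitor` are hypothetical names, the monitor is a stand-in callable, and the reply shapes (a `status` field for `query-migrate`, a `migrated` flag for `query-spice`) are simplified assumptions about what the QMP commands return.

```python
import time
from typing import Callable, Dict

# Hypothetical monitor type: takes a QMP command name, returns the reply dict.
Monitor = Callable[[str], Dict]


def wait_for(monitor: Monitor, command: str, done: Callable[[Dict], bool],
             interval: float = 0.05, timeout: float = 30.0) -> None:
    """Poll `command` until `done(reply)` is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if done(monitor(command)):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{command} did not finish within {timeout}s")
        time.sleep(interval)


def perform_phase(monitor: Monitor, resume_destination: Callable[[], None],
                  interval: float = 0.05) -> None:
    # Wait only for qemu's own migration, then let the destination 'cont'.
    wait_for(monitor, "query-migrate",
             lambda r: r.get("status") == "completed", interval)
    resume_destination()


def confirm_phase(monitor: Monitor, kill_source: Callable[[], None],
                  interval: float = 0.05) -> None:
    # The SPICE wait now happens here, just before the source is killed.
    wait_for(monitor, "query-spice",
             lambda r: bool(r.get("migrated")), interval)
    kill_source()


# Fake monitor for demonstration: qemu finishes on the 2nd poll, SPICE on the 3rd.
calls = {"query-migrate": 0, "query-spice": 0}
events = []


def fake_monitor(command: str) -> Dict:
    calls[command] += 1
    if command == "query-migrate":
        return {"status": "completed" if calls[command] >= 2 else "active"}
    return {"migrated": calls[command] >= 3}


perform_phase(fake_monitor, lambda: events.append("cont"), interval=0.01)
confirm_phase(fake_monitor, lambda: events.append("kill src"), interval=0.01)
print(events)
```

The destination is resumed before `query-spice` is polled at all, while the source is still killed only after SPICE reports that it has migrated.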
Comment 32 Jiri Denemark 2013-07-09 08:05:10 EDT
The patch seems to be safe but migration needs to be thoroughly tested to make sure this did not break anything.
Comment 33 Eric Blake 2013-07-09 16:55:56 EDT
Back to assigned; on-list review pointed out a need for a v2 that doesn't touch .gnulib submodule.
Comment 35 Michal Privoznik 2013-07-12 03:07:57 EDT
Eric posted a new version since my patch from comment 34 conflicts when patch for a different bug is applied first:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-July/msg00259.html
Comment 38 Hu Jianwei 2013-07-17 02:45:12 EDT
Created attachment 774642 [details]
libvirtd_rhevm_log

guest migration failed.
Comment 39 Hu Jianwei 2013-07-17 02:46:27 EDT
Created attachment 774644 [details]
vdsm_rhevm_log

guest migration failed.
Comment 40 Hu Jianwei 2013-07-17 02:49:11 EDT
I hit a migration failure on libvirt-0.10.2-20.el6.x86_64 in rhevm, so I cannot verify this bug. After checking the logs, this migration failure may be the same issue as bug 984793:
https://bugzilla.redhat.com/show_bug.cgi?id=984793#c12

Hosts packages version:
libvirt-0.10.2-20.el6.x86_64
libvirt-client-0.10.2-20.el6.x86_64
libvirt-debuginfo-0.10.2-20.el6.x86_64
libvirt-devel-0.10.2-20.el6.x86_64
libvirt-lock-sanlock-0.10.2-20.el6.x86_64
libvirt-python-0.10.2-20.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.6.x86_64
qemu-kvm-rhev-debuginfo-0.12.1.2-2.355.el6_4.2.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.355.el6_4.2.x86_64
spice-client-0.8.2-15.el6.x86_64
spice-glib-0.14-7.el6.x86_64
spice-gtk-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-server-0.12.0-12.el6.x86_64
spice-vdagent-0.12.0-4.el6.x86_64
vdsm-4.10.2-23.0.el6ev.x86_64
vdsm-cli-4.10.2-23.0.el6ev.noarch
vdsm-python-4.10.2-23.0.el6ev.x86_64
vdsm-xmlrpc-4.10.2-23.0.el6ev.noarch

I have also attached the logs, libvirtd_rhevm_log and vdsm_rhevm_log. Could you please help me confirm this?
Comment 42 Hu Jianwei 2013-07-17 06:04:24 EDT
Because of Bug 984793, libvirtd crashes when a guest is migrated. I'll verify this bug after that one is fixed.
Comment 45 Hu Jianwei 2013-07-23 04:31:22 EDT
Hi,

Using remote-viewer to connect to a VM from rhevm and playing audio/video in the VM, there is no long pause or interruption during migration, just a 1~2 second pause, which I think is acceptable.

PC list:
1.my_pc, 10.66.7.130
2.rhevm, 10.66.6.169
3.host1, 10.66.106.23
4.host2, 10.66.106.22

However, I found one issue: after the migration has finished, the volume of the remote-viewer application on my_pc (Firefox accessing the rhevm server from my_pc, 10.66.7.130) changes to the minimum. So we need to change the remote-viewer volume back manually, or no sound is output. Please see the attached images; could you help check whether this is a default setting or a defect? As far as I remember, this issue did not occur with the versions in comment 27.

Versions:

rhevm, 10.66.6.169:
[root@rhevm ~]# rpm -qa | grep -E "rhevm|spice"
rhevm-spice-client-x64-cab-3.2-10.el6ev.noarch
rhevm-setup-3.2.0-11.30.el6ev.noarch
rhevm-webadmin-portal-3.2.0-11.30.el6ev.noarch
rhevm-iso-uploader-3.2.2-1.el6ev.noarch
rhevm-cli-3.2.0.9-1.el6ev.noarch
rhevm-genericapi-3.2.0-11.30.el6ev.noarch
rhevm-dbscripts-3.2.0-11.30.el6ev.noarch
rhevm-config-3.2.0-11.30.el6ev.noarch
rhevm-3.2.0-11.30.el6ev.noarch
spice-vdagent-0.12.0-4.el6.x86_64
rhevm-image-uploader-3.2.2-1.el6ev.noarch
rhevm-backend-3.2.0-11.30.el6ev.noarch
rhevm-spice-client-x86-cab-3.2-10.el6ev.noarch
rhevm-notification-service-3.2.0-11.30.el6ev.noarch
rhevm-sdk-3.2.0.11-1.el6ev.noarch
rhevm-restapi-3.2.0-11.30.el6ev.noarch
rhevm-tools-common-3.2.0-11.30.el6ev.noarch
rhevm-log-collector-3.2.2-3.el6ev.noarch
rhevm-doc-3.2.0-4.el6eng.noarch
rhevm-userportal-3.2.0-11.30.el6ev.noarch

Two hosts(host1 and host2):
[root@dell-per415-04 ~]# rpm -qa | grep -E "libvirt|^qemu|spice|vdsm" | sort
libvirt-0.10.2-21.el6.x86_64
libvirt-client-0.10.2-21.el6.x86_64
libvirt-debuginfo-0.10.2-21.el6.x86_64
libvirt-devel-0.10.2-21.el6.x86_64
libvirt-lock-sanlock-0.10.2-21.el6.x86_64
libvirt-python-0.10.2-21.el6.x86_64
qemu-img-rhev-0.12.1.2-2.378.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.378.el6.x86_64
qemu-kvm-rhev-debuginfo-0.12.1.2-2.378.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.378.el6.x86_64
spice-client-0.8.2-15.el6.x86_64
spice-glib-0.14-7.el6.x86_64
spice-gtk-0.14-7.el6.x86_64
spice-gtk-python-0.14-7.el6.x86_64
spice-server-0.12.3-1.el6.x86_64
spice-vdagent-0.12.0-4.el6.x86_64
vdsm-4.10.2-23.0.el6ev.x86_64
vdsm-cli-4.10.2-23.0.el6ev.noarch
vdsm-python-4.10.2-23.0.el6ev.x86_64
vdsm-xmlrpc-4.10.2-23.0.el6ev.noarch

BR,
Jianwei
Comment 52 mazhang 2013-07-25 05:18:15 EDT
I can't reproduce this problem with qemu-kvm-0.12.1.2-2.379.el6.x86_64, qemu-kvm-0.12.1.2-2.355.el6.x86_64 or qemu-kvm-rhev-0.12.1.2-2.375.el6.x86_64.

Here are my steps:
1. boot up guest with:
/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm -name rhel64 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on -bios /usr/share/seabios/bios.bin -netdev tap,id=hostnet0,downscript=no,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4,bootindex=2 -chardev socket,path=/tmp/isa-serial,server,nowait,id=isa1 -device isa-serial,chardev=isa1,id=isa-serial1 -vga qxl -spice port=5900,disable-ticketing -device virtio-balloon-pci,id=balloon,bus=pci.0,addr=0x3 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -device intel-hda,id=sound0,bus=pci.0 -device hda-duplex -drive file=/home/win7-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1

2. connect to the guest using remote-viewer and play some audio.

3. migrate the guest to the destination host

actual result:
after the migration has finished, the sound volume on the dst host is the same as on src.
Neither spice client nor remote-viewer can reproduce this problem.


I'll try to reproduce this issue on rhev-m.
Comment 53 mazhang 2013-07-25 07:03:04 EDT
Also cannot reproduce on rhev-m with qemu-kvm-rhev-0.12.1.2-2.375.el6.x86_64; after the migration has finished the sound device works well.

host configuration:
[root@m1 ~]# rpm -qa |grep libvirt
libvirt-lock-sanlock-0.10.2-19.el6.x86_64
libvirt-0.10.2-19.el6.x86_64
libvirt-python-0.10.2-19.el6.x86_64
libvirt-client-0.10.2-19.el6.x86_64
[root@m1 ~]# rpm -qa |grep vdsm
vdsm-python-4.10.2-23.0.el6ev.x86_64
vdsm-cli-4.10.2-23.0.el6ev.noarch
vdsm-4.10.2-23.0.el6ev.x86_64
vdsm-xmlrpc-4.10.2-23.0.el6ev.noarch
[root@m1 ~]# rpm -qa |grep spice
spice-gtk-0.20-1.el6.x86_64
spice-glib-0.20-1.el6.x86_64
spice-gtk-python-0.20-1.el6.x86_64
spice-vdagent-0.14.0-1.el6.x86_64
spice-client-0.8.2-15.el6.x86_64
spice-server-0.12.4-1.el6.x86_64
[root@m1 ~]# rpm -qa |grep qemu
gpxe-roms-qemu-0.9.7-6.9.el6.noarch
qemu-kvm-rhev-0.12.1.2-2.375.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.375.el6.x86_64
qemu-kvm-rhev-debuginfo-0.12.1.2-2.375.el6.x86_64
qemu-img-rhev-0.12.1.2-2.375.el6.x86_64
Comment 54 mazhang 2013-07-26 00:07:29 EDT
Reproduced this problem with spice-client-0.8.2-15.el6.x86_64; spice-client-0.10.1-5.fc17.x86_64 does not hit it.


Here is my test matrix.

                                  |    qemu-kvm-rhev-355 | qemu-kvm-rhev-375  
----------------------------------|----------------------|----------------------
spice-client-0.10.1-5.fc17.x86_64 |        pass          |      pass
spice-client-0.8.2-15.el6.x86_64  |        fail          |      fail

It could be a spice client bug.
Comment 56 mazhang 2013-07-26 04:15:20 EDT
Sorry, it should be virt-viewer-0.5.3-1.fc17.x86_64 and virt-viewer-0.5.6-2.el6.x86_64.
Comment 61 mazhang 2013-07-30 03:26:57 EDT
Filed bug 989912 to track the remote-viewer issue.
Comment 66 Hu Jianwei 2013-08-08 01:01:59 EDT
Based on the result in comment 45, changed to verified.

The issue mentioned in comment 45 is tracked by bug 989912 as a remote-viewer issue; it is unrelated to the patch merged for this bug.
Comment 68 errata-xmlrpc 2013-11-21 03:49:18 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html
Comment 69 Eric Blake 2014-03-19 16:04:47 EDT
The fix for this bug caused CVE-2013-7336, which was later fixed by bug 1009886 in libvirt-0.10.2-27.el6.
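
As a rough aid for reading the package lists in this report, the two build numbers named here (libvirt-0.10.2-20.el6 for this fix, libvirt-0.10.2-27.el6 for the CVE fix) can be turned into a small classifier. This is a sketch under assumptions: the `classify` helper is hypothetical, it does simple pattern matching rather than real RPM version comparison, and it deliberately returns "unknown" for z-stream builds such as `el6_4`, whose backports this report does not describe. That every build from 20 up to 26 carries the CVE is an inference from the comment above, not something the report states.

```python
import re

FIX_RELEASE = 20      # libvirt-0.10.2-20.el6: SPICE wait moved (this bug)
CVE_FIX_RELEASE = 27  # libvirt-0.10.2-27.el6: CVE-2013-7336 fixed (bug 1009886)

# Matches only the main libvirt package on plain el6 builds, e.g.
# "libvirt-0.10.2-21.el6.x86_64" or "libvirt-0.10.2-20.el6".
_PATTERN = re.compile(r"^libvirt-0\.10\.2-(\d+)\.el6(?:\.|$)")


def classify(package: str) -> str:
    """Classify a libvirt package name relative to the two fixes."""
    match = _PATTERN.match(package)
    if match is None:
        return "unknown"
    release = int(match.group(1))
    if release < FIX_RELEASE:
        return "no fix for this bug"
    if release < CVE_FIX_RELEASE:
        return "has this fix, predates the CVE-2013-7336 fix"
    return "has this fix and the CVE-2013-7336 fix"


for name in ("libvirt-0.10.2-19.el6.x86_64",
             "libvirt-0.10.2-21.el6.x86_64",
             "libvirt-0.10.2-27.el6.x86_64",
             "libvirt-0.10.2-18.el6_4.4.x86_64"):
    print(name, "->", classify(name))
```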
