Bug 1298776
Summary: DPDK Live migration using virsh introduced >500ms downtime

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.3
Reporter: Peter Xu <peterx>
Assignee: Peter Xu <peterx>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-09-02 02:44:19 UTC
Bug Blocks: 1175463, 1193173, 1305606, 1313485
CC: berrange, dgilbert, dyuan, hhuang, huding, jean-mickael.guerin, jsuchane, juzhang, knoel, lhuang, mgandolf, peterx, pezhang, rbalakri, samuel.gauthier, thibaut.collet, vincent.jardin, virt-maint, weliao, xfu, zpeng
Libvirt doesn't set any downtime unless explicitly asked to, so the QEMU default is applied here. The default speed set by libvirt is INT64_MAX on x86_64, which is 8P if I counted it correctly.

I think if we're benchmarking downtime then it's best to set the bandwidth to something sensible; I'm not sure it makes a difference, but it feels right to do it.

I *think* QEMU's default downtime is 300ms, so while it doesn't get you to 500ms, it does get you most of it!

(In reply to Dr. David Alan Gilbert from comment #21)
> I think if we're benchmarking downtime then it's best to set the bandwidth
> to something sensible; I'm not sure it makes a difference but it feels
> right to do it.

Yes, that makes sense. My old tests didn't take these parameters into account (all runs used the defaults). That might be the reason why libvirt got different results (libvirt is setting the speed to MAX; thanks Jiri for providing this info). From now on I will use sensible values for these two.

> I *think* qemu's default downtime is 300ms, so while it doesn't get you
> 500ms it does get you most of it!

The problem is that I was getting 500ms even when I set the downtime to 100ms.

One thing I want to do is enhance my mig_mon tool to at least use host time for measuring downtime, rather than the time in the migrating guest, to avoid the possibility that the guest time is unstable in some way.

One question that is totally unrelated to this bz: do we support postcopy for vhost-user migration?
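As an aside on the bandwidth discussion above, the reason the speed setting matters for downtime can be sketched roughly as follows. This is a simplified illustration, not QEMU's actual code: precopy stops the guest once the remaining dirty memory looks transferable within the allowed downtime at the observed bandwidth.

```python
# Simplified sketch of the precopy convergence decision (illustration
# only, not the actual QEMU implementation).
INT64_MAX = 2**63 - 1  # libvirt's default migration speed, in bytes/s

def can_converge(dirty_bytes, bandwidth_bytes_per_sec, max_downtime_ms):
    # Estimated time to push the remaining dirty pages at this bandwidth.
    expected_downtime_ms = dirty_bytes / bandwidth_bytes_per_sec * 1000
    return expected_downtime_ms <= max_downtime_ms

# With the speed left at INT64_MAX, even 1 GiB of dirty memory looks
# instantly transferable, so the guest is stopped right away and the
# real downtime is then bounded by the actual link, not by the setting.
print(can_converge(1 << 30, INT64_MAX, 100))            # True
# With a sensible 10 Gbps (~1.25 GiB/s) figure, the same 1 GiB would
# need ~800ms, so migration keeps iterating instead of stopping.
print(can_converge(1 << 30, 10 * (1 << 30) // 8, 100))  # False
```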
I played with it a bit and I got this:

qemu-kvm: postcopy_ram_discard_range MADV_DONTNEED: Invalid argument
qemu-kvm: load of migration failed: Operation not permitted
qemu-kvm: socket_writev_buffer: Got err=32 for (131788/18446744073709551615)

The QEMU parameters are:

$qemu -enable-kvm -m 1024 \
    -monitor telnet::333${index},server,nowait \
    -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1 \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
    -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
    -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
    -spice port=590${index},disable-ticketing \
    -numa node,memdev=mem -mem-prealloc \
    /root/remote/vm1.img

Please just hint if there is a quick answer (guest memory is on huge pages, with share enabled). Otherwise I'll check it out after I figure out why precopy is getting these 500ms downtimes (I hope the spike goes away after I enhance my tool).

(In reply to Peter Xu from comment #22)
> (In reply to Dr. David Alan Gilbert from comment #21)
> > I think if we're benchmarking downtime then it's best to set the bandwidth
> > to something sensible; I'm not sure it makes a difference but it feels
> > right to do it.
>
> Yes, that makes sense. My old tests didn't take these parameters into
> account (all runs used the defaults). That might be the reason why libvirt
> got different results (libvirt is setting the speed to MAX; thanks Jiri for
> providing this info).
>
> From now on I will use sensible values for these two.
>
> > I *think* qemu's default downtime is 300ms, so while it doesn't get you
> > 500ms it does get you most of it!
>
> The problem is that I was getting 500ms even when I set the downtime to
> 100ms.
>
> One thing I want to do is enhance my mig_mon tool to at least use host time
> for measuring downtime, rather than the time in the migrating guest, to
> avoid the possibility that the guest time is unstable in some way.
Oh yes, I wouldn't trust guest time for that.

> One question that is totally unrelated to this bz: do we support postcopy
> for vhost-user migration?
>
> I played with it a bit and I got this:
>
> qemu-kvm: postcopy_ram_discard_range MADV_DONTNEED: Invalid argument
> qemu-kvm: load of migration failed: Operation not permitted
> qemu-kvm: socket_writev_buffer: Got err=32 for (131788/18446744073709551615)

I've not tried vhost-user, but....

> $qemu -enable-kvm -m 1024 \
>     -monitor telnet::333${index},server,nowait \
>     -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1 \
>     -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>     -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>     -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \

We don't support huge page mapping in postcopy, so that's the most likely cause of that error.

>     -spice port=590${index},disable-ticketing \
>     -numa node,memdev=mem -mem-prealloc \
>     /root/remote/vm1.img
>
> Please just hint if there is a quick answer (guest memory is on huge pages,
> with share enabled). Otherwise I'll check it out after I figure out why
> precopy is getting these 500ms downtimes.

It looks like this issue can be reproduced even without libvirt, and the investigation is ongoing there anyway... moving to qemu-kvm-rhev.

I enhanced my testing script for measuring downtime:

https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

and added a new way to measure the downtime in this commit:

https://github.com/xzpeter/clibs/commit/81e6570c04c4d934e5b6165287e6a246bd5fadb3

After using the new tool, the spikes are gone.

----------------------------------------------

Here are the changed steps to run the test:

1. On the two hosts, install the latest OVS (dd52de45b719da1e52cc6894e245198fda5a748e, 2016-08-10).
   Download dpdk-16.07.zip first, compile DPDK (commenting out the *KNI* entries in .config), then compile and install OVS.

2. Install all the testing programs on host1 and in the guest (scripts will be uploaded later; mig_mon should be compiled from the source above).

3. Make sure each of the two hosts has a 10G card with two ports (p2p1, p2p2) connected directly. In this test, I am using p2p1 to connect to the OVS vswitch and p2p2 to transfer the live migration data (I need to pre-configure IPs for p2p2, in my case 1.2.4.10/24 and 1.2.4.11/24 on the two hosts respectively).

4. Run "prepare_migration.sh" on each of the two hosts: this will set up the OVS vswitches on each host, do the NFS mounting, etc.

5. Run "start_migration.sh" on host1 and wait for the guest to boot up.

6. In the guest, run:

   # ./mig_mon server_rr

7. On host1, run:

   # ./mig_mon client_rr 1.2.3.4 30

   Here 1.2.3.4 is the guest IP and 30 (ms) is the interval between UDP packets (and also the timeout for each UDP receive).

8. Hit enter in "start_migration.sh" to let the test continue. It will do ping-pong migration between the two hosts, while downtime is measured with mig_mon along the way.

Using the "server_rr" and "client_rr" commands of mig_mon, no spike is observed (it would capture any spike > 30ms*2 = 60ms). In fact, the maximum downtime I saw was 33ms, which satisfies our basic need.

So basically I am 99% sure that this bz was caused by incorrect measurement of the downtime (sampling timestamps in the moving guest, when I should have sampled time on a stable host instead). The only thing missing is to confirm the problem, and why the time shifted. However, that's another story (and I am actually not sure we can get very stable timing in a migrating guest without the help of NTP or something similar).

So if no one disagrees, I would like to mark this bz as NOTABUG.

Created attachment 1196620 [details]
All the scripts used to verify the bz (with mig_mon client_rr and server_rr commands)
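The server_rr/client_rr round-trip measurement described in the steps above can be sketched as follows. This is a minimal illustration in the spirit of mig_mon, not the actual C implementation; the wire format and names are simplified. The key point is that all timestamps are taken on the probing host, so an unstable guest clock cannot fake a downtime spike.

```python
# Minimal sketch of a UDP round-trip downtime probe (illustration only,
# not the actual mig_mon C code).
import socket
import threading
import time

def server_rr(sock):
    # Runs in the guest: echo every datagram back to its sender.
    while True:
        data, addr = sock.recvfrom(64)
        if data == b"quit":
            break
        sock.sendto(data, addr)

def client_rr(addr, interval_ms, count):
    # Runs on the host: send a datagram every interval_ms and track the
    # longest gap between successful replies. A migration downtime shows
    # up as a long gap measured entirely with host-side clocks.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(interval_ms / 1000.0)
    max_gap = 0.0
    last_ok = time.monotonic()
    for _ in range(count):
        try:
            sock.sendto(b"ping", addr)
            sock.recvfrom(64)
            now = time.monotonic()
            max_gap = max(max_gap, now - last_ok)
            last_ok = now
        except socket.timeout:
            pass  # no reply yet; the gap keeps growing until one arrives
        time.sleep(interval_ms / 1000.0)
    sock.sendto(b"quit", addr)
    return max_gap * 1000.0  # milliseconds

if __name__ == "__main__":
    # Loopback demo; in the real test the server runs inside the guest.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))
    threading.Thread(target=server_rr, args=(srv,), daemon=True).start()
    print("max gap: %.0f ms" % client_rr(srv.getsockname(), 30, 10))
```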
Created attachment 1115384 [details] /var/log/libvirt/qemu/migrate_vm.log for both hosts
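As a footnote to the postcopy-on-hugepages failure discussed earlier in the thread: a pre-flight check for hugetlbfs-backed guest memory can be sketched as below. This is a hypothetical helper, not part of QEMU or libvirt, and the prefix matching against /proc/mounts is deliberately simplified.

```python
# Hypothetical pre-flight helper (not QEMU or libvirt code): check
# whether a memory-backend-file path lives on hugetlbfs before trying
# postcopy, since huge page mappings are not supported by postcopy here.
import os

def fs_type_of(path):
    # Longest-prefix match of the path against /proc/mounts mount points
    # (simplified: plain string prefix, no mount-option escape handling).
    path = os.path.realpath(path)
    best, best_type = "", None
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _, mnt, fstype = line.split()[:3]
            if path.startswith(mnt) and len(mnt) > len(best):
                best, best_type = mnt, fstype
    return best_type

def postcopy_safe(mem_path):
    return fs_type_of(mem_path) != "hugetlbfs"

# A path like /dev/hugepages/mem would report False on a host with
# hugetlbfs mounted there; ordinary filesystems report True.
print(postcopy_safe("/tmp"))
```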