Bug 1298776 - DPDK Live migration using virsh introduced >500ms downtime
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Peter Xu
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1175463 1193173 1305606 1313485
Reported: 2016-01-14 21:38 EST by Peter Xu
Modified: 2016-09-01 22:44 EDT (History)
CC: 21 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-01 22:44:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
/var/log/libvirt/qemu/migrate_vm.log for both hosts (17.08 KB, application/x-gzip)
2016-01-15 23:41 EST, Peter Xu
All the scripts used to verify the bz (with mig_mon client_rr and server_rr commands) (1.74 KB, application/x-gzip)
2016-09-01 04:24 EDT, Peter Xu

Comment 7 Peter Xu 2016-01-15 23:41 EST
Created attachment 1115384 [details]
/var/log/libvirt/qemu/migrate_vm.log for both hosts
Comment 20 Jiri Denemark 2016-08-23 23:42:26 EDT
Libvirt doesn't set any downtime unless explicitly asked to, so the QEMU default is applied here.

The default speed set by libvirt is INT64_MAX on x86_64, which is 8P if I counted it correctly.
Comment 21 Dr. David Alan Gilbert 2016-08-24 15:17:17 EDT
I think if we're benchmarking downtime then it's best to set the bandwidth to something sensible;  I'm not sure it makes a difference but it feels right to do it.
I *think* qemu's default downtime is 300ms, so while it doesn't get you 500ms it does get you most of it!
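[For reference, both knobs can be driven over the monitor before the migration starts. A minimal sketch that only builds the JSON command lines, assuming the classic QMP commands migrate_set_downtime (value in seconds) and migrate_set_speed (value in bytes/s); the exact command names and units should be checked against the QEMU version in use:]

```python
import json

def qmp(execute, **arguments):
    """Serialize one QMP command as a JSON line."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

# Cap the expected downtime at 100 ms (migrate_set_downtime takes seconds)
print(qmp("migrate_set_downtime", value=0.1))
# Cap the bandwidth at ~1 GiB/s instead of libvirt's INT64_MAX default
print(qmp("migrate_set_speed", value=1 << 30))
```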
Comment 22 Peter Xu 2016-08-24 23:40:35 EDT
(In reply to Dr. David Alan Gilbert from comment #21)
> I think if we're benchmarking downtime then it's best to set the bandwidth
> to something sensible;  I'm not sure it makes a difference but it feels
> right to do it.

Yes, that makes sense. My old tests didn't take these parameters into account (all runs used the defaults). That might be why libvirt got different results (libvirt sets the speed to MAX; thanks Jiri for providing this info).

From now on I will play with sensible values for these two.

> I *think* qemu's default downtime is 300ms, so while it doesn't get you
> 500ms it does get you most of it!

The problem is why I was getting 500ms even when I set the downtime to 100ms.

One thing I want to do is enhance my mig_mon tool to at least use the host time for measuring downtime, rather than the time in the migrating guest, to avoid the possibility that the guest time is unstable in some way.
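[The host-time idea boils down to: timestamp each successful reply from the guest on the host clock, and report the largest gap between consecutive replies. A minimal sketch; the timestamps below are hypothetical:]

```python
def max_downtime_ms(reply_times_ms):
    """Given host-clock timestamps (ms) of successful replies from the
    guest, the largest gap between consecutive replies bounds the
    downtime observed from outside the guest."""
    return max(b - a for a, b in zip(reply_times_ms, reply_times_ms[1:]))

# Replies every 30 ms, then 500 ms of silence during the switchover:
print(max_downtime_ms([0, 30, 60, 560, 590]))  # → 500
```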

One question that is totally unrelated to this bz: do we support postcopy for vhost-user migration? I played with it a bit and got this:

qemu-kvm: postcopy_ram_discard_range MADV_DONTNEED: Invalid argument
qemu-kvm: load of migration failed: Operation not permitted
qemu-kvm: socket_writev_buffer: Got err=32 for (131788/18446744073709551615)

The QEMU command line is:

$qemu -enable-kvm -m 1024 \
      -monitor telnet::333${index},server,nowait \
      -chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1  \
      -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
      -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
      -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
      -spice port=590${index},disable-ticketing \
      -numa node,memdev=mem -mem-prealloc \
      /root/remote/vm1.img \

Please just drop a hint if there is a quick answer (guest memory is on huge pages, with share enabled). Otherwise I'll look into it after I figure out why precopy is hitting these 500ms downtimes (I hope the spike goes away once I enhance my tool).
Comment 23 Dr. David Alan Gilbert 2016-08-25 06:22:49 EDT
(In reply to Peter Xu from comment #22)
> (In reply to Dr. David Alan Gilbert from comment #21)
> > I think if we're benchmarking downtime then it's best to set the bandwidth
> > to something sensible;  I'm not sure it makes a difference but it feels
> > right to do it.
> 
> Yes it sounds making sense. My old tests didn't take these parameters into
> account (all with default ones). That might be the reason why libvirt got
> different results (libvirt is setting speed to MAX, thanks Jiri for
> providing this info).
> 
> From now on I will play with sensible values for these two.
> 
> > I *think* qemu's default downtime is 300ms, so while it doesn't get you
> > 500ms it does get you most of it!
> 
> The problem is why I was getting 500ms even I set downtime to 100ms.
> 
> One thing I want to do is enhance my mig_mon tool to at least use host time
> for measuring downtime, rather than use the time in the migrating guest, to
> avoid the possiblility that guest time may not be stable in some way.

Oh yes, I wouldn't trust guest time for that.

> One question that is totally not related to this bz: do we support postcopy
> for vhost-user migration? I played with it a bit and I got this:
> 
> qemu-kvm: postcopy_ram_discard_range MADV_DONTNEED: Invalid argument
> qemu-kvm: load of migration failed: Operation not permitted
> qemu-kvm: socket_writev_buffer: Got err=32 for (131788/18446744073709551615)

I've not tried vhost-user, but....

> QEMU parameter is:
> 
> $qemu -enable-kvm -m 1024 \
>       -monitor telnet::333${index},server,nowait \
>       -chardev
> socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1  \
>       -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>       -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>       -object
> memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \

We don't support huge page mapping in postcopy; so that's the most likely cause of that error.

>       -spice port=590${index},disable-ticketing \
>       -numa node,memdev=mem -mem-prealloc \
>       /root/remote/vm1.img \
> 
> Please just hint if there is quick answer (guest memory is on huge pages,
> and share enabled). Otherwise I'll check it out after I could figure out why
> precopy is getting these 500ms downs (I hope after I enhance my tool, the
> spike goes away).
Comment 24 Jiri Denemark 2016-08-31 09:20:47 EDT
It looks like this issue can be reproduced even without libvirt and the investigation is ongoing there anyway... moving to qemu-kvm-rhev.
Comment 25 Peter Xu 2016-09-01 04:16:23 EDT
I enhanced my testing tool for measuring downtime:

https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

And added a new way to measure the downtime in this commit:

https://github.com/xzpeter/clibs/commit/81e6570c04c4d934e5b6165287e6a246bd5fadb3

After using the new tool, the spikes are gone.

----------------------------------------------

Here are the changed steps to run the test:

1. On the two hosts, install the latest OVS (dd52de45b719da1e52cc6894e245198fda5a748e, 2016-08-10). You need to download dpdk-16.07.zip first, compile DPDK (commenting out the *KNI* entries in .config), then compile and install OVS.

2. Install all the testing programs on host1 and in the guest (scripts will be uploaded later; mig_mon should be compiled from the source above).

3. Make sure each of the two hosts has a 10G card, with the two ports (p2p1, p2p2) connected directly. In this test, I am using p2p1 to connect to the OVS vswitch, and p2p2 to transfer the live migration data (I need to pre-configure IPs for p2p2; in my case, 1.2.4.10/24 and 1.2.4.11/24 on the two hosts respectively).

4. Run "prepare_migration.sh" on each of the two hosts: this will set up the OVS vswitch on each host, do the NFS mounting, etc.

5. Run "start_migration.sh" on host1 and wait for the guest to boot up.

6. In the guest, run:

  # ./mig_mon server_rr

7. On host1, run

  # ./mig_mon client_rr 1.2.3.4 30

  Here 1.2.3.4 is the guest IP, and 30 (ms) is the interval between UDP packets (and also the timeout for each UDP receive).

8. Hit enter in "start_migration.sh" to let the test continue. It will do ping-pong migration between the two hosts, while the downtime is measured with mig_mon along the way.

Using the "server_rr" and "client_rr" commands of mig_mon, no spike is observed (it will capture any spike > 30ms*2=60ms). Actually, the maximum downtime I saw was 33ms. This satisfies our basic need.
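[mig_mon itself is written in C (see the source linked above); the server_rr/client_rr round-trip scheme it implements can be sketched roughly as follows. Addresses, ports, and packet sizes here are illustrative, and the sketch binds to localhost rather than a real guest IP:]

```python
import socket
import threading
import time

def udp_echo_server(port, stop):
    """Guest side (server_rr): echo every datagram straight back.
    Bound to localhost for this sketch; a real guest would bind 0.0.0.0."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", port))
    s.settimeout(0.1)
    while not stop.is_set():
        try:
            data, addr = s.recvfrom(64)
            s.sendto(data, addr)
        except socket.timeout:
            pass
    s.close()

def client_rr(host, port, interval_ms, rounds):
    """Host side (client_rr): send a datagram every interval_ms, timestamp
    each reply on the host clock, and return the largest gap between
    consecutive replies, which approximates the guest's downtime."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(interval_ms / 1000.0)
    last = None
    max_gap = 0.0
    for seq in range(rounds):
        s.sendto(str(seq).encode(), (host, port))
        try:
            s.recvfrom(64)
        except socket.timeout:
            continue  # guest silent (e.g. mid-switchover); keep sending
        now = time.monotonic()
        if last is not None:
            max_gap = max(max_gap, now - last)
        last = now
        time.sleep(interval_ms / 1000.0)
    s.close()
    return max_gap
```

With a 30ms interval, a reply gap of two intervals (60ms) is the smallest spike the scheme can reliably distinguish from normal round-trip jitter, which is where the 30ms*2=60ms detection floor above comes from.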
Comment 26 Peter Xu 2016-09-01 04:20:56 EDT
So basically I am 99% sure that this bz is caused by incorrect measurement of the downtime (i.e., sampling timestamps in the migrating guest, when I should have sampled the time on a stable host). The only thing missing is confirming the problem, and why the time shifted.

However, that's another story (and I am actually not sure we can get very stable timing inside a migrating guest without the help of NTP or something alike). So if no one disagrees, I would like to mark this bz as NOTABUG.
Comment 27 Peter Xu 2016-09-01 04:24 EDT
Created attachment 1196620 [details]
All the scripts used to verify the bz (with mig_mon client_rr and server_rr commands)
