Bug 1541881
Summary: | When doing live migration over dpdk/vhost-user, dpdk errors can cause qemu hang at recvmsg() | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang>
Component: | openvswitch | Assignee: | Matteo Croce <mcroce>
Status: | CLOSED ERRATA | QA Contact: | Pei Zhang <pezhang>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 7.5 | CC: | ailan, atragler, chayang, dgilbert, juzhang, knoel, maxime.coquelin, michen, peterx, virt-maint
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-05-03 14:23:57 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1538953, 1560628 | |
Description
Pei Zhang
2018-02-05 06:30:33 UTC
Patches posted upstream and merged into DPDK master; they will be in the upcoming v18.02 release and are queued for the next v17.11 LTS release:

commit 82b9c1540348b6be7996203065e10421e953cea9
Author: Maxime Coquelin <maxime.coquelin>
Date: Mon Feb 5 16:04:57 2018 +0100

vhost: remove pending IOTLB entry if miss request failed

In case vhost_user_iotlb_miss returns an error, the pending IOTLB entry has to be removed from the list, as no IOTLB update will be received.

Fixes: fed67a20ac94 ("vhost: introduce guest IOVA to backend VA helper")
Cc: stable
Suggested-by: Tiwei Bie <tiwei.bie>
Signed-off-by: Maxime Coquelin <maxime.coquelin>

commit 37771844a05c7b0a7b039dcae1b4b0a69b4acced
Author: Maxime Coquelin <maxime.coquelin>
Date: Mon Feb 5 16:04:56 2018 +0100

vhost: fix IOTLB pool out-of-memory handling

In the unlikely case the IOTLB memory pool runs out of memory, an issue may happen if all entries are used by the IOTLB cache and an IOTLB miss happens: if the IOTLB pending list is empty, no memory is freed and the allocation fails a second time.

This patch fixes this by doing a random evict from the IOTLB cache when the IOTLB pending list is empty, ensuring the second allocation attempt will succeed.

In the same spirit, the opposite is done when inserting an IOTLB entry into the IOTLB cache fails due to out of memory: in that case, the IOTLB pending list is flushed if the IOTLB cache is empty, to ensure the new entry can be inserted.

Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
Fixes: f72c2ad63aeb ("vhost: add pending IOTLB miss request list and helpers")
Cc: stable
Signed-off-by: Maxime Coquelin <maxime.coquelin>

Assigning to Matteo for backport.
Thanks,
Maxime

Got it!
Regards,

This issue has been fixed well.
Versions:
3.10.0-862.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
libvirt-3.9.0-14.el7.x86_64
dpdk-17.11-9.el7fdb.x86_64
openvswitch-2.9.0-15.el7fdp.x86_64
tuned-2.9.0-1.el7.noarch

(1) PVP live migration: PASS (Stream Rate: 1Mpps)

No | Stream_Rate | Downtime | Totaltime | Ping_Loss | trex_Loss
---|---|---|---|---|---
0 | 1Mpps | 132 | 17985 | 17 | 13905081.0
1 | 1Mpps | 125 | 19738 | 15 | 12693592.0
2 | 1Mpps | 128 | 18629 | 14 | 6022531.0
3 | 1Mpps | 125 | 18623 | 15 | 12730592.0
4 | 1Mpps | 129 | 19396 | 15 | 14879368.0
5 | 1Mpps | 126 | 18717 | 16 | 6628282.0
6 | 1Mpps | 120 | 19164 | 14 | 12151658.0
7 | 1Mpps | 132 | 18893 | 15 | 12075355.0
8 | 1Mpps | 124 | 19046 | 14 | 12115482.0
9 | 1Mpps | 130 | 19258 | 16 | 12263311.0

Summary:

Stat | Stream_Rate | Downtime | Totaltime | Ping_Loss | trex_Loss
---|---|---|---|---|---
Max | 1Mpps | 132 | 19738 | 17 | 14879368
Min | 1Mpps | 120 | 17985 | 14 | 6022531
Mean | 1Mpps | 127 | 18944 | 15 | 11546525
Median | 1Mpps | 127 | 18969 | 15 | 12207484
Stdev | 0 | 3.81 | 490.83 | 0.99 | 2897803.22

(2) Live migration over Open vSwitch: PASS (Stream Rate: 1Mpps)

No | Stream_Rate | Downtime | Totaltime | Ping_Loss | trex_Loss
---|---|---|---|---|---
0 | 1Mpps | 132 | 16140 | 15 | 789681.0
1 | 1Mpps | 121 | 14862 | 13 | 5363237.0
2 | 1Mpps | 121 | 17878 | 114 | 10558667.0
3 | 1Mpps | 119 | 18357 | 114 | 11016673.0
4 | 1Mpps | 121 | 15874 | 14 | 8146823.0
5 | 1Mpps | 119 | 17038 | 112 | 6472610.0
6 | 1Mpps | 114 | 14882 | 13 | 6177523.0
7 | 1Mpps | 120 | 17951 | 114 | 7082521.0
8 | 1Mpps | 123 | 17391 | 114 | 3009150.0
9 | 1Mpps | 117 | 16279 | 112 | 2434152.0

Summary:

Stat | Stream_Rate | Downtime | Totaltime | Ping_Loss | trex_Loss
---|---|---|---|---|---
Max | 1Mpps | 132 | 18357 | 114 | 11016673
Min | 1Mpps | 114 | 14862 | 13 | 789681
Mean | 1Mpps | 120 | 16665 | 73 | 6105103
Median | 1Mpps | 120 | 16658 | 112 | 6325066
Stdev | 0 | 4.69 | 1253.19 | 51.43 | 3351399.49

(Beaker job: https://beaker.engineering.redhat.com/recipes/5045063#tasks)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:1267

*** Bug 1533408 has been marked as a duplicate of this bug. ***