Bug 1738768
Summary: | Guest fails to recover receiving packets after vhost-user reconnect | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Pei Zhang <pezhang> | |
Component: | qemu-kvm | Assignee: | Adrián Moreno <amorenoz> | |
Status: | CLOSED ERRATA | QA Contact: | Pei Zhang <pezhang> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 8.1 | CC: | aadam, ailan, amorenoz, cfontain, chayang, jinzhao, juzhang, mtessun, virt-maint | |
Target Milestone: | rc | Keywords: | Regression | |
Target Release: | --- | Flags: | amorenoz: needinfo- | |
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-4.1.0-16.module+el8.1.1+4917+752cfd65 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1780498 | Environment: | ||
Last Closed: | 2020-02-04 18:28:48 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1780498, 1791983 |
Description
Pei Zhang
2019-08-08 06:14:22 UTC
The latest versions below can still hit this issue:

4.18.0-132.el8.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64

Reproducing it with testpmd yields:

> VHOST_CONFIG: negotiated Virtio features: 0x40000000

So the virtio features are clearly not being saved and restored properly. That narrows the possible regression down to the following commit:

commit 6ab79a20af3a7b3bf610ba9aebb446a9f0b05930
Author: Dan Streetman <ddstreet>
Date:   Tue Apr 16 14:46:24 2019 -0400

    do not call vhost_net_cleanup() on running net from char user event

    Buglink: https://launchpad.net/bugs/1823458

    Currently, a user CHR_EVENT_CLOSED event will cause net_vhost_user_event() to call vhost_user_cleanup(), which calls vhost_net_cleanup() for all its queues. However, vhost_net_cleanup() must never be called like this for fully-initialized nets; when other code later calls vhost_net_stop() - such as from virtio_net_vhost_status() - it will try to access the already-cleaned-up fields and fail with assertion errors or segfaults.

    The vhost_net_cleanup() will eventually be called from qemu_cleanup_net_client().

    Signed-off-by: Dan Streetman <ddstreet>
    Message-Id: <20190416184624.15397-3-dan.streetman>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 5a26a24708..51921de443 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -236,7 +236,6 @@ static void chr_closed_bh(void *opaque)
     s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
 
     qmp_set_link(name, false, &err);
-    vhost_user_stop(queues, ncs);
 
     qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, net_vhost_user_event,
                              NULL, opaque, NULL, true);

Indeed, vhost_user_stop() is no longer called, and it is the function that saves the acked features before cleaning up vhost_net:

static void vhost_user_stop(int queues, NetClientState *ncs[])
{
    NetVhostUserState *s;
    int i;

    for (i = 0; i < queues; i++) {
        assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);

        s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);

        if (s->vhost_net) {
            /* save acked features */
            uint64_t features = vhost_net_get_acked_features(s->vhost_net);
            if (features) {
                s->acked_features = features;
            }
            vhost_net_cleanup(s->vhost_net);
        }
    }
}

A patch has been sent upstream [1], but it missed AV8.1.

[1] https://patchew.org/QEMU/20190924162044.11414-1-amorenoz@redhat.com/

Moving it to AV8.1.1.

Hi Adrian,

This issue still exists with qemu-kvm-4.1.0-16.module+el8.1.1+4917+752cfd65.x86_64. The testpmd network still cannot recover. Could you check, please?

Moving bug status from 'ON_QA' to 'ASSIGNED'.

Best regards,
Pei

It must be related to a new issue. Can you please attach logs from the qemu and ovs side? Thanks

(In reply to Adrián Moreno from comment #11)
> It must be related to a new issue.
> Can you please attach logs from qemu and ovs side?
> Thanks

(1) qemu-kvm-4.1.0-15.module+el8.1.1+4700+209eec8f.x86_64 (version that reproduces the issue)
(2) qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64 (version with the fix)

Comparing versions (1) and (2) shows the following differences.

With version (1), testpmd stops both receiving (RX) and sending (TX) packets after the ovs vhost-user reconnect:

testpmd> show port stats all

  ######################## NIC statistics for port 0 ########################
  RX-packets: 1422265    RX-missed: 0          RX-bytes:  85335900
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1420379    TX-errors: 0          TX-bytes:  85222740

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

  ######################## NIC statistics for port 1 ########################
  RX-packets: 1421558    RX-missed: 0          RX-bytes:  85293480
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1421110    TX-errors: 0          TX-bytes:  85266600

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

With version (2), testpmd RX can receive packets again, but all received packets show errors after the ovs vhost-user reconnect:

testpmd> show port stats all

  ######################## NIC statistics for port 0 ########################
  RX-packets: 1139121    RX-missed: 0          RX-bytes:  68347260
  RX-errors: 1476988
  RX-nombuf:  0
  TX-packets: 1137417    TX-errors: 0          TX-bytes:  68245020

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

  ######################## NIC statistics for port 1 ########################
  RX-packets: 1138563    RX-missed: 0          RX-bytes:  68313780
  RX-errors: 1479063
  RX-nombuf:  0
  TX-packets: 1137968    TX-errors: 0          TX-bytes:  68278080

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################

I consider this bug fixed.
Adrián has also filed a new bz (Bug 1782528 - qemu-kvm: event flood when vhost-user backed virtio netdev is unexpectedly closed while guest is transmitting) to track the new RX-errors issue. Moving this bug to 'VERIFIED'.

Hi Adrián, feel free to correct me if you disagree. Thanks. (And sorry for my late response.)

Best regards,
Pei

*** Bug 1791904 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0404