Bug 1666601
Summary: | [q35] dst qemu core dumped when do rdma migration with Mellanox IB QDR card | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yiqian Wei <yiwei> |
Component: | qemu-kvm | Assignee: | Dr. David Alan Gilbert <dgilbert> |
Status: | CLOSED ERRATA | QA Contact: | Li Xiaohui <xiaohli> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 | CC: | chayang, ddepaula, fjin, jinzhao, juzhang, lvivier, peterx, quintela, rbalakri, ribarry, virt-maint, xianwang, yiwei, yuhuang |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-3.1.0-9.module+el8+2731+e40e7b84 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-05-29 16:05:29 UTC | Type: | Bug |
Description
Yiqian Wei
2019-01-16 07:47:14 UTC
I can't reproduce this bug with slow train.

host version:
qemu-kvm-2.12.0-57.module+el8+2683+02b3b955.x86_64
kernel-4.18.0-60.el8.x86_64
seabios-1.11.1-3.module+el8+2529+a9686a4d.x86_64

Hi,
Does this happen only for windows guests, or does it also happen on a Linux guest?
Please attach a full backtrace for crashing bugs.
Thanks.

(In reply to Dr. David Alan Gilbert from comment #4)
> Hi,
> Does this happen only for windows guests, or does it also happen on a
> Linux guest?

No, it also happens on a Linux guest (rhel8 guest).

> Please attach a full backtrace for crashing bugs.

backtrace:
(gdb) bt
#0  0x00007f72318bbfcc in rdma_get_cm_event.part () from /lib64/librdmacm.so.1
#1  0x00005617cd22ced4 in rdma_cm_poll_handler ()
#2  0x00005617cd340d22 in aio_dispatch_handlers ()
#3  0x00005617cd34162c in aio_dispatch ()
#4  0x00005617cd33e1d2 in aio_ctx_dispatch ()
#5  0x00007f7231f3989d in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#6  0x00005617cd3408a8 in main_loop_wait ()
#7  0x00005617cd133e99 in main_loop ()
#8  0x00005617ccff43f4 in main ()

Yes, reproduced here going 7->8 on virtlab 414->413:

/usr/libexec/qemu-kvm \
  -M pc-q35-rhel7.6.0,accel=kvm,kernel-irqchip=split \
  -device intel-iommu,intremap=on \
  -cpu host -m 4G -smp 2 -enable-kvm -vga qxl \
  -device pcie-root-port,bus=pcie.0,id=root0,slot=1 \
  -object secret,id=sec0,data=redhat \
  -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root0 \
  -drive if=none,file=/home/vms/f27.qcow2,cache=none,id=disk \
  -device scsi-hd,drive=disk,bus=virtio_scsi_pci0.0 \
  -monitor stdio

(gdb) bt full
#0  0x00007ffff7034fcc in rdma_get_cm_event.part () at /lib64/librdmacm.so.1
#1  0x0000555555a75ed4 in rdma_cm_poll_handler (opaque=0x7fffe806b010) at migration/rdma.c:3236
        rdma = 0x7fffe806b010
        ret = <optimized out>
        cm_event = 0x5555564cc3e0
        mis = 0x5555564e1ee0

(gdb) p mis->state
$3 = 8

which I think is 'completed'

(gdb) p rdma->channel
$5 = (struct rdma_event_channel *) 0x0

Broke somewhere between 3.0.0 and 3.1.0 upstream

git bisect says:
6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6 fails (double check)
6c97ec5f5ad6f65f8a6a9be044c2b875972406e4 good (double check)

and I've double checked them; so this points to:

6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6 is the first bad commit
commit 6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6
Author: Xiao Guangrong <xiaoguangrong>
Date:   Tue Aug 21 16:10:23 2018 +0800

    migration: drop the return value of do_compress_ram_page

    It is not used and cleans the code up a little

    Reviewed-by: Peter Xu <peterx>
    Signed-off-by: Xiao Guangrong <xiaoguangrong>
    Reviewed-by: Juan Quintela <quintela>
    Signed-off-by: Juan Quintela <quintela>

but the patch looks fine to me.

hmm. It's nothing to do with where that bisect ended up, it's a race so a lot of things can change it, so the bisect isn't valid; fix posted upstream:

Subject: [PATCH] migration/rdma: unegister fd handler

Merged upstream as fbbaacab2758cb3f32a07524710533b1d6422be4

Defining ITR as 8.0.0.0 please change this in case it's not accurate.

Fix included in qemu-kvm-3.1.0-9.module+el8+2731+e40e7b84

Verify:
qemu-kvm-3.1.0-15.module+el8+2792+e33e01a0
Guest works well after rdma migration, no core dumped.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1293
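
For readers following the analysis above: the backtrace shows the poll handler running while rdma->channel is already NULL, and the fix's subject line points at unregistering the fd handler. The ordering issue can be illustrated with a toy sketch in C. This is not the QEMU patch. Every name below (toy_rdma, toy_set_fd_handler, toy_dispatch, toy_cleanup, toy_demo) is hypothetical, and the one-slot "event loop" only stands in for a real main loop. It assumes the crash comes from a handler that stays registered after its channel is torn down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are QEMU or librdmacm names. */
typedef void (*fd_handler_fn)(void *opaque);

struct toy_channel { int fd; };

struct toy_rdma {
    struct toy_channel *channel;   /* NULL once torn down */
    int events_seen;
};

/* A one-slot "event loop": at most one handler is registered. */
static fd_handler_fn registered_handler;
static void *registered_opaque;

static void toy_set_fd_handler(fd_handler_fn fn, void *opaque)
{
    registered_handler = fn;
    registered_opaque = fn ? opaque : NULL;
}

/* Returns 1 if a handler ran, 0 if none was registered. */
static int toy_dispatch(void)
{
    if (!registered_handler) {
        return 0;
    }
    registered_handler(registered_opaque);
    return 1;
}

static void toy_poll_handler(void *opaque)
{
    struct toy_rdma *rdma = opaque;
    /* The handler dereferences the channel, so it must still exist. */
    assert(rdma->channel != NULL);
    rdma->events_seen++;
}

static void toy_cleanup(struct toy_rdma *rdma)
{
    /* Unregister first, so no later dispatch can reach the handler... */
    toy_set_fd_handler(NULL, NULL);
    /* ...and only then drop the channel. */
    rdma->channel = NULL;
}

/* Runs the sequence once and returns how many events the handler saw. */
int toy_demo(void)
{
    struct toy_channel ch = { 3 };
    struct toy_rdma rdma = { &ch, 0 };

    toy_set_fd_handler(toy_poll_handler, &rdma);
    toy_dispatch();        /* handler runs while the channel is valid */
    toy_cleanup(&rdma);
    toy_dispatch();        /* nothing registered, so nothing runs */
    return rdma.events_seen;
}
```

If the two statements in toy_cleanup were swapped and a dispatch happened between them, the assert in toy_poll_handler would fire. That is the toy equivalent of rdma_cm_poll_handler running with rdma->channel set to 0x0.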