Bug 1666601
| Summary: | [q35] dst qemu core dumped when do rdma migration with Mellanox IB QDR card | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yiqian Wei <yiwei> |
| Component: | qemu-kvm | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED ERRATA | QA Contact: | Li Xiaohui <xiaohli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 8.0 | CC: | chayang, ddepaula, fjin, jinzhao, juzhang, lvivier, peterx, quintela, rbalakri, ribarry, virt-maint, xianwang, yiwei, yuhuang |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-3.1.0-9.module+el8+2731+e40e7b84 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-05-29 16:05:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Yiqian Wei
2019-01-16 07:47:14 UTC
I can't reproduce this bug with slow train.
host version:
qemu-kvm-2.12.0-57.module+el8+2683+02b3b955.x86_64
kernel-4.18.0-60.el8.x86_64
seabios-1.11.1-3.module+el8+2529+a9686a4d.x86_64

Hi,
Does this happen only for Windows guests, or does it also happen on a Linux guest?
Please attach a full backtrace for crashing bugs.
Thanks.

(In reply to Dr. David Alan Gilbert from comment #4)
> Hi,
> Does this happen only for windows guests, or does it also happen on a
> Linux guest?
No, it also happens on a Linux guest (rhel8 guest).
> Please attach a full backtrace for crashing bugs.
backtrace:
(gdb) bt
#0 0x00007f72318bbfcc in rdma_get_cm_event.part () from /lib64/librdmacm.so.1
#1 0x00005617cd22ced4 in rdma_cm_poll_handler ()
#2 0x00005617cd340d22 in aio_dispatch_handlers ()
#3 0x00005617cd34162c in aio_dispatch ()
#4 0x00005617cd33e1d2 in aio_ctx_dispatch ()
#5 0x00007f7231f3989d in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#6 0x00005617cd3408a8 in main_loop_wait ()
#7 0x00005617cd133e99 in main_loop ()
#8 0x00005617ccff43f4 in main ()

Yes, reproduced here going 7->8 on virtlab 414->413:
/usr/libexec/qemu-kvm -M pc-q35-rhel7.6.0,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -cpu host -m 4G -smp 2 -enable-kvm -vga qxl -device pcie-root-port,bus=pcie.0,id=root0,slot=1 -object secret,id=sec0,data=redhat -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root0 -drive if=none,file=/home/vms/f27.qcow2,cache=none,id=disk -device scsi-hd,drive=disk,bus=virtio_scsi_pci0.0 -monitor stdio
(gdb) bt full
#0 0x00007ffff7034fcc in rdma_get_cm_event.part () at /lib64/librdmacm.so.1
#1 0x0000555555a75ed4 in rdma_cm_poll_handler (opaque=0x7fffe806b010) at migration/rdma.c:3236
rdma = 0x7fffe806b010
ret = <optimized out>
cm_event = 0x5555564cc3e0
mis = 0x5555564e1ee0
(gdb) p mis->state
$3 = 8
which I think is 'completed'
(gdb) p rdma->channel
$5 = (struct rdma_event_channel *) 0x0
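The reading of `mis->state` as 'completed' can be checked by decoding the raw value against the migration status enum. The sketch below is only an illustration: the name table is written from memory of the `MigrationStatus` order in QEMU 3.1's qapi/migration.json and should be treated as an assumption, and `migration_status_name` is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed order of MigrationStatus values in QEMU 3.1 (recalled, not
 * copied from the source tree). Index 0 is "none".
 */
static const char *const migration_status_names[] = {
    "none", "setup", "cancelling", "cancelled", "active",
    "postcopy-active", "postcopy-paused", "postcopy-recover",
    "completed", "failed", "colo", "pre-switchover", "device",
};

/* Hypothetical helper: map a raw enum value, as printed by gdb, to a name. */
static const char *migration_status_name(int state)
{
    size_t n = sizeof(migration_status_names) /
               sizeof(migration_status_names[0]);

    if (state < 0 || (size_t)state >= n) {
        return "unknown";
    }
    return migration_status_names[state];
}
```

Under that assumed order, `migration_status_name(8)` yields "completed", which matches the guess above. When debug info is available, gdb's `p (MigrationStatus)8` is the more reliable check.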
Broke somewhere between 3.0.0 and 3.1.0 upstream.
git bisect says:
6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6 fails (double check)
6c97ec5f5ad6f65f8a6a9be044c2b875972406e4 good (double check)
and I've double checked them; so this points to:
6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6 is the first bad commit
commit 6ef3771c0d070e8f16e12f21e4fbf1ec6459eff6
Author: Xiao Guangrong <xiaoguangrong>
Date: Tue Aug 21 16:10:23 2018 +0800
migration: drop the return value of do_compress_ram_page
It is not used and cleans the code up a little
Reviewed-by: Peter Xu <peterx>
Signed-off-by: Xiao Guangrong <xiaoguangrong>
Reviewed-by: Juan Quintela <quintela>
Signed-off-by: Juan Quintela <quintela>
but the patch looks fine to me. hmm.
It's nothing to do with where that bisect ended up; it's a race, so a lot of things can change it, and the bisect isn't valid. Fix posted upstream:
Subject: [PATCH] migration/rdma: unregister fd handler

Merged upstream as fbbaacab2758cb3f32a07524710533b1d6422be4

Defining ITR as 8.0.0.0; please change this in case it's not accurate.

Fix included in qemu-kvm-3.1.0-9.module+el8+2731+e40e7b84

Verify:
qemu-kvm-3.1.0-15.module+el8+2792+e33e01a0
Guest works well after rdma migration, no core dumped.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:1293
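
The title of the fix, together with the NULL `rdma->channel` seen inside `rdma_cm_poll_handler`, suggests the shape of the race: an fd handler stays registered while the channel it reads from is torn down, so a late event is dispatched against a channel that no longer exists. A minimal sketch of that pattern in C follows. It uses a toy single-slot event loop, and every name in it (`Ctx`, `Channel`, `set_fd_handler`, `cleanup_buggy`, `cleanup_fixed`, `run_scenario`) is hypothetical; none of it is QEMU or librdmacm code, and it does not reproduce the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not QEMU or librdmacm types. */
typedef struct Channel { int fd; } Channel;
typedef struct Ctx { Channel *channel; } Ctx;   /* channel is NULL once torn down */
typedef void (*Handler)(void *opaque);

/* Toy event loop with a single handler slot. */
static Handler registered_handler;
static void *registered_opaque;
static int null_channel_seen;

static void set_fd_handler(Handler h, void *opaque)
{
    registered_handler = h;
    registered_opaque = opaque;
}

/* Deliver one pending event, but only if a handler is still registered. */
static void dispatch_pending(void)
{
    if (registered_handler) {
        registered_handler(registered_opaque);
    }
}

static void poll_handler(void *opaque)
{
    Ctx *ctx = opaque;

    if (ctx->channel == NULL) {
        /* The sketch only counts this; real code would crash on the NULL. */
        null_channel_seen++;
        return;
    }
    /* ... a real handler would read an event from ctx->channel here ... */
}

/* Buggy order: the channel goes away while the handler is still registered. */
static void cleanup_buggy(Ctx *ctx)
{
    free(ctx->channel);
    ctx->channel = NULL;
}

/* Fixed order: unregister the handler first, then destroy the channel. */
static void cleanup_fixed(Ctx *ctx)
{
    set_fd_handler(NULL, NULL);
    free(ctx->channel);
    ctx->channel = NULL;
}

/* Returns how many times the handler ran with a NULL channel. */
static int run_scenario(void (*cleanup)(Ctx *))
{
    Ctx ctx = { .channel = malloc(sizeof(Channel)) };

    assert(ctx.channel != NULL);
    null_channel_seen = 0;
    set_fd_handler(poll_handler, &ctx);
    cleanup(&ctx);
    dispatch_pending();             /* a late event arrives after cleanup */
    set_fd_handler(NULL, NULL);     /* leave the toy loop clean */
    return null_channel_seen;
}
```

Calling `run_scenario(cleanup_buggy)` lets the late event reach the handler after the channel is gone, while `run_scenario(cleanup_fixed)` drops it because nothing is registered any more. Because the outcome in the real code depends on whether an event happens to arrive during teardown, small unrelated changes can hide or expose it, which is consistent with the bisect landing on an innocent commit.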