Bug 1724576 - QEMU core dump when doing migration with rdma and multifd both enabled
Summary: QEMU core dump when doing migration with rdma and multifd both enabled
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Juan Quintela
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks: 1753522
 
Reported: 2019-06-27 10:55 UTC by Li Xiaohui
Modified: 2021-10-15 07:27 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-15 07:27:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Li Xiaohui 2019-06-27 10:55:07 UTC
Description of problem:
QEMU core dump when doing migration with rdma and multifd both enabled


Version-Release number of selected component (if applicable):
host info: 
kernel-4.18.0-108.el8.x86_64 & qemu-img-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64

Mellanox cards:
# lspci
01:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

How reproducible:
3/3


Steps to Reproduce:
1. Boot the guest on the src host with the following command:
/usr/libexec/qemu-kvm \
-enable-kvm \
-machine q35  \
-m 4096 \
-smp 4 \
-cpu Skylake-Client \
-name debug-threads=on \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-2,addr=0x0 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfs/rhel810-64-virtio-scsi-2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my_disk,file=my_file \
-device scsi-hd,drive=my_disk,bus=virtio_scsi_pci0.0 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=70:5a:0f:38:cd:1c,bus=pcie.0-root-port-3,vectors=10,mq=on \
-vnc :0 \
-device VGA \
-monitor stdio \
-qmp tcp:0:1234,server,nowait
2. Start QEMU listening for the incoming migration on the dst host with the following command:
/usr/libexec/qemu-kvm \
-enable-kvm \
-machine q35  \
-m 4096 \
-smp 4 \
-cpu Skylake-Client \
-name debug-threads=on \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-2,addr=0x0 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfs/rhel810-64-virtio-scsi-2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my_disk,file=my_file \
-device scsi-hd,drive=my_disk,bus=virtio_scsi_pci0.0 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=70:5a:0f:38:cd:1c,bus=pcie.0-root-port-3,vectors=10,mq=on \
-vnc :0 \
-device VGA \
-monitor stdio \
-qmp tcp:0:1234,server,nowait \
-incoming rdma:0:5555
3. Enable the rdma-pin-all capability on the src host QEMU and the multifd capability on both the src and dst host QEMU (a QMP equivalent is sketched after these commands):
(src qemu) migrate_set_capability rdma-pin-all on
(src qemu) migrate_set_capability multifd on
(dst qemu) migrate_set_capability multifd on
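
For reference, the same capabilities can also be set through the QMP socket that the command lines above open with -qmp tcp:0:1234. This is a sketch of the QMP equivalent, not what was actually run (the HMP commands above were used); on the dst host only the multifd entry would be sent:

{"execute": "qmp_capabilities"}
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [
    {"capability": "rdma-pin-all", "state": true},
    {"capability": "multifd", "state": true}]}}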


Actual results:
QEMU core dumps on both the src and dst hosts.
(src host):
(qemu) migrate rdma:192.168.0.21:5555
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
./start2.sh: line 21: 14612 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -enable-kvm ...

(dst host):
(qemu) dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
qemu-kvm: receive cm event, cm event is 10
qemu-kvm: rdma migration: recv polling control error!
qemu-kvm: RDMA is in an error state waiting migration to abort!
qemu-kvm: Not a migration stream
qemu-kvm: load of migration failed: Invalid argument
qemu-kvm: Early error. Sending error.
qemu-kvm: rdma migration: send polling control error
./start2.sh: line 22: 18542 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm ...


Expected results:
No QEMU core dump.


Additional info:
When rdma and multifd are tested separately, both work well with no QEMU core dump.

Comment 1 Li Xiaohui 2019-06-27 10:58:02 UTC
Adding test step 4 to Steps to Reproduce (a QMP equivalent is sketched below):
4. Start the migration from the src host QEMU:
(src qemu) migrate rdma:192.168.0.21:5555
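
The QMP equivalent of this HMP command, over the same QMP socket as in the command lines above, would be (a sketch for reference only):

{"execute": "migrate", "arguments": {"uri": "rdma:192.168.0.21:5555"}}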

Comment 6 Juan Quintela 2019-07-29 11:12:40 UTC
Hi

Multifd + rdma don't work together. I am improving the error message, but there is not a lot more we can do about it.

Comment 7 Juan Quintela 2019-11-19 14:37:18 UTC
Hi

Multifd + rdma don't work together. I will improve the error message upstream, but that is all we can do for now.

Later, Juan.
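
To illustrate the kind of up-front check being described, here is a minimal standalone C sketch that rejects the multifd + RDMA combination before the transfer starts. All type and function names here are hypothetical; in QEMU the real check would live in the migration code (near the capability validation in migration/migration.c), and this does not reflect the actual upstream patch:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the migration setup; in QEMU the real
 * state lives in MigrationState and the capability bitmap. */
typedef struct {
    bool multifd;      /* multifd capability enabled */
    const char *uri;   /* migration URI, e.g. "rdma:192.168.0.21:5555" */
} MigrationSetup;

/* Reject unsupported combinations up front instead of segfaulting
 * mid-migration.  Returns false and sets *errp on failure. */
static bool migration_setup_check(const MigrationSetup *s, const char **errp)
{
    if (s->multifd && strncmp(s->uri, "rdma:", 5) == 0) {
        *errp = "multifd is not supported over the RDMA transport";
        return false;
    }
    return true;
}

int main(void)
{
    MigrationSetup s = { .multifd = true, .uri = "rdma:192.168.0.21:5555" };
    const char *err = NULL;

    if (!migration_setup_check(&s, &err)) {
        fprintf(stderr, "migration setup rejected: %s\n", err);
        return 1;
    }
    return 0;
}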

Comment 8 Ademar Reis 2020-02-05 22:59:46 UTC
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 11 RHEL Program Management 2021-03-15 07:37:07 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 12 Li Xiaohui 2021-04-15 13:22:07 UTC
Still hitting this bug on RHEL-AV 8.4.0 (kernel-4.18.0-304.el8.x86_64 & qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64), so reopening it.

The src QEMU core dumps while the dst QEMU exits normally:

(src qemu) migrate -d rdma:192.168.0.21:1234
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
(qemu) 2.sh: line 40: 18662 Segmentation fault      (core dumped)

(dst qemu) dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
qemu-kvm: receive cm event, cm event is 10
qemu-kvm: rdma migration: recv polling control error!
qemu-kvm: RDMA is in an error state waiting migration to abort!
qemu-kvm: Not a migration stream
qemu-kvm: load of migration failed: Invalid argument

Comment 14 Juan Quintela 2021-09-09 11:26:25 UTC
Patches for this are in a pull request upstream; as soon as the pull request is merged, I will backport them.

Comment 15 John Ferlan 2021-09-09 15:14:04 UTC
Bulk update: moving RHEL-AV bugs to RHEL 9. If a fix is also needed in RHEL 8, clone this to the current RHEL 8 release.

Comment 17 RHEL Program Management 2021-10-15 07:27:04 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

