Description of problem:
QEMU core dump when doing migration with rdma and multifd both enabled.

Version-Release number of selected component (if applicable):
host info: kernel-4.18.0-108.el8.x86_64 & qemu-img-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64
Mellanox cards:
# lspci
01:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

How reproducible:
3/3

Steps to Reproduce:
1. Boot guest on src host with the command:
/usr/libexec/qemu-kvm \
 -enable-kvm \
 -machine q35 \
 -m 4096 \
 -smp 4 \
 -cpu Skylake-Client \
 -name debug-threads=on \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-2,addr=0x0 \
 -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfs/rhel810-64-virtio-scsi-2.qcow2,node-name=my_file \
 -blockdev driver=qcow2,node-name=my_disk,file=my_file \
 -device scsi-hd,drive=my_disk,bus=virtio_scsi_pci0.0 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=70:5a:0f:38:cd:1c,bus=pcie.0-root-port-3,vectors=10,mq=on \
 -vnc :0 \
 -device VGA \
 -monitor stdio \
 -qmp tcp:0:1234,server,nowait

2. Start the listening port on dst host with the same command plus an -incoming option:
/usr/libexec/qemu-kvm \
 -enable-kvm \
 -machine q35 \
 -m 4096 \
 -smp 4 \
 -cpu Skylake-Client \
 -name debug-threads=on \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-2,addr=0x0 \
 -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfs/rhel810-64-virtio-scsi-2.qcow2,node-name=my_file \
 -blockdev driver=qcow2,node-name=my_disk,file=my_file \
 -device scsi-hd,drive=my_disk,bus=virtio_scsi_pci0.0 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,queues=4 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=70:5a:0f:38:cd:1c,bus=pcie.0-root-port-3,vectors=10,mq=on \
 -vnc :0 \
 -device VGA \
 -monitor stdio \
 -qmp tcp:0:1234,server,nowait \
 -incoming rdma:0:5555

3. Enable rdma on the src host qemu and multifd on both src and dst host qemu:
(src qemu) migrate_set_capability rdma-pin-all on
(src qemu) migrate_set_capability multifd on
(dst qemu) migrate_set_capability multifd on

Actual results:
qemu core dump on both src and dst host.

(src host):
(qemu) migrate rdma:192.168.0.21:5555
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
./start2.sh: line 21: 14612 Segmentation fault (core dumped) /usr/libexec/qemu-kvm -enable-kvm ...

(dst host):
(qemu) dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
qemu-kvm: receive cm event, cm event is 10
qemu-kvm: rdma migration: recv polling control error!
qemu-kvm: RDMA is in an error state waiting migration to abort!
qemu-kvm: Not a migration stream
qemu-kvm: load of migration failed: Invalid argument
qemu-kvm: Early error. Sending error.
qemu-kvm: rdma migration: send polling control error
./start2.sh: line 22: 18542 Segmentation fault (core dumped) /usr/libexec/qemu-kvm ...

Expected results:
No qemu core dump.
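For reference, the HMP migrate_set_capability commands in step 3 correspond to the QMP migrate-set-capabilities command, which could also be issued over the -qmp tcp:0:1234 socket from the command lines above. A minimal sketch of the JSON payloads; the build_set_capabilities helper is hypothetical, only the QMP command shape is standard:

```python
import json

def build_set_capabilities(caps):
    """Build a QMP migrate-set-capabilities command from a {name: bool} dict.

    Hypothetical helper; the payload shape follows the documented QMP
    migrate-set-capabilities command.
    """
    return json.dumps({
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": name, "state": state}
                for name, state in caps.items()
            ]
        },
    })

# Source side enables rdma-pin-all and multifd; destination enables only multifd,
# matching step 3 of the reproduction.
src_cmd = build_set_capabilities({"rdma-pin-all": True, "multifd": True})
dst_cmd = build_set_capabilities({"multifd": True})
```

Sending these over the QMP socket (after the initial qmp_capabilities handshake) should be equivalent to the HMP commands used in the report.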
Additional info:
When testing rdma and multifd separately, both work well and there is no qemu core dump.
Add test step 4 to Steps to Reproduce:
4. Start the migration on the src host qemu:
(src qemu) migrate rdma:192.168.0.21:5555
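The HMP "migrate" command in step 4 maps to the QMP "migrate" command; a minimal sketch of the JSON payload, using the URI from the reproduction steps (the build_migrate helper is hypothetical):

```python
import json

def build_migrate(uri):
    """Build a QMP 'migrate' command equivalent to HMP 'migrate <uri>'."""
    return json.dumps({"execute": "migrate", "arguments": {"uri": uri}})

# RDMA migration URI from the reproduction steps above.
migrate_cmd = build_migrate("rdma:192.168.0.21:5555")
```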
Hi, multifd + rdma don't work together. I am improving the error message, but there is not a lot we can do about it.
Hi, multifd + rdma don't work together. I will improve the error message upstream, but that is everything we can do for now. Later, Juan.
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
Still hitting this bug on RHEL-AV 8.4.0 (kernel-4.18.0-304.el8.x86_64 & qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64), so reopening this bz. Src qemu core dumps and dst qemu quits normally:

(src qemu) migrate -d rdma:192.168.0.21:1234
source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
(qemu) 2.sh: line 40: 18662 Segmentation fault (core dumped)

(dst qemu) dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (1) Infiniband
qemu-kvm: receive cm event, cm event is 10
qemu-kvm: rdma migration: recv polling control error!
qemu-kvm: RDMA is in an error state waiting migration to abort!
qemu-kvm: Not a migration stream
qemu-kvm: load of migration failed: Invalid argument
Patches for this are in a pull request upstream; as soon as the pull gets integrated, I will backport them.
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.