Created attachment 1587335: libvirtd and qemu log

Description of problem:
The source qemu crashes when a parallel migration with parallel connections=2 is run after a failed migration with parallel connections=-1.

Version-Release number of selected component (if applicable):
libvirt-5.4.0-2.module+el8.1.0+3523+b348b848.x86_64
qemu-kvm-4.0.0-4.module+el8.1.0+3523+b348b848.x86_64
kernel-4.18.0-107.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM.
2. Do a parallel migration with connection number=-1 (although I don't understand what connection number=-1 actually means); it fails:
# virsh migrate nfs qemu+ssh://intel-5130-16-1.englab.nay.redhat.com/system --live --p2p --parallel --parallel-connections -1
error: operation failed: migration out job: Unable to write to socket: Connection reset by peer
3. After step 2, do a parallel migration with connection number=1; the source qemu crashes:
# virsh migrate nfs qemu+ssh://intel-5130-16-1.englab.nay.redhat.com/system --live --p2p --parallel --parallel-connections 1
error: operation failed: domain is not running

Actual results:
In step 3, the source qemu crashed.

Expected results:
In step 3, the migration should succeed.

Additional info:
Created attachment 1587348: qemu backtrace
Hi, parallel_connections needs to be >= 1. I am improving the error message. Migration after the failure should still work, though; looking into that.
Hi all, I hit a problem when setting multifd-channels to -1: checking the multifd-channels value afterwards shows 255. Please fix this together with this bug, thanks.

(qemu) migrate_set_parameter multifd-channels -1
(qemu) info migrate_parameters
announce-initial: 50 ms
announce-max: 550 ms
announce-rounds: 5
announce-step: 100 ms
compress-level: 1
compress-threads: 8
compress-wait-thread: on
decompress-threads: 2
cpu-throttle-initial: 20
cpu-throttle-increment: 10
max-cpu-throttle: 99
tls-creds: ''
tls-hostname: ''
max-bandwidth: 33554432 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
block-incremental: off
multifd-channels: 255
xbzrle-cache-size: 67108864
max-postcopy-bandwidth: 0
tls-authz: '(null)'
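For readers wondering where 255 comes from: it is the value -1 takes when truncated to an unsigned 8-bit integer. The sketch below only illustrates that arithmetic. It assumes the parameter is stored in an 8-bit unsigned field, which this report does not confirm, and `store_as_uint8` is a hypothetical helper, not a QEMU function.

```python
def store_as_uint8(value: int) -> int:
    """Mimic storing an integer in an unsigned 8-bit field.

    Assumption: the field is 8 bits wide; only the low 8 bits survive.
    """
    return value & 0xFF


if __name__ == "__main__":
    # Show how a few requested values would end up after truncation.
    for requested in (-1, 1, 2, 256):
        print(f"requested={requested:>4} stored={store_as_uint8(requested)}")
```

Under that assumption, a request of -1 is silently accepted as 255 instead of being rejected, which matches the `info migrate_parameters` output above.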
This is not urgent; we will improve the error handling upstream. -1 is not a valid value, and it should be detected as such.
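As a rough illustration of the kind of check meant here, the sketch below validates the requested channel count before it is stored, so that -1 yields an error instead of wrapping to 255. It is not the upstream fix: `parse_channel_count`, `MIN_CHANNELS` and `MAX_CHANNELS` are hypothetical names, and the upper bound of 255 is an assumption based on the value observed in this report.

```python
MIN_CHANNELS = 1    # from the thread: the count needs to be >= 1
MAX_CHANNELS = 255  # assumed upper bound, not confirmed by this report


def parse_channel_count(text: str) -> int:
    """Parse a channel count and reject anything outside the valid range."""
    try:
        value = int(text, 10)
    except ValueError:
        raise ValueError(
            f"channel count must be an integer, got {text!r}"
        ) from None
    # Check the range on the wide integer, before any narrowing can wrap it.
    if not MIN_CHANNELS <= value <= MAX_CHANNELS:
        raise ValueError(
            f"channel count must be between {MIN_CHANNELS} and "
            f"{MAX_CHANNELS}, got {value}"
        )
    return value


if __name__ == "__main__":
    for candidate in ("2", "-1", "0", "abc"):
        try:
            print(candidate, "->", parse_channel_count(candidate))
        except ValueError as exc:
            print(candidate, "-> error:", exc)
```

The point of the design is that the range check runs on the parsed integer before it is narrowed to the storage type, so an out-of-range request is reported to the user rather than stored as a different value.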
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
(In reply to Juan Quintela from comment #5)
> This is not urgent, we will improve the error handling upstream. -1 is not
> a valid value, and it should be detected as that.

Has anybody already improved the error handling upstream?
Hi, not yet. I will try to take a look as soon as I have time (not really soon).
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
Tested this BZ again via libvirt: following the steps in Comment 0, migration succeeds and the VM works well on rhelav 8.4.0 (kernel-4.18.0-304.el8.x86_64, qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64, libvirt-client-7.0.0-10.module+el8.4.0+10417+37f6984d.x86_64), so I am closing this BZ as CurrentRelease and marking it qe_test_coverage-, since it is a negative test. BTW, multifd-channels can still be set to -1 on both the qemu and libvirt sides, and the value is actually 255 after setting it to -1. Juan, do you plan to add a clear warning to prevent such an invalid setting, or will the current behavior stay?
Hi Xiaohui, I will make it report an error when channels is set to -1. Just closing the needinfo.