Description of problem:
Migration fails when the "compress" capability is enabled.

Version-Release number of selected component (if applicable):
Host:
qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64
kernel-4.18.0-57.el8.x86_64
seabios-1.11.1-3.module+el8+2603+0a5231c4.x86_64
Guest: rhel8

How reproducible:
5/5

Steps to Reproduce:
1. Boot a guest on the src end.
2. Boot an incoming guest on the dst end.
3. Enable compression on both the src and dst ends:
(qemu) migrate_set_capability compress on
4. In the guest, run "stress":
# stress --cpu 1 --io 1 --vm 4 --vm-bytes 128M
5. Start the migration (a QMP equivalent is sketched after this description):
(qemu) migrate -d tcp:10.73.72.88:1234

Actual results:
Migration fails and qemu quits on the dst end.

On src, migration failed:
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: on events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
Migration status: failed
total time: 0 milliseconds

On dst, qemu quit:
(qemu) qemu-kvm: decompress data failed
qemu-kvm: error while loading state section id 1(ram)
qemu-kvm: load of migration failed: Operation not permitted

Expected results:
Migration completes and the vm works well.

Additional info:
(1) With compress off, I can't reproduce this issue; the migration status stays active and the migration never completes, but the vm keeps running well on the source end.
(2) The guest is booted with the following command:
/usr/libexec/qemu-kvm \
-M q35,accel=kvm,kernel-irqchip=split \
-device intel-iommu,intremap=on \
-cpu Haswell-noTSX,enforce \
-nodefaults -rtc base=utc \
-name debug-threads=on \
-m 8G \
-smp 4,sockets=4,cores=1,threads=1 \
-enable-kvm \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-k en-us \
-nodefaults \
-boot menu=on \
-qmp tcp:0:6667,server,nowait \
-vga qxl \
-device pcie-root-port,bus=pcie.0,id=root0,slot=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root0 \
-blockdev driver=qcow2,cache.direct=off,cache.no-flush=on,file.filename=/mnt/rhel80-64-virtio-scsi.qcow2,node-name=my_disk,file.driver=file \
-device scsi-hd,drive=my_disk,bus=virtio_scsi_pci0.0 \
-device pcie-root-port,bus=pcie.0,id=root1,slot=2 \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root1 -netdev tap,id=tap10 \
-device pcie-root-port,bus=pcie.0,id=root2,slot=3 \
-device e1000e,netdev=tap11,mac=9a:6a:6b:6c:6d:6a,bus=root2 -netdev tap,id=tap11 \
-device pcie-root-port,bus=pcie.0,id=root3,slot=4 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/block.qcow2,node-name=virtio_block \
-device virtio-blk-pci,drive=virtio_block,bus=root3 \
-device pcie-root-port,bus=pcie.0,id=root4,slot=5 \
-device nec-usb-xhci,id=usb1,bus=root4 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-monitor stdio \
-vnc :1 \
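For reference, the same steps can also be driven over QMP instead of the HMP monitor. The following is only an illustrative Python sketch (not part of the original report): it assumes the QMP socket opened by "-qmp tcp:0:6667,server,nowait" in the command line above and the destination URI from step 5; as in step 3, the compress capability has to be enabled on the destination qemu as well.

#!/usr/bin/env python3
# Illustrative only: enable the "compress" capability and start the migration
# over QMP (the equivalents of HMP steps 3 and 5). Host/port are assumed from
# "-qmp tcp:0:6667,server,nowait"; the URI is the one used in the report.
import json
import socket

QMP_HOST, QMP_PORT = "127.0.0.1", 6667
DEST_URI = "tcp:10.73.72.88:1234"

def qmp(f, execute, arguments=None):
    # Send one QMP command and return its reply, skipping asynchronous events.
    msg = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    f.write(json.dumps(msg) + "\n")
    f.flush()
    while True:
        reply = json.loads(f.readline())
        if "return" in reply or "error" in reply:
            return reply

with socket.create_connection((QMP_HOST, QMP_PORT)) as sock:
    f = sock.makefile("rw")
    json.loads(f.readline())            # consume the QMP greeting banner
    qmp(f, "qmp_capabilities")          # leave capabilities-negotiation mode
    qmp(f, "migrate-set-capabilities",
        {"capabilities": [{"capability": "compress", "state": True}]})
    qmp(f, "migrate", {"uri": DEST_URI})
    print(qmp(f, "query-migrate"))      # status shows "failed" when the bug hits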
Hi Juan,
I can sometimes reproduce this bz on rhel8.0.1 and rhel8.1.0 hosts, though not always. Do I need to clone this bz for rhel8.0.1 or rhel8.1.0? This bz is reported against qemu-kvm, and the qemu-kvm versions on rhel8.0, rhel8.0.1 and rhel8.1.0 are different.
Best regards,
Li Xiaohui
Hi,
multifd + compress don't work together. I posted upstream a new way to do compression on top of multifd. I will improve the error message.
Hi, I misread the previous commit. This has nothing to do with multifd; I am investigating what happens with compression.
Hi all,
I can sometimes reproduce this bz on rhel8.1-av (kernel-4.18.0-129.el8.x86_64 & qemu-img-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64); the guest is kernel-4.18.0-130.el8.x86_64. Thanks.
Hi,
Compression is really difficult to support, and as far as we know we don't use it. The current implementation is only useful if you are migrating over a really, really slow link; otherwise the amount of traffic that is saved is small, and the amount of CPU needed to make it work doesn't help here.
Notice that the compression RHV uses is "XBZRLE" (the xbzrle capability in "info migrate"). The compress capability is a completely different beast, based on zlib, which we don't support.
There are two compression methods in qemu:
- xbzrle
- zlib (this one is older, so it got the "compression" name)

With xbzrle (the one we support in RHV), we keep a big cache of memory and save a copy of (some of) the transferred pages. If a page gets dirty again, we just send its difference against the previously sent copy, so we transmit fewer bits.

With the zlib compression (which we don't support), we copy the memory somewhere else, start a thread to do the compression (a slow operation in itself), and copy the result back to the main thread. This is really very slow. It was introduced because at some point there were going to be Intel processors able to do this compression fast, but they never appeared. So we don't support it; we know it is very slow in its current incarnation, and that is why we don't support it.

There are patches posted on the qemu list, to be integrated in upstream qemu, that use zlib (and zstd) on top of multifd; they are faster and we can support them. But that is for future versions.

One line summary: we don't support zlib compression because we know that it is not reliable.
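To make the contrast concrete, here is a toy Python sketch of the two ideas. It is illustrative only and is not QEMU's actual wire format or implementation (the real code lives under migration/ in the qemu tree): it compares a cached-page delta in the spirit of xbzrle with straight zlib compression of a whole dirty page.

import zlib

PAGE_SIZE = 4096

def xbzrle_like_delta(old_page, new_page):
    # Toy delta in the spirit of xbzrle: record only the byte runs that
    # changed against the cached copy, as (offset, changed_bytes) pairs.
    delta, i = [], 0
    while i < len(new_page):
        if new_page[i] != old_page[i]:
            start = i
            while i < len(new_page) and new_page[i] != old_page[i]:
                i += 1
            delta.append((start, new_page[start:i]))
        else:
            i += 1
    return delta

# A page we already sent (and cached), plus a newly dirtied version of it
# that differs in only a few bytes.
cached = bytes(PAGE_SIZE)
dirty = bytearray(cached)
dirty[100:108] = b"\xff" * 8
dirty[2000:2004] = b"\xaa" * 4
dirty = bytes(dirty)

# xbzrle-style: send just the changed runs instead of the whole page.
delta = xbzrle_like_delta(cached, dirty)
print("xbzrle-like delta:", sum(len(d) for _, d in delta), "bytes instead of", PAGE_SIZE)

# compress capability: every page goes through zlib, which QEMU runs in
# separate compression threads on the source (and decompression threads
# on the destination, where this bug reports "decompress data failed").
compressed = zlib.compress(dirty)
print("zlib-compressed page:", len(compressed), "bytes")
assert zlib.decompress(compressed) == dirty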
Hi Juan & Hai,
I can reproduce this bz on the latest rhel8.1.1-av test. From Juan's comments above, if zlib compression isn't supported, could it be disabled altogether? Then QE won't test and trace related problems. Thanks.
Hi

We will do it upstream. But not for 8.1.1.

Later, Juan.
(In reply to Juan Quintela from comment #9)
> Hi
>
> We will do it upstream. But not for 8.1.1.
>
Will it be for rhel8.2.0?
> Later, Juan.
Yes.
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
(In reply to Li Xiaohui from comment #10)
> (In reply to Juan Quintela from comment #9)
> > Hi
> >
> > We will do it upstream. But not for 8.1.1.
> >
> Will it be for rhel8.2.0?

Has this upstream work been included in 8.2 or 8.3? If yes, could we move this bug forward now?
No product uses compression on RHEL. There is no solution upstream; as said, we have a compression solution on top of multifd that is easier to maintain and much faster. So we postpone it.
I will try at the beginning of January 2021, since I have recently been busy with other things and will be on PTO next week.
Hi Amnon,
I have tested this issue on RHEL-8.4.0-AV (kernel-4.18.0-262.el8.dt3.x86_64 & qemu-img-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64) and can still reproduce it (though not always). If we plan to close it as WONTFIX, could you or Juan confirm to QE that QE needn't test multi-thread-compression anymore and needn't track related bzs? Thank you.
Sorry, thank you Ariel.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
Closing this bz as WONTFIX, since multi-thread-compression has been dropped from the migration test plan.