Bug 1744530
Summary: | Migration failed with enabling postcopy and multifd, qemu crash on destination and guest hang on source end | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | xianwang <xianwang> |
Component: | qemu-kvm | Assignee: | Dr. David Alan Gilbert <dgilbert> |
qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | aadam, chayang, dgibson, dgilbert, jinzhao, juzhang, lvivier, ngu, peterx, quintela, qzhang, smitterl, virt-maint, xiaohli, yafu |
Version: | 8.1 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Unspecified | ||
Last Closed: | 2021-03-15 07:38:37 UTC | Type: | Bug |
Bug Blocks: | 1753522, 1758964, 1771318 |
Description
xianwang
2019-08-22 10:51:03 UTC
I will update hardware later after I tried it on x86_64.

I hit a core dump twice while testing this scenario. Build information and steps are the same as in the bug report; the qemu cli is as follows:

/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-nodefaults \
-machine pseries-rhel8.1.0 \
-uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x3 \
-object iothread,id=iothread0 \
-chardev socket,id=console0,path=/tmp/console0,server,nowait \
-device spapr-vty,chardev=console0,reg=0x30000000 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x5 \
-device pci-bridge,chassis_nr=1,id=bridge1,bus=pci.0,addr=0x6 \
-device pci-bridge,chassis_nr=2,id=bridge2,bus=pci.0,addr=0x8 \
-device virtio-scsi-pci,id=scsi1,bus=bridge1,addr=0x7 \
-drive file=/home/xianwang/rhel810-ppc64le-virtio-scsi.qcow2.bak,format=qcow2,if=none,cache=none,id=drive_scsi1,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_scsi1,id=scsi-disk1,bus=scsi1.0,channel=0,scsi-id=0x6,lun=0x3,bootindex=0 \
-device virtio-scsi-pci,id=scsi_add,bus=pci.0,addr=0x9 \
-device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0xa \
-netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 2048,slots=4,maxmem=32G \
-smp 4 \
-vga std \
-vnc :11 \
-cpu host \
-device usb-kbd \
-incoming tcp:0:5801 \
-device usb-mouse \
-qmp tcp:0:8881,server,nowait \
-msg timestamp=on \
-rtc base=localtime,clock=vm,driftfix=slew \
-monitor stdio \
-boot order=cdn,once=n,menu=on,strict=off \
-enable-kvm \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc \
-device i6300esb,id=wdt0 \
-watchdog-action pause

result:

on source:
(qemu) info status
VM status: paused (finish-migrate)
(qemu) 2019-08-22T07:34:46.172349Z qemu-kvm: multifd_send_pages: channel 0 has already quit!
2019-08-22T07:34:46.172386Z qemu-kvm: multifd_send_pages: channel 1 has already quit!
2019-08-22T07:34:46.172400Z qemu-kvm: multifd_send_sync_main: multifd_send_pages fail
2019-08-22T07:34:46.203558Z qemu-kvm: Unable to write to socket: Connection timed out
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: failed (Unable to write to socket: Connection timed out)
total time: 0 milliseconds

on destination:
(qemu) 2019-08-22T06:53:04.432056Z qemu-kvm: Non-sequential target page 0x7fff0d316000/0x7fff0d14f000
2019-08-22T06:53:04.432090Z qemu-kvm: error while loading state section id 1(ram)
2019-08-22T06:53:04.432104Z qemu-kvm: postcopy_ram_listen_thread: loadvm failed: -22
2019-08-22T06:53:04.579058Z qemu-kvm: CMD_POSTCOPY_RUN in wrong postcopy state (5)
2019-08-22T06:53:04.579123Z qemu-kvm: postcopy_fault_thread_notify: incrementing failed: Bad file descriptor
2019-08-22T06:53:04.579141Z qemu-kvm: Detected IO failure for postcopy. Migration paused.
boot.sh: line 37: 27019 Segmentation fault (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox off -nodefaults -machine pseries-rhel8.1.0 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x3 -object iothread,id=iothread0 -chardev socket,id=console0,path=/tmp/console0,server,nowait -device spapr-vty,chardev=console0,reg=0x30000000 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x5 -device pci-bridge,chassis_nr=1,id=bridge1,bus=pci.0,addr=0x6 -device pci-bridge,chassis_nr=2,id=bridge2,bus=pci.0,addr=0x8 -device virtio-scsi-pci,id=scsi1,bus=bridge1,addr=0x7 -drive file=/home/xianwang/rhel810-ppc64le-virtio-scsi.qcow2.bak,format=qcow2,if=none,cache=none,id=drive_scsi1,werror=stop,rerror=stop -device scsi-hd,drive=drive_scsi1,id=scsi-disk1,bus=scsi1.0,channel=0,scsi-id=0x6,lun=0x3,bootindex=0 -device virtio-scsi-pci,id=scsi_add,bus=pci.0,addr=0x9 -device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=0xa -netdev tap,id=idjlQN53,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -m 2048,slots=4,maxmem=32G -smp 4 -vga std -vnc :11 -cpu host -device usb-kbd -incoming tcp:0:5801 -device usb-mouse -qmp tcp:0:8881,server,nowait -msg timestamp=on -rtc base=localtime,clock=vm,driftfix=slew -monitor stdio -boot order=cdn,once=n,menu=on,strict=off -enable-kvm -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc -device i6300esb,id=wdt0 -watchdog-action pause

(In reply to xianwang from comment #1)
> I will update hardware later after I tried it on x86_64.

This issue also exists on the x86_64 platform, so I will update hardware to "all". Because "postcopy" migration is an important function and we hit a core dump, I think "severity" should also be "high"; anyone may change it if they think it is incorrect.
Build information:
4.18.0-129.el8.x86_64
qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64

Looks like BZ 1738451

(In reply to Laurent Vivier from comment #5)
> Looks like BZ 1738451

At first I also thought it was similar to BZ 1738451, but the error message here mentions postcopy, this scenario did not change the multifd channels and did not execute "migrate_cancel", and this core dump is on the destination end while the core dump in that bz is on the src end. What's more, the error messages are different. So I am not sure whether their root cause is the same; I am reporting it to track this issue and this scenario.

Multifd + postcopy are not supported simultaneously upstream. We will enable it once we support it. I am doing a patch series upstream that will give an error when you try to enable both capabilities.
Thanks, Juan.

QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Hi
We don't support this combination. The plan is to give an error if one tries this combination.

Adding Triaged keyword and resetting to NEW for placement on the backlog for future assignment (although reading comment 9 it would seem it could be CLOSED as NOTABUG).

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Multifd + postcopy work well on rhelav 8.4.0 (kernel-4.18.0-302.el8.x86_64 & qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64); the vm works well after multifd+postcopy migration. So close this bz as CurrentRelease.

BTW, do we support multifd+postcopy migration now?

Just closing the needinfo
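
For anyone scripting this scenario, the rejection Juan describes (give an error when both capabilities are enabled) could be approximated on the test side with a pre-flight check like the one below. This is only a sketch and not the upstream patch: the helper names and the `UNSUPPORTED_TOGETHER` pair are made up for illustration, the pair reflects what this thread says about the builds of that time, and the parsing assumes the `name: on|off` layout shown in the `info migrate` output from the description.

```python
import re
from typing import Dict

# Hypothetical pair, taken from this thread: these two capabilities were
# reported as unsupported together on the builds discussed here.
UNSUPPORTED_TOGETHER = ("postcopy-ram", "multifd")


def parse_capabilities(info_migrate_text: str) -> Dict[str, bool]:
    """Collect every 'name: on' / 'name: off' pair from HMP-style text."""
    pairs = re.findall(r"([\w-]+): (on|off)\b", info_migrate_text)
    return {name: state == "on" for name, state in pairs}


def unsupported_combination(caps: Dict[str, bool]) -> bool:
    """True when all capabilities in UNSUPPORTED_TOGETHER are enabled."""
    return all(caps.get(name, False) for name in UNSUPPORTED_TOGETHER)


# Capabilities line copied from the source-side 'info migrate' output above.
sample = (
    "capabilities: xbzrle: off rdma-pin-all: off auto-converge: off "
    "zero-blocks: off compress: off events: off postcopy-ram: on "
    "x-colo: off release-ram: off return-path: off "
    "pause-before-switchover: off multifd: on dirty-bitmaps: off "
    "postcopy-blocktime: off late-block-activate: off x-ignore-shared: off"
)

caps = parse_capabilities(sample)
if unsupported_combination(caps):
    print("refusing to migrate: postcopy-ram and multifd are both enabled")
```

Whether such a check is still wanted depends on the build under test: the later comment reports the combination working on qemu-kvm-5.2.0-14, so the pair should be treated as a per-version assumption.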