Bug 1525899
| Summary: | Migrate to an error destination ip ->"migrate_cancel"->info migrate, there will be segmentation fault | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Yumei Huang <yuhuang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.5 | CC: | chayang, dgilbert, jinzhao, juzhang, knoel, michen, qzhang, virt-maint, xianwang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-05-15 08:34:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1558351 | ||
Confirmed (and with upstream 2.11); the crucial thing is that the IP address doesn't reject the connection, but just hangs during the connect. Status is 'cancelling'.

This bug is not a regression; it also exists on qemu-kvm-rhev-2.9.0-16.el7_4.1, although the result on qemu-kvm-rhev-2.9.0-16.el7_4.1.ppc64le is not exactly the same as on qemu-kvm-rhev-2.10.0-12.el7.ppc64le. The result is as Dave said in comment 2: the migration status is "cancelling", as follows:

version:
Host:
3.10.0-693.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.1.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Steps are the same as in the bug report.

result:
(qemu) migrate -d tcp:10.16.110.120:5801
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
Migration status: setup
total time: 0 milliseconds
(qemu) migrate_cancel
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
Migration status: cancelling
(qemu) info migrate
Migration status: cancelling
......... endless "cancelling"

Yep, there are actually two bugs:
a) The segfault, for which I've just posted upstream:
migration: Guard ram_bytes_remaining against early call
b) The endless cancelling, which I've got an idea how to fix - it's related to the error path through the socket code.

and posted upstream fixes for (b):
[PATCH 1/2] migration: Allow migrate_fd_connect to take an Error *
[PATCH 2/2] migration: Route errors down through

a) just got merged upstream as bae416e5ba65701d3c5238164517158066d615e5

bumped to 7.6

b) got merged upstream as:
688a3dcba980bf01344a cce8040bb0ea6ff56d88

Also needs:
migration: Fix early failure cleanup
posted 2018-02-12, and should include the:
tests/migration: Add test for migration to bad destination
included with it.

Also needs:
Migration+TLS: Fix crash due to double cleanup
Description of problem:
Migrate a VM to an invalid destination IP, then run migrate_cancel followed by info migrate in HMP. QEMU hits a segmentation fault: the VM hangs, and QEMU crashes and exits. This issue exists on both x86 and ppc.

Version-Release number of selected component (if applicable):
Host:
3.10.0-823.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64
seabios-bin-1.11.0-1.el7.noarch

How reproducible:
4/5

Steps to Reproduce:
1. Boot a guest with the qemu cli:
gdb --args /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga std \
    -rtc base=utc,clock=host,driftfix=slew \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on \
    -chardev socket,id=console0,path=/tmp/console0,server,nowait \
    -device virtserialport,chardev=console0,name=console0,id=console0,bus=virtio_serial_pci0.0 \
    -chardev socket,id=serial0,path=/tmp/serial0,server,nowait \
    -device isa-serial,chardev=serial0,id=serial0 \
    -device nec-usb-xhci,id=usb1,multifunction=on,bus=pci.0,addr=11 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on,iothread=iothread0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,cache=none,format=qcow2,snapshot=off,file=/home/xianwang/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 \
    -netdev tap,vhost=on,id=idlkwV8e,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:7f,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05,disable-legacy=off,disable-modern=on \
    -m 4G \
    -smp 4 \
    -cpu SandyBridge \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=2 \
    -device usb-kbd,id=usb-kbd1,bus=usb1.0,port=3 \
    -device usb-mouse,id=usb-mouse1,bus=usb1.0,port=4 \
    -qmp tcp:0:6666,server,nowait \
    -vnc :9 \
    -rtc base=localtime,clock=vm,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=06

2. Migrate the VM to an invalid destination IP
(gdb) r
(qemu) migrate -d tcp:10.66.101.144:5801    **** (this IP and port do not exist)
(qemu) info migrate
globals: store-global-state=1, only_migratable=0, send-configuration=1, send-section-footer=1
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off
Migration status: setup

3. Cancel the migration and check its status
(qemu) migrate_cancel
(qemu) info migrate

Actual results:
The VM hangs, a segmentation fault occurs, and qemu crashes.
(qemu) migrate_cancel
(qemu) info migrate

Program received signal SIGSEGV, Segmentation fault.
0x00005555557f27a7 in ram_bytes_remaining () at /usr/src/debug/qemu-2.10.0/migration/ram.c:207
207 return ram_state->migration_dirty_pages * TARGET_PAGE_SIZE;
(gdb) bt
#0 0x00005555557f27a7 in ram_bytes_remaining () at /usr/src/debug/qemu-2.10.0/migration/ram.c:207
#1 0x000055555599bdf6 in populate_ram_info (info=info@entry=0x555556d150e0, s=0x555556d30280, s=0x555556d30280) at migration/migration.c:523
#2 0x000055555599c760 in qmp_query_migrate (errp=errp@entry=0x0) at migration/migration.c:567
#3 0x00005555558c8008 in hmp_info_migrate (mon=0x555556db0240, qdict=<optimized out>) at hmp.c:165
#4 0x00005555557ded0f in handle_hmp_command (mon=mon@entry=0x555556db0240, cmdline=0x55555715600c "") at /usr/src/debug/qemu-2.10.0/monitor.c:3151
#5 0x00005555557e038a in monitor_command_cb (opaque=0x555556db0240, cmdline=<optimized out>, readline_opaque=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3954
#6 0x0000555555ace6a8 in readline_handle_byte (rs=0x555557156000, ch=<optimized out>) at util/readline.c:393
#7 0x00005555557def12 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3937
#8 0x0000555555a6602f in fd_chr_read (chan=0x555556ce1d40, cond=<optimized out>, opaque=0x555556d14fa0) at chardev/char-fd.c:66
#9 0x00007fffef4c98f9 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#10 0x0000555555abc19c in glib_pollfds_poll () at util/main-loop.c:213
#11 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#12 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
#13 0x000055555579d8ca in main_loop () at vl.c:1917
#14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4805
(gdb) q

Expected results:
The migration status is "Migration status: cancelled", and the VM keeps running on the source host.

Additional info:
The result on ppc is the same as on the x86 platform; the versions are as follows:
3.10.0-820.el7.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
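
The backtrace faults at ram.c:207, where ram_state is dereferenced, and the fix title quoted in the comments ("Guard ram_bytes_remaining against early call") suggests that ram_state is not yet set up when info migrate runs this early. A minimal sketch of such a guard follows, assuming a plain NULL check that reports zero bytes remaining. The RAMState struct and the TARGET_PAGE_SIZE value are simplified stand-ins for illustration, not QEMU's definitions, and this is not necessarily identical to the merged patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in value for illustration; the real page size is target-specific. */
#define TARGET_PAGE_SIZE 4096u

/* Hypothetical, trimmed-down stand-in for QEMU's RAM migration state. */
typedef struct RAMState {
    uint64_t migration_dirty_pages;
} RAMState;

/* Assumed to stay NULL until migration setup has allocated the state,
 * which is the situation the backtrace appears to show. */
static RAMState *ram_state = NULL;

/* Guarded variant: report 0 bytes remaining instead of dereferencing
 * a NULL ram_state when called before the state exists. */
uint64_t ram_bytes_remaining(void)
{
    if (ram_state == NULL) {
        return 0;
    }
    return ram_state->migration_dirty_pages * TARGET_PAGE_SIZE;
}
```

With this guard, a status query issued while the migration is still in "setup" or "cancelling" would read zero remaining bytes rather than crash. It does not address the separate endless "cancelling" state, which the comments attribute to the error path through the socket code.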