
Bug 1525899

Summary:	Migrate to an error destination ip ->"migrate_cancel"->info migrate, there will be segmentation fault
Product:	Red Hat Enterprise Linux 7	Reporter:	xianwang <xianwang>
Component:	qemu-kvm-rhev	Assignee:	Dr. David Alan Gilbert <dgilbert>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Yumei Huang <yuhuang>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	7.5	CC:	chayang, dgilbert, jinzhao, juzhang, knoel, michen, qzhang, virt-maint, xianwang
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-15 08:34:54 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1558351

Description xianwang 2017-12-14 10:41:34 UTC

Description of problem:
Migrate a VM to an unreachable destination IP, then run "migrate_cancel" followed by "info migrate" in the HMP monitor. The VM hangs, then QEMU hits a segmentation fault and exits. The issue exists on both x86 and ppc.

Version-Release number of selected component (if applicable):
Host:
3.10.0-823.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64
seabios-bin-1.11.0-1.el7.noarch


How reproducible:
4/5

Steps to Reproduce:
1. Boot a guest with the qemu CLI (under gdb):
gdb --args /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga std \
    -rtc base=utc,clock=host,driftfix=slew \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on  \
    -chardev socket,id=console0,path=/tmp/console0,server,nowait \
    -device virtserialport,chardev=console0,name=console0,id=console0,bus=virtio_serial_pci0.0  \
    -chardev socket,id=serial0,path=/tmp/serial0,server,nowait \
    -device isa-serial,chardev=serial0,id=serial0 \
    -device nec-usb-xhci,id=usb1,multifunction=on,bus=pci.0,addr=11 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on,iothread=iothread0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,cache=none,format=qcow2,snapshot=off,file=/home/xianwang/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 \
    -netdev tap,vhost=on,id=idlkwV8e,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:7f,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05,disable-legacy=off,disable-modern=on  \
    -m 4G  \
    -smp 4  \
    -cpu SandyBridge \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=2  \
    -device usb-kbd,id=usb-kbd1,bus=usb1.0,port=3 \
    -device usb-mouse,id=usb-mouse1,bus=usb1.0,port=4 \
    -qmp tcp:0:6666,server,nowait \
    -vnc :9 \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=06
2. Start a migration to an unreachable destination IP and check its status
(gdb) r
(qemu) migrate -d tcp:10.66.101.144:5801    (note: this IP and port do not exist)
(qemu) info migrate
globals: store-global-state=1, only_migratable=0, send-configuration=1, send-section-footer=1
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off 
Migration status: setup

3. Cancel the migration and check its status again
(qemu) migrate_cancel 
(qemu) info migrate


Actual results:
The VM hangs and QEMU crashes with a segmentation fault:

(qemu) migrate_cancel 
(qemu) info migrate

Program received signal SIGSEGV, Segmentation fault.
0x00005555557f27a7 in ram_bytes_remaining () at /usr/src/debug/qemu-2.10.0/migration/ram.c:207
207	    return ram_state->migration_dirty_pages * TARGET_PAGE_SIZE;

(gdb) bt
#0  0x00005555557f27a7 in ram_bytes_remaining () at /usr/src/debug/qemu-2.10.0/migration/ram.c:207
#1  0x000055555599bdf6 in populate_ram_info (info=info@entry=0x555556d150e0, s=0x555556d30280, s=0x555556d30280)
    at migration/migration.c:523
#2  0x000055555599c760 in qmp_query_migrate (errp=errp@entry=0x0) at migration/migration.c:567
#3  0x00005555558c8008 in hmp_info_migrate (mon=0x555556db0240, qdict=<optimized out>) at hmp.c:165
#4  0x00005555557ded0f in handle_hmp_command (mon=mon@entry=0x555556db0240, cmdline=0x55555715600c "")
    at /usr/src/debug/qemu-2.10.0/monitor.c:3151
#5  0x00005555557e038a in monitor_command_cb (opaque=0x555556db0240, cmdline=<optimized out>, readline_opaque=<optimized out>)
    at /usr/src/debug/qemu-2.10.0/monitor.c:3954
#6  0x0000555555ace6a8 in readline_handle_byte (rs=0x555557156000, ch=<optimized out>) at util/readline.c:393
#7  0x00005555557def12 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.10.0/monitor.c:3937
#8  0x0000555555a6602f in fd_chr_read (chan=0x555556ce1d40, cond=<optimized out>, opaque=0x555556d14fa0) at chardev/char-fd.c:66
#9  0x00007fffef4c98f9 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#10 0x0000555555abc19c in glib_pollfds_poll () at util/main-loop.c:213
#11 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#12 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
#13 0x000055555579d8ca in main_loop () at vl.c:1917
#14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4805
(gdb) q


Expected results:
"info migrate" reports "Migration status: cancelled", and the VM keeps running on the source host.

Additional info:
The result on ppc is the same as on x86. The ppc versions are as follows:
3.10.0-820.el7.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Comment 2 Dr. David Alan Gilbert 2017-12-15 09:40:57 UTC

Confirmed (and with upstream 2.11); the crucial thing is that the IP address doesn't reject the connection, but just hangs during the connect.

Status is 'cancelling'.
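
To illustrate the "rejects" versus "hangs" distinction above, here is a minimal sketch of how the result of a POSIX non-blocking connect() can be classified. It is not taken from QEMU, and QEMU's socket code is not necessarily structured this way; the enum and function names are hypothetical. A refused connection fails right away with ECONNREFUSED, while an unanswered one stays pending (EINPROGRESS) until something times out, which is the window in which migrate_cancel and "info migrate" were issued here.

```c
#include <errno.h>

/* Hypothetical outcome categories for a non-blocking connect() attempt. */
typedef enum {
    CONNECT_OK,          /* connection established immediately */
    CONNECT_PENDING,     /* no answer yet: the "hangs during the connect" case */
    CONNECT_REJECTED,    /* peer actively refused the connection */
    CONNECT_OTHER_ERROR  /* any other failure (unreachable, timed out, ...) */
} ConnectOutcomeSketch;

/*
 * Classify the return value and errno of a non-blocking connect().
 * rc  - value returned by connect()
 * err - errno captured right after the call (ignored when rc == 0)
 */
static ConnectOutcomeSketch classify_connect_sketch(int rc, int err)
{
    if (rc == 0) {
        return CONNECT_OK;
    }
    switch (err) {
    case EINPROGRESS:   /* handshake started, no reply so far */
    case EALREADY:      /* an earlier attempt is still in flight */
        return CONNECT_PENDING;
    case ECONNREFUSED:  /* destination answered with a reset */
        return CONNECT_REJECTED;
    default:
        return CONNECT_OTHER_ERROR;
    }
}
```

With a destination that rejects the connection, the migration would fail before there is anything to cancel; only the pending case leaves it sitting in "setup".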

Comment 3 xianwang 2017-12-15 11:40:09 UTC

This bug is not a regression; it also exists on qemu-kvm-rhev-2.9.0-16.el7_4.1. The result on qemu-kvm-rhev-2.9.0-16.el7_4.1.ppc64le is not exactly the same as on qemu-kvm-rhev-2.10.0-12.el7.ppc64le, though: as Dave said in comment 2, the migration status stays at "cancelling", as follows:

version:
Host:
3.10.0-693.el7.ppc64le
qemu-kvm-rhev-2.9.0-16.el7_4.1.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

The steps are the same as in the bug report.

result:
(qemu) migrate -d tcp:10.16.110.120:5801
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: setup
total time: 0 milliseconds
(qemu) migrate_cancel 
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: cancelling
(qemu) info migrate
Migration status: cancelling
.........
endless "cancelling"

Comment 4 Dr. David Alan Gilbert 2017-12-15 11:54:03 UTC

Yep, there are actually two bugs:
  a) The segfault, for which I've just posted upstream: migration: Guard ram_bytes_remaining against early call
  b) The endless cancelling, which I've got an idea how to fix - it's related to the error path through the socket code.
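
For (a), the backtrace in the description shows the fault on the ram_state dereference at ram.c:207, reached from "info migrate" while the connect was still pending. A minimal, self-contained sketch of what such a guard could look like follows. It is an illustration, not the upstream patch: the type, variable and macro names are stand-ins, the 4 KiB page size is an assumption, and it assumes the pointer is NULL until RAM migration setup has run.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for TARGET_PAGE_SIZE; 4 KiB is an assumption for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Minimal stand-in for the RAM migration state seen in the backtrace. */
typedef struct RAMStateSketch {
    uint64_t migration_dirty_pages;
} RAMStateSketch;

/* Assumed to stay NULL until RAM migration setup has completed. */
static RAMStateSketch *ram_state_sketch = NULL;

/*
 * Guarded variant of the failing function: when called early, report
 * 0 bytes remaining instead of dereferencing a NULL pointer.
 */
static uint64_t ram_bytes_remaining_sketch(void)
{
    if (ram_state_sketch == NULL) {
        return 0;
    }
    return ram_state_sketch->migration_dirty_pages * SKETCH_PAGE_SIZE;
}
```

With a guard like this, a status query issued before setup finishes gets a zero instead of crashing the process.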

Comment 5 Dr. David Alan Gilbert 2017-12-15 17:19:07 UTC

and posted upstream fixes for (b):
[PATCH 1/2] migration: Allow migrate_fd_connect to take an Error *
[PATCH 2/2] migration: Route errors down through
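
The idea behind (b), as far as the patch titles suggest, is that a failed connect has to be reported to the same place that handles a successful one, so that a migration sitting in "cancelling" can still reach "cancelled". The sketch below is a toy state machine with hypothetical names, not the upstream code; it only shows why routing the error into the completion handler ends the endless "cancelling" seen in comment 3.

```c
#include <stddef.h>

/* Hypothetical subset of migration states used in this report. */
typedef enum {
    MIG_SETUP,
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_FAILED
} MigStatusSketch;

typedef struct {
    MigStatusSketch status;
    const char *error;   /* last error message, NULL if none */
} MigStateSketch;

/* migrate_cancel: only requests cancellation, cleanup finishes it. */
static void mig_cancel_sketch(MigStateSketch *s)
{
    if (s->status == MIG_SETUP || s->status == MIG_ACTIVE) {
        s->status = MIG_CANCELLING;
    }
}

/* Cleanup is the only place that completes a pending cancel. */
static void mig_cleanup_sketch(MigStateSketch *s)
{
    if (s->status == MIG_CANCELLING) {
        s->status = MIG_CANCELLED;
    }
}

/*
 * Connect completion handler. err is NULL on success, otherwise a
 * message describing why the connect failed. Taking the error here
 * means the failure path also runs the cleanup.
 */
static void mig_connect_done_sketch(MigStateSketch *s, const char *err)
{
    if (err != NULL) {
        if (s->status == MIG_SETUP) {
            s->status = MIG_FAILED;
            s->error = err;
        }
        mig_cleanup_sketch(s);
        return;
    }
    if (s->status == MIG_SETUP) {
        s->status = MIG_ACTIVE;
    } else {
        mig_cleanup_sketch(s);
    }
}
```

In this model, a cancel issued during "setup" followed by a connect timeout ends in "cancelled". If the error were dropped in the socket layer and the handler never called, the state would stay at "cancelling" indefinitely.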

Comment 6 Dr. David Alan Gilbert 2018-01-15 16:42:36 UTC

a) just got merged upstream as bae416e5ba65701d3c5238164517158066d615e5

Comment 7 Dr. David Alan Gilbert 2018-02-01 15:04:25 UTC

bumped to 7.6

Comment 8 Dr. David Alan Gilbert 2018-02-07 18:06:23 UTC

b) got merged upstream as:
688a3dcba980bf01344a
cce8040bb0ea6ff56d88

Comment 9 Dr. David Alan Gilbert 2018-02-13 09:08:02 UTC

Also needs:
  migration: Fix early failure cleanup
posted 2018-02-12; the backport should also include the test posted with it:
  tests/migration: Add test for migration to bad destination

Comment 10 Dr. David Alan Gilbert 2018-04-30 19:01:39 UTC

Also needs:
Migration+TLS: Fix crash due to double cleanup