Bug 1355683 - qemu core dump when do postcopy migration again after canceling a migration in postcopy phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: Qianqian Zhu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-12 08:51 UTC by Qianqian Zhu
Modified: 2016-11-07 21:23 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.6.0-17.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 21:23:14 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:2673 | 0 | normal | SHIPPED_LIVE | qemu-kvm-rhev bug fix and enhancement update | 2016-11-08 01:06:13 UTC

Description Qianqian Zhu 2016-07-12 08:51:13 UTC
Description of problem:
QEMU dumps core when a postcopy migration is started again after cancelling a migration in the postcopy phase.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-12.el7.x86_64
kernel-3.10.0-461.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch src guest:
gdb /usr/libexec/qemu-kvm
(gdb) run -name linux -cpu Westmere,check -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/nfsmount/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=writeback,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3

2. Launch guest on dest host with the same command line
3. Start postcopy migration, then cancel it immediately
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
(qemu) migrate_cancel

4. Launch guest on dest host again.
5. Start postcopy migration again
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
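
For scripted reproduction, steps 3 and 5 can be sketched as a sequence of QMP messages instead of typed monitor commands. This is a minimal sketch, not the reporter's method: it assumes the source QEMU also exposes a QMP socket (the command line in step 1 only has "-monitor stdio"), and it uses the QMP counterparts of the HMP commands above. The helper name build_repro_sequence is hypothetical. The script only builds and prints the messages; sending them, waiting for replies, and relaunching the destination (step 4) are left out.

```python
import json


def build_repro_sequence(uri):
    """Return QMP messages mirroring steps 3 and 5 of the report.

    Hypothetical helper: it only builds the messages, it does not send them.
    """
    def cmd(name, **arguments):
        msg = {"execute": name}
        if arguments:
            msg["arguments"] = arguments
        return msg

    return [
        # QMP handshake, required before any other command is accepted.
        cmd("qmp_capabilities"),
        # Step 3: enable postcopy, start migrating, switch to postcopy, cancel.
        cmd("migrate-set-capabilities",
            capabilities=[{"capability": "postcopy-ram", "state": True}]),
        cmd("migrate", uri=uri),
        cmd("migrate-start-postcopy"),
        cmd("migrate_cancel"),
        # Step 4 (relaunching the destination guest) happens outside QMP.
        # Step 5: migrate again and switch to postcopy again.
        cmd("migrate", uri=uri),
        cmd("migrate-start-postcopy"),
    ]


if __name__ == "__main__":
    for message in build_repro_sequence("tcp:10.73.72.55:1234"):
        print(json.dumps(message))
```

Each printed line is one JSON message that would be written to the QMP socket in order; a real client would also read the greeting and each reply before sending the next command.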

Actual results:
QEMU dumps core:
(qemu) 2016-07-12T08:42:34.819057Z qemu-kvm: invalid runstate transition: 'finish-migrate' -> 'finish-migrate'

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff567fe700 (LWP 28314)]
0x00007fffec5041d7 in raise () from /lib64/libc.so.6
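
The message above suggests that a run-state change was requested while the VM was already in 'finish-migrate', and that the request was rejected as invalid. The following is a minimal model of that kind of check, assuming a table of allowed (from, to) pairs; the table is an illustrative subset chosen for this sketch and not a copy of QEMU's own, and the exception stands in for the abort seen in the trace.

```python
# Illustrative subset of allowed (from, to) run-state pairs.
# Assumption for this sketch only; QEMU's real table is larger.
VALID_TRANSITIONS = {
    ("running", "finish-migrate"),
    ("finish-migrate", "running"),
    ("finish-migrate", "postmigrate"),
}


class InvalidTransition(Exception):
    """Stands in for the abort() that produces SIGABRT in the report."""


class RunState:
    def __init__(self, state="running"):
        self.state = state

    def set(self, new_state):
        # Reject any pair that is not explicitly listed, including
        # a transition from a state to itself.
        if (self.state, new_state) not in VALID_TRANSITIONS:
            raise InvalidTransition(
                "invalid runstate transition: '%s' -> '%s'"
                % (self.state, new_state))
        self.state = new_state


def second_attempt_message():
    """Model two postcopy attempts with no resume in between."""
    vm = RunState()
    vm.set("finish-migrate")      # first attempt stops the guest
    # Assumption: the cancel leaves the state at 'finish-migrate'.
    try:
        vm.set("finish-migrate")  # second attempt requests the same state
    except InvalidTransition as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(second_attempt_message())
```

In this model, moving the state back to 'running' before the second attempt makes both transitions valid, so no error is raised.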

Expected results:
Postcopy migration succeeds.

Additional info:

Comment 2 Dr. David Alan Gilbert 2016-07-15 10:12:36 UTC
Yes, I can recreate this.

It should be an unusual circumstance in practice; cancelling after postcopy has started is unsafe unless you control the destination.  If the destination hasn't started running it's OK to restart the source and try again, so libvirt could potentially do that - however, it would issue a continue to the source before retrying the migration so wouldn't hit this case.

I'll look into it.

Comment 4 Qianqian Zhu 2016-07-20 08:18:16 UTC
Tested with:
qemu-kvm-rhev-2.6.0-13.el7.1355683a.x86_64
kernel-3.10.0-461.el7.x86_64

Steps:
1. Launch src guest
2. Launch guest on dest host with the same command line
3. Start postcopy migration, then cancel it immediately
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
(qemu) migrate_cancel

4. Launch guest on dest host again.
5. Start postcopy migration again
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy

Results:
No core dump; postcopy migration succeeds and the guest works well after step 5.


Cancelling a normal migration succeeds, but with the error below:
(qemu) migrate_cancel 
(qemu) 2016-07-20T08:14:09.855908Z qemu-kvm: socket_writev_buffer: Got err=32 for (73885/18446744073709551615)

Cancelling in postcopy phase:
(qemu) 2016-07-20T08:06:34.581064Z qemu-kvm: socket_writev_buffer: Got err=32 for (131337/18446744073709551615)
2016-07-20T08:06:34.581090Z qemu-kvm: RP: Received invalid message 0x0000 length 0x0000
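
The numbers in these messages can be decoded with a short script. This is a sketch under two assumptions that the log itself does not confirm: that "err=32" is a Linux errno value, and that the second number in the parentheses is a 64-bit unsigned quantity. The helper name describe_errno is hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and system description of an errno value."""
    name = errno.errorcode.get(code, "unknown")
    return "%s (%s)" % (name, os.strerror(code))


# Largest value of a 64-bit unsigned integer.
UINT64_MAX = 2**64 - 1

if __name__ == "__main__":
    print(describe_errno(32))
    # True if the second number in the log equals the 64-bit maximum.
    print(18446744073709551615 == UINT64_MAX)
```

On Linux, errno 32 is EPIPE, which would be consistent with the peer having closed the socket once the migration was cancelled; that reading is an inference from the log, not a statement from the report.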

Comment 6 Miroslav Rezanina 2016-07-29 09:12:12 UTC
Fix included in qemu-kvm-rhev-2.6.0-17.el7

Comment 8 Qianqian Zhu 2016-08-23 05:44:48 UTC
Verified with:
qemu-kvm-rhev-2.6.0-20.el7.x86_64
kernel-3.10.0-491.el7.x86_64

Steps same as comment 4.
cli:
/usr/libexec/qemu-kvm -name linux -cpu SandyBridge -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mntnfs/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3 -qmp tcp::5555,server,nowait

Result:
Postcopy migration succeeds and the guest works well.
Cancelling produces the same warning:
2016-07-20T08:06:34.581064Z qemu-kvm: socket_writev_buffer: Got err=32 for (131337/18446744073709551615)
2016-07-20T08:06:34.581090Z qemu-kvm: RP: Received invalid message 0x0000 length 0x0000

Comment 9 Qianqian Zhu 2016-08-23 05:45:38 UTC
Moving to VERIFIED as per comment 8

Comment 11 errata-xmlrpc 2016-11-07 21:23:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

