Bug 1355662

Summary: 'qemu-kvm: socket_writev_buffer: Got err=104' when canceling migration
Product: Red Hat Enterprise Linux 7
Reporter: Qianqian Zhu <qizhu>
Component: qemu-kvm-rhev
Assignee: Amit Shah <amit.shah>
Status: CLOSED NOTABUG
QA Contact: Qianqian Zhu <qizhu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, dgilbert, jinzhao, knoel, qizhu, quintela, virt-maint
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-29 09:02:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Qianqian Zhu 2016-07-12 07:51:47 UTC
Description of problem:
QEMU prints errors when a migration is canceled.
Source QEMU output:
(qemu) 2016-07-12T05:39:45.219891Z qemu-kvm: RP: Received invalid message 0x0000 length 0x0000
2016-07-12T05:39:45.243909Z qemu-kvm: socket_writev_buffer: Got err=104 for (73885/18446744073709551615)

Destination QEMU output:
(qemu) 2016-07-12T05:51:22.255333Z qemu-kvm: Unknown combination of migration flags: 0
2016-07-12T05:51:22.255401Z qemu-kvm: error while loading state section id 2(ram)
2016-07-12T05:51:22.255909Z qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-12.el7.x86_64
kernel-3.10.0-461.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch the guest on the source host with:
/usr/libexec/qemu-kvm -name linux -cpu Westmere,check -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/nfsmount/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=writeback,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3

2. Launch the guest on the destination host with the same command line plus -incoming tcp:0:1234
3. (qemu) migrate -d tcp:$dest_ip:1234
4. (qemu) migrate_cancel
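Steps 3 and 4 can also be scripted for repeated runs. This is a minimal sketch, assuming the source QEMU is additionally started with a QMP socket such as -qmp unix:/tmp/qmp-src.sock,server,nowait (that option and path are assumptions and not part of the original command line). QMP's migrate command runs in the background, like HMP "migrate -d". The cancel_delay parameter is a hypothetical knob that lets the migration make some progress before migrate_cancel is sent, as in the manual steps.

```python
import json
import socket
import time


def qmp_commands(dest_ip, port=1234):
    """QMP equivalents of steps 3 and 4 (HMP 'migrate -d' and 'migrate_cancel')."""
    return [
        {"execute": "qmp_capabilities"},  # must precede other commands on a QMP connection
        {"execute": "migrate", "arguments": {"uri": f"tcp:{dest_ip}:{port}"}},
        {"execute": "migrate_cancel"},
    ]


def run_qmp(sock_path, commands, cancel_delay=2.0):
    """Send commands over a QMP UNIX socket and collect one reply per command."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        with s.makefile("rw", encoding="utf-8") as f:
            f.readline()  # discard the QMP greeting banner
            for cmd in commands:
                if cmd["execute"] == "migrate_cancel":
                    time.sleep(cancel_delay)  # let the migration start first
                f.write(json.dumps(cmd) + "\n")
                f.flush()
                while True:
                    line = f.readline()
                    if not line:
                        raise ConnectionError("QMP socket closed unexpectedly")
                    msg = json.loads(line)
                    # Asynchronous events carry an "event" key and are skipped.
                    if "return" in msg or "error" in msg:
                        replies.append(msg)
                        break
    return replies
```

For example, run_qmp("/tmp/qmp-src.sock", qmp_commands("192.0.2.10")) drives one migrate/cancel cycle; the error lines still appear on the source and destination monitors, not in the QMP replies.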

Actual results:
Migration is canceled, but errors are printed on both the source and the destination.

Expected results:
Migration is canceled without any errors.

Additional info:
qemu-kvm-rhev-10:2.3.0-31.el7_2.4.x86_64 works correctly (no errors)
qemu-kvm-rhev-10:2.5.0-1.el7.x86_64 reproduces the issue

Comment 3 Amit Shah 2016-07-29 08:36:09 UTC
Does a second migration succeed after this?

If it does, even if this is a regression, it's not a major cause for concern, as:
1) the migration was asked to cancel, and it does get cancelled
2) there's no effect on future migrations

The only side-effect of the bug is an unexpected error message, which doesn't cause any harm in actual functionality.

However, if further migrations are indeed affected, this should be treated as a blocker.

Comment 5 Qianqian Zhu 2016-07-29 08:56:32 UTC
Hi Amit,

I have just tested this on qemu-kvm-rhev-2.6.0-16.el7.x86_64.
A second migration after the cancellation succeeds, so this indeed does not harm functionality; it only shows an error message. Still, I think it would be better if we could suppress this message.

Qianqian

Comment 6 Amit Shah 2016-07-29 09:02:05 UTC
Thanks for checking.

It's really an informative message emitted by newer versions of QEMU to help us debug problems. Earlier versions did not emit such messages, but newer ones do. It doesn't indicate a functional error; it just records that an abnormal situation was reached. The 'err=104' in the message might suggest an error, but 'err' is just the name of the variable used in the code.
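The numbers in the source-side line can be decoded mechanically. On a Linux host, errno value 104 is ECONNRESET ("Connection reset by peer"), which is consistent with the connection being torn down during the cancel. The second number, 18446744073709551615, is 2^64 - 1, i.e. -1 when read as a signed 64-bit value; that it is a failed write's -1 printed with an unsigned format is an inference, not something confirmed from the QEMU source. The sketch below assumes the "Got err=N for (A/B)" layout seen in the log; the field names first and second_as_int64 are hypothetical labels.

```python
import errno
import os
import re

LINE = ("2016-07-12T05:39:45.243909Z qemu-kvm: socket_writev_buffer: "
        "Got err=104 for (73885/18446744073709551615)")


def decode_writev_error(line):
    """Split a 'socket_writev_buffer: Got err=N for (A/B)' line into its parts."""
    m = re.search(r"Got err=(\d+) for \((\d+)/(\d+)\)", line)
    if m is None:
        raise ValueError("not a socket_writev_buffer error line")
    err, first, second = (int(g) for g in m.groups())
    # Reinterpret the second field as a signed 64-bit (two's complement) value.
    second_signed = second - (1 << 64) if second >= (1 << 63) else second
    return {
        "errno_name": errno.errorcode.get(err, "unknown"),  # errno table of the host running this
        "strerror": os.strerror(err),
        "first": first,
        "second_as_int64": second_signed,
    }


print(decode_writev_error(LINE))
```

Errno numbering is platform-specific, so the name lookup is only meaningful when run on a Linux host like the one that produced the log.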

In summary, nothing has changed functionally, just more logging for easier debugging.

I'm closing this as NOTABUG as a result.