Bug 1033432

Summary: HMP on src host prints "nbd.c:nbd_receive_reply():L746: read failed" after storage VM migration
Product: Red Hat Enterprise Linux 7
Reporter: Jun Li <juli>
Component: qemu-kvm
Assignee: Ademar Reis <areis>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: low
Priority: low
Version: 7.0
CC: acathrow, hhuang, juli, juzhang, knoel, michen, pbonzini, qiguo, shu, virt-maint, xfu
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-11-25 11:04:17 UTC
Type: Bug

Description Jun Li 2013-11-22 05:13:47 UTC
Description of problem:
Do storage VM migration following the steps described below. After migration, stop the NBD server on the dst host; the HMP on the src host will then print "nbd.c:nbd_receive_reply():L746: read failed" repeatedly.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-19.el7.x86_64
3.10.0-48.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on the src and dst hosts with command lines like the following (src and dst are the same host):
src host:
# /usr/libexec/qemu-kvm -S -M pc-i440fx-rhel7.0.0 -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=1 -name juli -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa68 -rtc base=localtime,clock=host,driftfix=slew \
-device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 -drive file=/home/sn1,if=none,id=drive-scsi0-0-0,media=disk,cache=none,format=qcow2,werror=stop,rerror=stop,aio=native  -device scsi-hd,drive=drive-scsi0-0-0,bus=scsi0.0,scsi-id=0,lun=0,id=juli,bootindex=0 \
-drive file=/home/ISO/RHEL6.3-20120613.2-Server-i386-DVD1.iso,if=none,media=cdrom,format=raw,aio=native,id=drive-ide1-0-0 -device ide-drive,drive=drive-ide1-0-0,id=ide1-0-0,bus=ide.0,unit=0,bootindex=4 \
-device virtio-balloon-pci,id=ballooning \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
-k en-us -boot menu=on,reboot-timeout=-1,strict=on -qmp tcp:0:4477,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :3 -spice port=5939,disable-ticketing  -vga qxl -global qxl-vga.revision=3 -monitor stdio -monitor tcp:0:7777,server,nowait -monitor unix:/tmp/monitor1,server,nowait \
-netdev tap,id=tap1,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,ifname=tap-juli -device virtio-net-pci,netdev=tap1,id=nic1,mq=on,vectors=17,mac=1a:59:0a:4b:aa:94
dst host:
# /usr/libexec/qemu-kvm -S -M pc-i440fx-rhel7.0.0 -cpu SandyBridge -enable-kvm -m 4G -smp 4,sockets=2,cores=2,threads=1 -name juli -uuid 355a2475-4e03-4cdd-bf7b-5d6a59edaa68 -rtc base=localtime,clock=host,driftfix=slew \
-device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 -drive file=/home/sn2,if=none,id=drive-scsi0-0-0,media=disk,cache=none,format=qcow2,werror=stop,rerror=stop,aio=native  -device scsi-hd,drive=drive-scsi0-0-0,bus=scsi0.0,scsi-id=0,lun=0,id=juli,bootindex=0 \
-drive file=/home/ISO/RHEL6.3-20120613.2-Server-i386-DVD1.iso,if=none,media=cdrom,format=raw,aio=native,id=drive-ide1-0-0 -device ide-drive,drive=drive-ide1-0-0,id=ide1-0-0,bus=ide.0,unit=0,bootindex=4 \
-device virtio-balloon-pci,id=ballooning \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
-k en-us -boot menu=on,reboot-timeout=-1,strict=on -qmp tcp:0:6677,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :6 -spice port=5938,disable-ticketing  -vga qxl -global qxl-vga.revision=3 -monitor stdio -monitor tcp:0:7788,server,nowait -monitor unix:/tmp/monitor1,server,nowait \
-netdev tap,id=tap1,vhost=on,queues=4,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,ifname=tap-juli-1 -device virtio-net-pci,netdev=tap1,id=nic1,mq=on,vectors=17,mac=1a:59:0a:4b:aa:94 -incoming tcp::5800
2. On the dst host, start an NBD server and export the empty disk:
$ telnet 10.66.106.4 6677
{"execute":"qmp_capabilities"}
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.66.106.4", "port": "3333" } } } }
{ "execute": "nbd-server-add", "arguments": { "device": "drive-scsi0-0-0", "writable": true } }

3. On the src host, start guest installation; after formatting the disk, start mirroring to the remote NBD disk exported in step 2.
$ telnet 10.66.106.4 4477
{"execute":"qmp_capabilities"}
{ "execute": "drive-mirror", "arguments": { "device": "drive-scsi0-0-0", "target": "nbd://10.66.106.4:3333/drive-scsi0-0-0", "sync": "full", "format": "raw", "mode": "existing" } }
{"timestamp": {"seconds": 1385084423, "microseconds": 563049}, "event": "RESET"}
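For illustration, the QMP payloads used in steps 2 and 3 can be assembled programmatically. This is a minimal sketch (not code from the bug report); the host, port, and device names mirror the commands above but would normally be parameters:

```python
import json

def nbd_server_start(host, port):
    # QMP command to start an NBD server listening on host:port
    # (QMP expects the port as a string)
    return {"execute": "nbd-server-start",
            "arguments": {"addr": {"type": "inet",
                                   "data": {"host": host, "port": str(port)}}}}

def nbd_server_add(device, writable=True):
    # QMP command to export a block device over the running NBD server
    return {"execute": "nbd-server-add",
            "arguments": {"device": device, "writable": writable}}

def drive_mirror(device, host, port, export):
    # QMP command to mirror a local drive to the remote NBD export
    target = "nbd://%s:%d/%s" % (host, port, export)
    return {"execute": "drive-mirror",
            "arguments": {"device": device, "target": target,
                          "sync": "full", "format": "raw", "mode": "existing"}}

# Serialize the same sequence as the reproduction steps
cmds = [nbd_server_start("10.66.106.4", 3333),
        nbd_server_add("drive-scsi0-0-0"),
        drive_mirror("drive-scsi0-0-0", "10.66.106.4", 3333, "drive-scsi0-0-0")]
wire = [json.dumps(c) for c in cmds]
```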

4. After mirroring reaches steady state, migrate from src to dst.
{"timestamp": {"seconds": 1385086406, "microseconds": 122417}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-scsi0-0-0", "len": 21474836480, "offset": 21474836480, "speed": 32212254720, "type": "mirror"}}
(qemu) migrate -d tcp:127.0.0.1:5800

5. After migration finishes, stop the NBD server on the dst host.
{ "execute": "nbd-server-stop", "arguments": { } }

6. Check the HMP on the src host.

Actual results:
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
nbd.c:nbd_receive_reply():L746: read failed
...

Expected results:
After a successful migration, the HMP on the src host should not print any errors.

Additional info:

Comment 2 Paolo Bonzini 2013-11-25 11:03:57 UTC
You need to do block-job-cancel on the source before starting the destination.

Is storage VM migration tested with libvirt too (by you or by someone else)?  If not, I suggest that you do.  libvirt's correct sequencing of steps is very important, as this bug shows.
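The ordering Paolo describes can be sketched as data: cancel the mirror job on the source first, so the source closes its NBD client connection before the destination's NBD server goes away. This is an illustrative sequence built from the QMP commands already shown in this bug, not code from QEMU or libvirt:

```python
def shutdown_sequence(device):
    """QMP commands in the order comment 2 requires: block-job-cancel on
    the source *before* nbd-server-stop on the destination."""
    on_source = [
        # Stop the mirror job so the source drops its NBD client connection
        {"execute": "block-job-cancel", "arguments": {"device": device}},
    ]
    on_destination = [
        # Only now is it safe to tear down the NBD server
        {"execute": "nbd-server-stop", "arguments": {}},
    ]
    return on_source + on_destination

seq = shutdown_sequence("drive-scsi0-0-0")
```

Stopping the NBD server first (as in step 5 of the reproduction) leaves the source's still-running mirror job reading from a dead socket, which is what produces the repeated nbd_receive_reply() errors.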

Comment 6 FuXiangChun 2013-11-28 10:12:59 UTC
(In reply to Paolo Bonzini from comment #2)
> You need to do block-job-cancel on the source before starting the
> destination.

You are right.
> 
> Is storage VM migration tested with libvirt too (by you or by someone else)?
> If not, I suggest that you do.  libvirt's correct sequencing of steps is
> very important, as this bug shows.

Ignore comment 5. QE tested this issue with libvirt. "block-job-cancel" can be seen on the src side just as migration is about to finish, so libvirt executes the block-job-cancel command automatically.

2013-11-28 09:56:55.783+0000: 26822: debug : qemuMonitorSend:955 : QEMU_MONITOR_SEND_MSG: mon=0x7f308c00b210 msg={"execute":"block-job-cancel","arguments":{"device":"drive-scsi0-0-0-0"},"id":"libvirt-280"} fd=-1
2013-11-28 09:56:55.783+0000: 26818: debug : qemuMonitorIOWrite:502 : QEMU_MONITOR_IO_WRITE: mon=0x7f308c00b210 buf={"execute":"block-job-cancel","arguments":{"device":"drive-scsi0-0-0-0"},"id":"libvirt-280"}
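The libvirt debug lines above embed the QMP payload after the msg=/buf= field. As a quick illustration (a hypothetical helper, not part of libvirt), the command can be pulled out of such a line like this:

```python
import json

def qmp_payload(log_line):
    # Extract the JSON QMP message embedded in a libvirt QEMU_MONITOR debug line.
    # Assumes the payload spans from the first '{' to the last '}' in the line.
    start = log_line.index("{")
    end = log_line.rindex("}") + 1
    return json.loads(log_line[start:end])

# The qemuMonitorIOWrite line from this comment
line = ('2013-11-28 09:56:55.783+0000: 26818: debug : qemuMonitorIOWrite:502 : '
        'QEMU_MONITOR_IO_WRITE: mon=0x7f308c00b210 buf={"execute":"block-job-cancel",'
        '"arguments":{"device":"drive-scsi0-0-0-0"},"id":"libvirt-280"}')
msg = qmp_payload(line)
```

This confirms what the log already shows: libvirt itself issues block-job-cancel for the mirrored drive, which is why the error from this bug does not appear when migration is driven through libvirt.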