Bug 1425273

Summary: [Q35] migration failed after hotplug e1000e device
Product: Red Hat Enterprise Linux 7
Reporter: jingzhao <jinzhao>
Component: qemu-kvm-rhev
Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA
QA Contact: jingzhao <jinzhao>
Severity: high
Docs Contact:
Priority: high
Version: 7.4
CC: chayang, dgilbert, hhuang, jinchen, jinzhao, juzhang, knoel, lvivier, marcel, mrezanin, pbonzini, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:44:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description jingzhao 2017-02-21 03:09:06 UTC
Description of problem:
[Q35] migration failed after unplugging an e1000e device

Version-Release number of selected component (if applicable):
kernel-3.10.0-563.el7.x86_64
qemu-kvm-rhev-2.8.0-4.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1.Boot guest with e1000e device [1]
(qemu) info network 
net1: index=0,type=nic,model=e1000e,macaddr=9a:6a:6b:6c:6d:6a
 \ dev1: index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown

2.Unplug e1000e device and then migrate
(qemu) device_del net1 
(qemu) netdev_del dev1 
(qemu) info network 
(qemu) migrate -d tcp:10.66.6.246:5800

Actual results:
migration failed
src status:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
Migration status: failed
total time: 0 milliseconds

dest status:
(qemu) qemu-kvm: Unknown ramblock "", cannot accept migration
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument


Expected results:
migrate successfully

Additional info:
Hot-plugging the e1000e device and then migrating gives the same result as above.

Also, the issue did not reproduce on the pc machine type with an e1000 device (test steps the same as above).

[1]
src command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0 \
-netdev tap,id=dev1,vhost=on \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \

dest command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios1.log,id=seabios \
-vga qxl \
-spice port=5932,disable-ticketing \
-qmp tcp:0:4446,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \
-incoming tcp:0:5800 \


Following is the pc command:
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device e1000,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1 \
-netdev tap,id=dev1,vhost=on \
-drive file=/home/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bootindex=0 \
-monitor stdio \

Comment 1 jingzhao 2017-02-21 03:14:09 UTC
Also checked the q35 machine type with a block device; the issue did not reproduce:

1.Boot guest with block device
2.unplug block device through hmp
(qemu) device_del virtio-disk1
3.do the local migration
4.migrated successfully

[1]/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-drive file=/home/test/block1.qcow2,if=none,id=drive1,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk1,drive=drive1,bus=root.0 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1 \
-netdev tap,id=dev1,vhost=on \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \

Thanks
Jing Zhao

Comment 2 jinchen 2017-02-21 04:31:17 UTC
With qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64:

migration failed after unplugging virtio-net-pci
migration failed after hot-plugging virtio-net-pci

Comment 3 jinchen 2017-02-21 06:52:54 UTC
With qemu-kvm-rhev-debuginfo-2.8.0-5.el7.x86_64:

migration failed after unplugging a virtio-net-pci device
migration failed after hot-plugging a virtio-net-pci device

Comment 5 Marcel Apfelbaum 2017-02-26 14:55:25 UTC
Hi,

Can you please provide more information:
When you removed the e1000e device, did you have another NIC that the migration process could use?
When you hot-plugged the e1000e device, was it the only NIC in the system?
Can you provide the exact steps for the failed migration with the virtio-net-pci device?
Did you check PC, Q35, or both for virtio-net-pci?

Thanks,
Marcel

Comment 6 jinchen 2017-03-02 03:27:39 UTC
(In reply to Marcel Apfelbaum from comment #5)
> Hi,
> 
> Can you please provide more information:
> When you removed the e1000e device, did you have another NIC that the
> migration process could use?

  I only used one NIC, but when I tried with two NICs the result was still a failure.
> When you hot-plugged the e1000e device, was it the only NIC in the system?

  Yes, it was the only NIC in the system. If there is already a NIC in the system before the hot-plug, the migration succeeds.
> Can you provide the exact steps for the failed migration with the
> virtio-net-pci device?

1. Boot guest [1] (without the virtio-net-pci device present at boot)
2. Hot-plug the virtio-net-pci device and then migrate
(qemu) info network 
hub 0
 \ hub0port1: user.0: index=0,type=user,net=10.0.2.0,restrict=off
 \ hub0port0: e1000.0: index=0,type=nic,model=e1000,macaddr=52:54:00:12:34:56
(qemu) netdev_add tap,vhost=on,id=dev1
(qemu) device_add virtio-net-pci,netdev=dev1,id=net1,mac=9a:6a:6b:6c:6d:6a,bus=root.0
(qemu) info network 
hub 0
 \ hub0port1: user.0: index=0,type=user,net=10.0.2.0,restrict=off
 \ hub0port0: e1000.0: index=0,type=nic,model=e1000,macaddr=52:54:00:12:34:56
net1: index=0,type=nic,model=virtio-net-pci,macaddr=9a:6a:6b:6c:6d:6a
 \ dev1: index=0,type=tap,ifname=tap1,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
(qemu) migrate -d tcp:10.66.4.211:5800
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
Migration status: failed
total time: 0 milliseconds

dest status:
(qemu) info network 
net1: index=0,type=nic,model=virtio-net-pci,macaddr=9a:6a:6b:6c:6d:6a
 \ dev1: index=0,type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
(qemu) qemu-kvm: Unknown ramblock "0000:00:02.0/e1000.rom", cannot accept migration
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument
red_channel_client_disconnect: rcc=0x7f3d2efae000 (channel=0x7f3d2da5c600 type=2 id=0)
red_channel_client_disconnect: rcc=0x7f3d2dcd0000 (channel=0x7f3d2da4e580 type=4 id=0)

[1]
src command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/demo/1.img-seabios,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \

dest command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios1.log,id=seabios \
-vga qxl \
-spice port=5932,disable-ticketing \
-qmp tcp:0:4446,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device virtio-net-pci,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0 \
-netdev tap,id=dev1,vhost=on \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/demo/1.img-seabios,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \
-incoming tcp:0:5800 \

> Did you check PC, Q35, or both for virtio-net-pci?

  Yes. For PC: hot-plugging/unplugging an e1000e or virtio-net-pci device fails, no matter how many NICs are in the system.
      For Q35: hot-plugging an e1000e or virtio-net-pci device succeeds when it is not the only NIC in the system; hot-unplugging a virtio-net-pci device when there are two NICs in the system also succeeds.
> Thanks,
> Marcel

Thanks,
jinchen

Comment 7 jinchen 2017-03-02 07:47:33 UTC
Hi,Marcel

  Sorry, my mistake: for PC I hot-plugged/unplugged an e1000 device rather than an e1000e device, but the result is as reported; hot-plug/unplug migration fails.

Thanks,
jinchen

Comment 8 Marcel Apfelbaum 2017-03-07 14:48:57 UTC
(In reply to jinchen from comment #7)
> Hi,Marcel
> 
>   Sorry, my mistake: for PC I hot-plugged/unplugged an e1000 device rather
> than an e1000e device, but the result is as reported; hot-plug/unplug
> migration fails.
> 
> Thanks,
> jinchen

Thank you for the reply. I am having trouble understanding the results.
Please try to fill in the following table:

machine - operation | virtio-nic | e1000   | e1000e  |  virtio block
-------------------------------------------------------------------------
PC - hotplug        |   ok/fail  | ok/fail |  -----  |     ok/fail  
-------------------------------------------------------------------------
PC - hot-unplug     |   ok/fail  | ok/fail |  ----   |     ok/fail 
-------------------------------------------------------------------------
Q35 - hotplug       |   ok/fail  |  -----  | ok/fail |     ok/fail  
-------------------------------------------------------------------------
Q35 - hot-unplug    |   ok/fail  |  ----   | ok/fail |     ok/fail 

Thanks,
Marcel

Comment 9 Marcel Apfelbaum 2017-03-07 14:55:33 UTC
Hi David,

Can you please have a look at the migration command line
and see whether the migration parameters are correct with respect
to hot-plug/hot-unplug operations? (e.g. what the destination side command line should be if we hot-plug/hot-unplug a device before migration starts.)

Thanks,
Marcel

Comment 10 Dr. David Alan Gilbert 2017-03-07 15:12:45 UTC
Jinchen:
  When you do hotplug, you must always specify the bus and address for *all* PCI/PCIe devices, both on the command line and when you hot-add them.
  If you don't specify them, then when you run the destination with a different command line with the unplugged device missing, other devices will change their auto-allocated slot numbers and the migration will be confused.

Please confirm whether you can reproduce the bug when specifying all addresses and buses.
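
Something like the following, with illustrative (hypothetical) slot/addr values that you would adjust to free ones on your setup:

-device ioh3420,id=root.0,bus=pcie.0,addr=0x3,slot=1 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0,addr=0x0 \

and the matching hot-add with the same fixed bus/addr:

(qemu) device_add e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0,addr=0x0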

Comment 11 Dr. David Alan Gilbert 2017-03-07 17:38:09 UTC
Marcel: There is a bug here - it looks like the RAMBlock associated with the e1000e isn't being deleted.

see the error: (qemu) qemu-kvm: Unknown ramblock "", cannot accept migration

that empty ramblock name is odd.
I added some debug to dump the list of RAMBlock names at the start of migrate and then did:

./x86_64-softmmu/qemu-system-x86_64 -nographic  -device e1000e,id=foo -m 1G -M pc,accel=kvm my.img

booted Linux
device_del foo

now the e1000e has gone from 'info pci', but the RAMBlock is still there if I print out the list of RAMBlocks when I start the migrate.
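
The debug was essentially a dump of ram_list at migrate start; a minimal sketch against the 2.8-era tree (names as in exec.c):

RAMBlock *block;

rcu_read_lock();
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
    /* after the device_del, one block still shows up here, with an empty idstr */
    fprintf(stderr, "ramblock '%s' used_length 0x%llx\n",
            block->idstr, (unsigned long long)block->used_length);
}
rcu_read_unlock();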

Comment 12 jinchen 2017-03-08 05:35:58 UTC
According to comment 10,with address for *all* PCI/PCIe devices

machine - operation | virtio-nic | e1000   | e1000e  |  virtio block
-------------------------------------------------------------------------
PC - hotplug        |   fail     | fail    |  -----  |     fail  
-------------------------------------------------------------------------
PC - hot-unplug     |   fail     | fail    |  ----   |     fail 
-------------------------------------------------------------------------
Q35 - hotplug       |   fail     |  -----  | fail    |     fail  
-------------------------------------------------------------------------
Q35 - hot-unplug    |   fail     |  ----   | fail    |     fail

Comment 13 Laurent Vivier 2017-03-08 19:56:20 UTC
(In reply to Dr. David Alan Gilbert from comment #11)
> Marcel: There is a bug here - it looks like the RAMBlock associated with the
> e1000e isn't being deleted.
> 
> see the error: (qemu) qemu-kvm: Unknown ramblock "", cannot accept migration
> 
> that empty ramblock name is odd.

Empty ramblock name is set by qemu_ram_unset_idstr():

pci_qdev_unrealize()
-> pci_del_option_rom()
   -> vmstate_unregister_ram()
      -> qemu_ram_unset_idstr()
         -> memset(block->idstr, 0, sizeof(block->idstr));

pci_qdev_unrealize() is called on "device_del", so according to the code, an empty ROM name is what is expected when unplugging a PCI card. I think the migration code should not send a RAMBlock with an empty name.
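
For instance, a hypothetical helper of this shape (just a sketch of the idea, not a tested fix) that every migration loop over ram_list.blocks would check:

static bool ramblock_is_migratable(RAMBlock *block)
{
    /* idstr is memset to 0 by vmstate_unregister_ram() on unplug */
    return block->idstr[0] != '\0';
}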

Comment 14 Dr. David Alan Gilbert 2017-03-08 19:59:39 UTC
(In reply to Laurent Vivier from comment #13)
> (In reply to Dr. David Alan Gilbert from comment #11)
> > Marcel: There is a bug here - it looks like the RAMBlock associated with the
> > e1000e isn't being deleted.
> > 
> > see the error: (qemu) qemu-kvm: Unknown ramblock "", cannot accept migration
> > 
> > that empty ramblock name is odd.
> 
> Empty ramblock name is set by qemu_ram_unset_idstr():
> 
> pci_qdev_unrealize()
> -> pci_del_option_rom()
>    -> vmstate_unregister_ram()
>       -> qemu_ram_unset_idstr()
>          -> memset(block->idstr, 0, sizeof(block->idstr));
> 
> pci_qdev_unrealize() is called on "device_del", so according to the code,
> an empty ROM name is what is expected when unplugging a PCI card. I think
> the migration code should not send a RAMBlock with an empty name.

The question though is why the RAMBlock isn't deleted, rather than just having its name unset.

There's probably quite a few places we'd have to skip a RAMBlock we wanted to avoid.

Dave

Comment 15 Laurent Vivier 2017-03-09 08:52:09 UTC
This appears in:

commit b0e56e0b63f350691b52d3e75e89bb64143fbeff
Author: Hu Tao <hutao.com>
Date:   Wed Apr 2 15:13:27 2014 +0800

    unset RAMBlock idstr when unregister MemoryRegion
    
    Signed-off-by: Hu Tao <hutao.com>
    Signed-off-by: Paolo Bonzini <pbonzini>

diff --git a/savevm.c b/savevm.c
index da8aa24..7b2c410 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1209,7 +1209,7 @@ void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
 
 void vmstate_unregister_ram(MemoryRegion *mr, DeviceState *dev)
 {
-    /* Nothing do to while the implementation is in RAMBlock */
+    qemu_ram_unset_idstr(memory_region_get_ram_addr(mr) & TARGET_PAGE_MASK);
 }
 
From
https://lists.nongnu.org/archive/html/qemu-devel/2014-04/msg00282.html

"When hotplug an memdev that was previously plugged and unplugged,
RAMBlock idstr is not cleared and triggers an assert error in
qemu_ram_set_idstr(). This series fixes it."
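
The assert mentioned there is the duplicate-idstr check in qemu_ram_set_idstr(), roughly (paraphrased from exec.c of that era):

QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
    if (block != new_block && !strcmp(block->idstr, new_block->idstr)) {
        fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
                new_block->idstr);
        abort();
    }
}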

Comment 16 Laurent Vivier 2017-03-09 11:09:44 UTC
Unplugging the card calls pci_qdev_unrealize(), which unregisters the PCI device memory (with do_pci_unregister_device()).

Then qemu_ram_free() would normally be called by memory_region_finalize(). But memory_region_finalize() is not called, because obj->ref is not 1 (checked in object_unref()).
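
For reference, finalization is driven purely by the QOM reference count (paraphrased from qom/object.c), so a leftover reference on the MemoryRegion silently keeps the backing RAMBlock alive:

void object_unref(Object *obj)
{
    g_assert(obj->ref > 0);
    /* only the last reference triggers finalization; for a RAM-backed
     * MemoryRegion that path ends in qemu_ram_free() */
    if (atomic_fetch_dec(&obj->ref) == 1) {
        object_finalize(obj);
    }
}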

Comment 17 Laurent Vivier 2017-03-09 12:30:34 UTC
Paolo has fixed the problem with:

http://patchwork.ozlabs.org/patch/736979/

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index b0f429b..6e23493 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -306,7 +306,7 @@  e1000e_init_msix(E1000EState *s)
 static void
 e1000e_cleanup_msix(E1000EState *s)
 {
-    if (msix_enabled(PCI_DEVICE(s))) {
+    if (msix_present(PCI_DEVICE(s))) {
         e1000e_unuse_msix_vectors(s, E1000E_MSIX_VEC_NUM);
         msix_uninit(PCI_DEVICE(s), &s->msix, &s->msix);
     }
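
The distinction matters because the guest typically disables MSI-X when the driver is unbound during unplug, so msix_enabled() is false by the time the device is unrealized, and the cleanup (and the reference drop on the MSI-X MemoryRegion) was being skipped. Paraphrasing the two predicates from hw/pci/msix.c:

int msix_present(PCIDevice *dev)
{
    return dev->cap_present & QEMU_PCI_CAP_MSIX;
}

int msix_enabled(PCIDevice *dev)
{
    return (dev->cap_present & QEMU_PCI_CAP_MSIX) &&
           (dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
            MSIX_ENABLED_MASK);
}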

Comment 18 Marcel Apfelbaum 2017-03-13 13:44:31 UTC
 (In reply to jinchen from comment #12)
> According to comment 10,with address for *all* PCI/PCIe devices
> 
> machine - operation | virtio-nic | e1000   | e1000e  |  virtio block
> -------------------------------------------------------------------------
> PC - hotplug        |   fail     | fail    |  -----  |     fail  
> -------------------------------------------------------------------------
> PC - hot-unplug     |   fail     | fail    |  ----   |     fail 
> -------------------------------------------------------------------------
> Q35 - hotplug       |   fail     |  -----  | fail    |     fail  
> -------------------------------------------------------------------------
> Q35 - hot-unplug    |   fail     |  ----   | fail    |     fail

Can you please test again with this brew build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12746756

Thanks,
Marcel

Comment 19 jingzhao 2017-03-15 02:31:37 UTC
Hi Marcel

 Also failed with qemu-kvm-rhev-2.8.0-6.el7.test.x86_64 that you provided.

Thanks
Jing

Comment 20 Dr. David Alan Gilbert 2017-03-15 09:07:07 UTC
(In reply to jingzhao from comment #19)
> Hi Marcel
> 
>  Also failed with qemu-kvm-rhev-2.8.0-6.el7.test.x86_64 that you provided.
> 
> Thanks
> Jing

With which error?
What exact command line were you using this time?

Comment 21 jingzhao 2017-03-15 09:55:04 UTC
(In reply to Dr. David Alan Gilbert from comment #20)
> (In reply to jingzhao from comment #19)
> > Hi Marcel
> > 
> >  Also failed with qemu-kvm-rhev-2.8.0-6.el7.test.x86_64 that you provided.
> > 
> > Thanks
> > Jing
> 
> With which error?
> What exact command line were you using this time?

The same error as above:
(qemu) qemu-kvm: Unknown ramblock "", cannot accept migration
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument

the qemu command line used:
[1] src host:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0 \
-netdev tap,id=dev1,vhost=on \
-device ioh3420,id=root.1,slot=2 \
-drive file=/mnt/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \
-vnc :1 \

[2] delete net in src host 
(qemu) netdev_del dev1  
(qemu) device_del net1 

[3] In src host:
(qemu) migrate -d tcp:10.66.4.211:5800
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
Migration status: failed

[4]dest host:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-vnc :0 \
-spice port=5931,disable-ticketing \
-qmp tcp:0:4445,server,nowait \
-device ioh3420,id=root.0,slot=1 \
-device ioh3420,id=root.1,slot=2 \
-drive file=/mnt/test/q35-seabios.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \
-vnc :1 \
-incoming tcp:0:5800 \


Thanks
Jing

Comment 22 Dr. David Alan Gilbert 2017-03-15 13:44:17 UTC
I can confirm it still fails with Marcel's rpm, but it does seem to work if I take Paolo's patch and apply it to upstream qemu.
Marcel, what was in that build?

Comment 25 Marcel Apfelbaum 2017-03-29 13:50:49 UTC
(In reply to Dr. David Alan Gilbert from comment #22)
> I can confirm it still fails with Marcel's rpm, but it does seem to work if
> I take Paolo's patch and apply it to upstream qemu.
> Marcel, what was in that build?

Well, it was supposed to be the latest qemu-kvm-rhev with Paolo's patch applied, now I am not sure anymore...

Thanks,
Marcel

Comment 26 Laurent Vivier 2017-04-06 07:40:02 UTC
Set to POST as Paolo's patch has landed in v2.9.0-rc0.

Comment 27 jingzhao 2017-04-26 06:23:09 UTC
1. Reproduced the bz on qemu-kvm-rhev-2.8.0-6.el7.x86_64

2. Also failed on qemu-kvm-rhev-2.9.0-1.el7.x86_64 & kernel-3.10.0-657.el7.x86_64

Following is the detailed info:

1) boot guest with qemu command line [1]

2) unplug e1000e device in the src
(qemu) device_del net1  
(qemu) netdev_del dev1 

3) do the migration in the src
(qemu) migrate -d tcp:10.66.6.246:5800
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: active
total time: 2002 milliseconds
expected downtime: 300 milliseconds
setup: 13 milliseconds
transferred ram: 68846 kbytes
throughput: 268.58 mbps
remaining ram: 3439744 kbytes
total ram: 4325840 kbytes
duplicate: 204795 pages
skipped: 0 pages
normal: 16729 pages
normal bytes: 66916 kbytes
dirty sync count: 1
(qemu) migrate_set_downtime 1  
(qemu) migrate_set_speed 1G
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: completed
total time: 9218 milliseconds
downtime: 189 milliseconds
setup: 13 milliseconds
transferred ram: 1331923 kbytes
throughput: 1183.84 mbps
remaining ram: 0 kbytes
total ram: 4325840 kbytes
duplicate: 761749 pages
skipped: 0 pages
normal: 330661 pages
normal bytes: 1322644 kbytes
dirty sync count: 3

4) check the status in dest:
(qemu) red_dispatcher_loadvm_commands: 
qemu-kvm: Unknown savevm section or instance '0000:00:02.0/pcie-root-port' 0
qemu-kvm: load of migration failed: Invalid argument

[1]
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-vnc :0 \
-qmp tcp:0:4445,server,nowait \
-device pcie-root-port,id=root.0,slot=1 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0 \
-netdev tap,id=dev1,vhost=on \
-device pcie-root-port,id=root.1,slot=2 \
-drive file=/home/test/rhel/rhel74.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \


According to the above info, changing the bz status to ASSIGNED.

Comment 28 Laurent Vivier 2017-04-26 08:33:53 UTC
(In reply to jingzhao from comment #27)
> 1. Reproduced the bz on qemu-kvm-rhev-2.8.0-6.el7.x86_64
> 
> 2. Also failed on qemu-kvm-rhev-2.9.0-1.el7.x86_64 &
> kernel-3.10.0-657.el7.x86_64
> 
> Following is the detailed info:
> 
> 1) boot guest with qemu command line [1]
...
> 4) check the status in dest:
> (qemu) red_dispatcher_loadvm_commands: 
> qemu-kvm: Unknown savevm section or instance '0000:00:02.0/pcie-root-port' 0
> qemu-kvm: load of migration failed: Invalid argument

You didn't test with the same command line; it looks like another bug, in pcie, not e1000.

Comment 30 Dr. David Alan Gilbert 2017-04-27 09:28:43 UTC
Hi Jing,
  Please test it with the ioh3420 as per the original bug and make sure the original bug is fixed.
  Please test again with the pcie-root-port and file a separate bug if that fails - I can't reproduce it here.

Comment 31 Dr. David Alan Gilbert 2017-04-27 09:37:27 UTC
Actually, I *can* reproduce this - I'll file a separate bz for it

Comment 32 jingzhao 2017-04-27 09:42:34 UTC
(In reply to Dr. David Alan Gilbert from comment #31)
> Actually, I *can* reproduce this - I'll file a separate bz for it

I have filed a new bz to track the new issue (bz 1446080).

Thanks
Jing

Comment 33 jingzhao 2017-04-27 09:54:01 UTC
(In reply to Dr. David Alan Gilbert from comment #30)
> Hi Jing,
>   Please test it with the ioh3420 as per the original bug and make sure the
> original bug is fixed.
>   Please test again with the pcie-root-port and file a separate bug if that
> fails - I can't reproduce it here.

Hi David

It also failed with the ioh3420 device; the failure info is the same as with the pcie-root-port device. Following is the failure info:

(qemu) red_dispatcher_loadvm_commands: 
qemu-kvm: Unknown savevm section or instance '0000:00:02.0/ioh-3240-express-root-port' 0
qemu-kvm: load of migration failed: Invalid argument

Given the different failure info, can we consider the bz fixed?


Thanks
Jing

Comment 34 Dr. David Alan Gilbert 2017-04-27 10:38:45 UTC
(In reply to jingzhao from comment #33)
> (In reply to Dr. David Alan Gilbert from comment #30)
> > Hi Jing,
> >   Please test it with the ioh3420 as per the original bug and make sure the
> > original bug is fixed.
> >   Please test again with the pcie-root-port and file a separate bug if that
> > fails - I can't reproduce it here.
> 
> Hi David
> 
> It also failed with the ioh3420 device; the failure info is the same as
> with the pcie-root-port device. Following is the failure info:
> 
> (qemu) red_dispatcher_loadvm_commands: 
> qemu-kvm: Unknown savevm section or instance
> '0000:00:02.0/ioh-3240-express-root-port' 0
> qemu-kvm: load of migration failed: Invalid argument
> 
> Given the different failure info, can we consider the bz fixed?
> 
> 
> Thanks
> Jing

Jing
  You must specify 'addr=' on all PCI and PCIe devices to ensure they keep the same address on source/destination, e.g. for:

-device pcie-root-port,id=root.0,slot=1 \

you need

-device pcie-root-port,id=root.0,slot=1,addr=4 

(I picked 4, but you need to find a free number).
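
For the hot-plug case the destination command line then carries the hot-plugged device at the same fixed address, e.g. (again with illustrative addr values):

-device pcie-root-port,id=root.0,slot=1,addr=0x4 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0,addr=0x0 \

matching a source-side hot-add of:

(qemu) device_add e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0,addr=0x0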

Comment 35 jingzhao 2017-04-28 01:44:51 UTC
Thanks for David's help.

Per comment 34, migration succeeds when the pcie-root-port is given an addr parameter; so per comment 27 and comment 34, the bz can be verified on qemu-kvm-rhev-2.9.0-1.el7.x86_64.

Thanks
Jing

Comment 37 errata-xmlrpc 2017-08-01 23:44:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
