Bug 1446080

Summary: [Q35] migrate failed after unplug nic device
Product: Red Hat Enterprise Linux 7 Reporter: jingzhao <jinzhao>
Component: qemu-kvm-rhevAssignee: Virtualization Maintenance <virt-maint>
Status: CLOSED NOTABUG QA Contact: jingzhao <jinzhao>
Severity: high Docs Contact:
Priority: high    
Version: 7.4CC: chayang, dgilbert, juzhang, marcel, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-27 10:07:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description jingzhao 2017-04-27 08:30:10 UTC
Description of problem:
migrate failed after unplug nic device
qemu-kvm: Unknown savevm section or instance '0000:00:02.0/pcie-root-port' 0
qemu-kvm: load of migration failed: Invalid argument


Version-Release number of selected component (if applicable):


How reproducible:
3/3

Steps to Reproduce:
1) boot guest with qemu command line [1]

2) unplug e1000e device in the src
(qemu) device_del net1  
(qemu) netdev_del dev1 

3) do the migration in the src
(qemu) migrate -d tcp:10.66.6.246:5800
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: active
total time: 2002 milliseconds
expected downtime: 300 milliseconds
setup: 13 milliseconds
transferred ram: 68846 kbytes
throughput: 268.58 mbps
remaining ram: 3439744 kbytes
total ram: 4325840 kbytes
duplicate: 204795 pages
skipped: 0 pages
normal: 16729 pages
normal bytes: 66916 kbytes
dirty sync count: 1
(qemu) mig
migrate                 migrate_cancel          migrate_incoming        
migrate_set_cache_size  migrate_set_capability  migrate_set_downtime    
migrate_set_parameter   migrate_set_speed       migrate_start_postcopy  
(qemu) migrate_se
migrate_set_cache_size  migrate_set_capability  migrate_set_downtime    
migrate_set_parameter   migrate_set_speed       
(qemu) migrate_set_downtime 1  
(qemu) migrate_set_speed 1G
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off 
Migration status: completed
total time: 9218 milliseconds
downtime: 189 milliseconds
setup: 13 milliseconds
transferred ram: 1331923 kbytes
throughput: 1183.84 mbps
remaining ram: 0 kbytes
total ram: 4325840 kbytes
duplicate: 761749 pages
skipped: 0 pages
normal: 330661 pages
normal bytes: 1322644 kbytes
dirty sync count: 3
(qemu) info status
VM status: paused (postmigrate)


Actual results:
migration failed
dest prompt error message
QEMU 2.9.0 monitor - type 'help' for more information
(qemu) red_dispatcher_loadvm_commands: 
qemu-kvm: Unknown savevm section or instance '0000:00:02.0/pcie-root-port' 0
qemu-kvm: load of migration failed: Invalid argument


Expected results:
migration successfully

[1]
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/seabios.log,id=seabios \
-vga qxl \
-vnc :0 \
-qmp tcp:0:4445,server,nowait \
-device pcie-root-port,id=root.0,slot=1 \
-device e1000e,netdev=dev1,mac=9a:6a:6b:6c:6d:6a,id=net1,bus=root.0 \
-netdev tap,id=dev1,vhost=on \
-device pcie-root-port,id=root.1,slot=2 \
-drive file=/home/test/rhel/rhel74.qcow2,if=none,id=drive0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device virtio-blk-pci,id=virtio-disk0,drive=drive0,bus=root.1,bootindex=0 \
-monitor stdio \

Additional info:
1. Also failed after unplug virtio-net-pci 
2. Migrate successfully after unplug virtio-blk device
3. Migrate successfully after unplug virtio-net on pc machine type

Comment 2 Dr. David Alan Gilbert 2017-04-27 09:47:06 UTC
*** Bug 1446113 has been marked as a duplicate of this bug. ***

Comment 3 Dr. David Alan Gilbert 2017-04-27 10:07:41 UTC
This isn't actually a bug - you must use 'addr=' on all PCI devices (not just slot=) to ensure they have a consistent numbering, especially when doing hotplug.

The problem is that:
-device pcie-root-port,id=root.0,slot=1 

doesn't specify an address, on the source it gets automatically allocated address 02.0, on the destination, because of the different set of devices it gets automatically allocated the address 03.0.

The error then is correct - there is no 02.0 pcie-root-port on the destination - it's 03.0