Bug 1515173

Summary: Cross migration from rhel6.9 to rhel7.5 failed
Product: Red Hat Enterprise Linux 7
Reporter: yisun
Component: qemu-kvm-rhev
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED ERRATA
QA Contact: jingzhao <jinzhao>
Severity: high
Docs Contact:
Priority: high
Version: 7.5
CC: chayang, chhu, fjin, hhuang, jdenemar, juzhang, lmiksik, lvivier, michen, mrezanin, peterx, quintela, rbalakri, virt-maint, xfu, yafu, yanqzhan, zpeng
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 00:46:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description            Flags
vm1.xml                none
source_host_vm1.log    none
target_host_vm1.log    none

Description yisun 2017-11-20 10:28:47 UTC
Created attachment 1355631
vm1.xml

Description of problem:
Cross migration from rhel6.9 to rhel7.5 failed

Version-Release number of selected component (if applicable):
source host:
qemu-kvm-rhev-0.12.1.2-2.503.el6_9.3.x86_64
libvirt-0.10.2-62.el6.x86_64

target host:
qemu-kvm-rhev-2.10.0-6.el7.x86_64
libvirt-3.9.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a running VM (name=vm1) on the source host.
(See its XML in the attached vm1.xml.)

2. Migrate it with the following command:
# virsh migrate --live vm1 qemu+ssh://10.66.7.98/system --verbose --unsafe
The authenticity of host '10.66.7.98 (10.66.7.98)' can't be established.
RSA key fingerprint is 3c:60:76:4f:6b:81:ed:1b:a1:39:e2:c2:a6:7e:a9:dc.
Are you sure you want to continue connecting (yes/no)? yes
root@10.66.7.98's password: 
error: operation failed: migration job: unexpectedly failed
(The QEMU logs from both ends can be checked in the attachments: source_host_vm1.log and target_host_vm1.log.)


Actual results:
Migration failed from 6.9 to 7.5

Expected results:
Cross migration from a lower version to a higher version should always work.

Comment 2 yisun 2017-11-20 10:32:54 UTC
Created attachment 1355632
source_host_vm1.log

Comment 3 yisun 2017-11-20 10:34:51 UTC
Created attachment 1355633
target_host_vm1.log

Comment 4 Jiri Denemark 2017-11-20 14:03:37 UTC
The QEMU log on the destination host says:

2017-11-20T10:14:37.617453Z qemu-kvm: Unknown savevm section or instance 'block' 0
copying E and F segments from pc.bios to pc.ram
copying C and D segments from pc.rom to pc.ram
2017-11-20T10:14:37.617799Z qemu-kvm: load of migration failed: Invalid argument


I looked at the two command lines and I can't see any incompatibility there, so I'm passing this bug to qemu-kvm for further investigation.

All differences between the two command lines:

< PATH=/sbin:/usr/sbin:/bin:/usr/bin
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin

Irrelevant.

< -name vm1
> -name guest=vm1,debug-threads=on

Equivalent.

<
> -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-vm1/master-key.aes

7.5 adds a master key, which is invisible to the guest => irrelevant

< -M rhel6.6.0
< -enable-kvm
> -machine rhel6.6.0,accel=kvm,usb=off,dump-guest-core=off

These should be equivalent.

< -nodefconfig
> -no-user-config

Irrelevant.

< -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait
> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-19-vm1/monitor.sock,server,nowait

7.5 stores the monitor socket in a per-domain directory; irrelevant.

< -no-kvm-pit-reinjection
> -global kvm-pit.lost_tick_policy=delay
> -no-hpet

Timer configuration, should be equivalent.

<
> -boot strict=on

Irrelevant on migration.

< -drive file=/var/lib/libvirt/images/RHEL-6.9-x86_64-latest.qcow2,if=none,id=drive-ide0-0-0,format=qcow2
> -drive file=/var/lib/libvirt/images/RHEL-6.9-x86_64-latest.qcow2,format=qcow2,if=none,id=drive-ide0-0-0

Options for -drive passed in different order; irrelevant.

< -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1

7.5 uses the ide-hd device while 6.9 uses the ide-drive device.

< -netdev tap,fd=24,id=hostnet0
> -netdev tap,fd=28,id=hostnet0

Irrelevant.

< -vga cirrus
> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2

Equivalent.

<
> -incoming defer

Irrelevant.
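
A comparison like the one above can be roughed out mechanically before reading each difference by hand. The following is only a sketch, not what libvirt or QEMU does: the helper names (split_options, normalize, diff_options) are made up for illustration, and it assumes every option takes at most one argument and that no value contains an escaped comma. Comma-separated sub-options are compared as a set, so a reordered -drive is not reported.

```python
import shlex


def split_options(cmdline):
    """Group a QEMU command line into (flag, argument) pairs.

    Assumption: a token starting with '-' opens a new option and the
    next non-dash token, if any, is its single argument.
    """
    tokens = shlex.split(cmdline)
    options = []
    for tok in tokens[1:]:  # skip the binary path
        if tok.startswith("-"):
            options.append([tok, ""])
        elif options:
            options[-1][1] = tok
    return [(flag, arg) for flag, arg in options]


def normalize(option):
    """Make sub-option order irrelevant for comparison purposes."""
    flag, arg = option
    return flag, frozenset(arg.split(","))


def diff_options(src, dst):
    """Return the options present on only one side, in original order."""
    src_opts, dst_opts = split_options(src), split_options(dst)
    src_keys = {normalize(o) for o in src_opts}
    dst_keys = {normalize(o) for o in dst_opts}
    only_src = [o for o in src_opts if normalize(o) not in dst_keys]
    only_dst = [o for o in dst_opts if normalize(o) not in src_keys]
    return only_src, only_dst


# Shortened command lines built from options quoted in this bug.
src = ("/usr/libexec/qemu-kvm -M rhel6.6.0 -enable-kvm "
       "-drive file=/mnt/rhel6.9.raw,if=none,id=drive-ide0-0-0,format=raw "
       "-vga cirrus")
dst = ("/usr/libexec/qemu-kvm "
       "-machine rhel6.6.0,accel=kvm,usb=off,dump-guest-core=off "
       "-drive file=/mnt/rhel6.9.raw,format=raw,if=none,id=drive-ide0-0-0 "
       "-device cirrus-vga,id=video0,bus=pci.0,addr=0x2")

only_src, only_dst = diff_options(src, dst)
for flag, arg in only_src:
    print("<", flag, arg)
for flag, arg in only_dst:
    print(">", flag, arg)
```

The output only shows which options differ textually; deciding whether two spellings (for example -M plus -enable-kvm versus -machine ...,accel=kvm) are equivalent still needs the manual review done in this comment.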

Comment 5 Dr. David Alan Gilbert 2017-11-22 12:39:30 UTC
   Unknown savevm section or instance 'block' 

that suggests it was trying to do old-style block migration.

Jiri: Any sign from the libvirt log why it was trying to use old-block migration?
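
The log signature discussed in comments 4 and 5 can be picked out of a destination QEMU log with a small script. This is a minimal sketch: the message format is taken from the log excerpt quoted in this bug, and the function name unknown_sections is hypothetical.

```python
import re

# Pattern based on the message quoted in this bug:
#   qemu-kvm: Unknown savevm section or instance 'block' 0
SAVEVM_RE = re.compile(
    r"Unknown savevm section or instance '(?P<section>[^']+)' (?P<instance>\d+)"
)


def unknown_sections(log_text):
    """Return (section name, instance id) for each rejected section."""
    return [
        (m.group("section"), int(m.group("instance")))
        for m in SAVEVM_RE.finditer(log_text)
    ]


log = """\
2017-11-20T10:14:37.617453Z qemu-kvm: Unknown savevm section or instance 'block' 0
copying E and F segments from pc.bios to pc.ram
copying C and D segments from pc.rom to pc.ram
2017-11-20T10:14:37.617799Z qemu-kvm: load of migration failed: Invalid argument
"""

found = unknown_sections(log)
print(found)
```

A 'block' entry in the result is the hint that the incoming stream carried an old-style block migration section that the destination did not recognise.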

Comment 6 Dr. David Alan Gilbert 2017-11-22 12:51:35 UTC
Oh, I see the problem.
I upstreamed a disable-live-block-migration, but it's not quite the same as our downstream. In 7.0-7.4 we disable *outgoing* block migration but still allow incoming; the disable-live-block-migration I sent upstream disables it all.
So 7.5 picked up the upstream code and completely disables it, including incoming.
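
The difference between the two behaviours can be written down as a toy model. This is purely illustrative and is not QEMU code: the class, the policy labels and accepts_stream are hypothetical, and the model assumes (per comment 5) that the stream from the 6.9 source contains a 'block' section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMigrationPolicy:
    outgoing: bool  # may this build send old-style block migration?
    incoming: bool  # may this build receive it?


# Hypothetical labels for the two behaviours described in this comment.
POLICIES = {
    "downstream-7.0-7.4": BlockMigrationPolicy(outgoing=False, incoming=True),
    "upstream-disable-live-block-migration": BlockMigrationPolicy(
        outgoing=False, incoming=False
    ),
}


def accepts_stream(policy, sections):
    """A destination rejects the stream if it contains a 'block'
    section and incoming block migration is disabled."""
    return policy.incoming or "block" not in sections


# Assumed content of the stream sent by the 6.9 source.
stream_from_rhel69 = ["ram", "block"]

for name, policy in POLICIES.items():
    print(name, accepts_stream(policy, stream_from_rhel69))
```

Under these assumptions the 7.0-7.4 behaviour accepts the stream while the behaviour 7.5 inherited from upstream rejects it, which matches the failure reported here.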

Comment 8 Miroslav Rezanina 2017-11-28 10:52:42 UTC
Fix included in qemu-kvm-rhev-2.10.0-9.el7

Comment 10 huiqingding 2017-12-13 05:54:28 UTC
Reproduced this bug:
RHEL6.9 host:
kernel-2.6.32-696.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.503.el6_9.3.x86_64

RHEL7.5 host:
kernel-3.10.0-799.el7.x86_64
qemu-kvm-rhev-2.10.0-7.el7.x86_64

Steps to reproduce:
1. Boot a RHEL 6.9 guest with an "ide-drive" system disk on the RHEL 6.9 host:
# /usr/libexec/qemu-kvm -name vm1 -S -M rhel6.6.0 -cpu SandyBridge -enable-kvm -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid ddea7e7d-8e6d-46e4-b454-e15114fdb08a -nodefconfig -nodefaults -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/rhel6.9.raw,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:74:70:75,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio

2. Boot the guest with "-incoming tcp:0:5800" and an "ide-hd" system disk on the RHEL 7.5 host:
# /usr/libexec/qemu-kvm -name guest=vm1,debug-threads=on -S -machine rhel6.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu SandyBridge -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid ddea7e7d-8e6d-46e4-b454-e15114fdb08a -no-user-config -nodefaults -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/rhel6.9.raw,format=raw,if=none,id=drive-ide0-0-0 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=28,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:74:70:75,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:1 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio -incoming tcp:0:5800

3. Migrate the guest from the RHEL 6.9 host to the RHEL 7.5 host.

Actual results:
After step 3, the migration fails and qemu-kvm on the destination host quits with:
(qemu) 2017-12-13T05:32:59.900314Z qemu-kvm: Unknown savevm section or instance 'block' 0
copying E and F segments from pc.bios to pc.ram
copying C and D segments from pc.rom to pc.ram
2017-12-13T05:32:59.901232Z qemu-kvm: load of migration failed: Invalid argument

Verified this bug using "qemu-kvm-rhev-2.10.0-12.el7.x86_64": the migration completes normally.

Comment 11 huiqingding 2017-12-13 05:58:43 UTC
Based on comment #10, setting this bug to verified.

Comment 15 errata-xmlrpc 2018-04-11 00:46:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104