Bug 768352

Summary: kvm domain migration failure even on supported guest CPU types
Product: Red Hat Enterprise Linux 6
Reporter: Saveliev Peter <peet>
Component: qemu-kvm
Assignee: Juan Quintela <quintela>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.2
CC: acathrow, agkesos, amit.shah, bsarathy, danken, dgilbert, ehabkost, hhuang, juzhang, michal.skrivanek, michen, mkenneth, quintela, shu, tburke, virt-maint
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-08 17:43:54 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Saveliev Peter 2011-12-16 12:39:34 UTC
== Description of problem: ==

The VM *sometimes* fails to migrate when the migration goes from a host whose CPU supports more flags to a host whose CPU supports fewer flags.


== Version-Release number of selected component (if applicable): ==

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 Beta (Santiago)

# rpm -qa | grep libvirt
libvirt-python-0.9.4-23.el6_2.1.x86_64
libvirt-client-0.9.4-23.el6_2.1.x86_64
libvirt-0.9.4-23.el6_2.1.x86_64

# rpm -qa | grep qemu
qemu-kvm-tools-0.12.1.2-2.209.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
qemu-img-0.12.1.2-2.209.el6.x86_64
qemu-kvm-0.12.1.2-2.209.el6.x86_64

# rpm -qa | grep vdsm
vdsm-cli-4.9-112.el6.x86_64
vdsm-4.9-112.el6.x86_64



== Hardware: ==

Host 1: 

# cat /proc/cpuinfo | grep 'model name' | uniq
model name	: Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz

Host 2:

# cat /proc/cpuinfo | grep 'model name' | uniq
model name	: Intel(R) Xeon(R) CPU E31260L @ 2.40GHz

Flags diff:

# diff flags-01 flags-02
1a2
> aes
3a5
> arat
4a7
> avx
15a19
> epb
35a40
> pclmulqdq
38a44
> pln
42a49
> pts
60a68,70
> x2apic
> xsave
> xsaveopt
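
A comparison like the diff above can also be scripted. The following is a minimal sketch, assuming the two flag lists come from the `flags` line of /proc/cpuinfo on each host; the function names and the abbreviated sample lines are hypothetical and are not the real host output.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source_flags, destination_flags):
    """Flags the source host exposes that the destination host lacks."""
    return sorted(source_flags - destination_flags)


# Illustrative input only: abbreviated flag lines, not the real host output.
host1 = "flags\t\t: fpu vmx sse4_1 sse4_2"
host2 = "flags\t\t: fpu vmx sse4_1 sse4_2 aes avx xsave"

# Migrating from Host 2 to Host 1: which flags would the guest lose?
print(missing_on_destination(parse_flags(host2), parse_flags(host1)))
```

With the sample lines, only the direction Host 2 to Host 1 reports missing flags, which matches the direction that fails in this report.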

== How to reproduce: ==

1. Create a VM.
2. Migrate it from Host 1 to Host 2: OK.
3. Migrate it from Host 2 to Host 1: sometimes OK, sometimes FAIL.
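
Because the failure is intermittent, it may help to repeat steps 2 and 3 in a loop. The sketch below assumes the migration is driven with `virsh migrate --live` over `qemu+ssh` URIs; the hosts in this report are managed by vdsm, so that is an assumption, and the host names `host1` and `host2` are hypothetical.

```python
import subprocess


def migrate_command(domain, source_host, dest_host):
    """Build a virsh command that live-migrates `domain` from source to destination."""
    return [
        "virsh", "-c", "qemu+ssh://%s/system" % source_host,
        "migrate", "--live", domain,
        "qemu+ssh://%s/system" % dest_host,
    ]


def ping_pong(domain, host_a, host_b, rounds, run=subprocess.call):
    """Migrate back and forth; return a list of (source, destination, exit_code).

    Stops at the first failure, since the domain does not move when
    migration fails.
    """
    results = []
    source, dest = host_a, host_b
    for _ in range(rounds * 2):
        code = run(migrate_command(domain, source, dest))
        results.append((source, dest, code))
        if code != 0:
            break
        source, dest = dest, source
    return results
```

Calling `ping_pong("vm1", "host1", "host2", 10)` would attempt up to twenty migrations and stop at the first one that fails; the `run` parameter exists only so the loop can be exercised without a hypervisor.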

== Guest CPU types tried (supported/unsupported): ==

* qemu32
* qemu64
* qemu64,-nx
* Penryn
* pentiumpro
* core2duo
* coreduo


== Error log sample: ==

2011-12-15 13:17:12.627: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.2.0 -cpu Penryn -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid c540a13b-5bd6-4207-b3ff-0d29907d88fd -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.2.0.2.el6,serial=4C4C4544-0046-4610-804A-C2C04F33354A_BC:30:5B:DF:9A:43,uuid=c540a13b-5bd6-4207-b3ff-0d29907d88fd -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2011-12-15T12:17:12,driftfix=slew -no-shutdown -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/f8303001-f3f4-4f24-aaec-6bf460c04037/5d825123-aada-4319-a2e8-28e15d849774/images/11121bbb-ce99-4693-a76b-050468ec1948/b9f0a6d8-274e-4d6e-85ed-0b1b2d5a04fd,if=none,id=drive-ide0-0-0,format=raw,serial=93-a76b-050468ec1948,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=46:d3:1e:05:86:32,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/vm1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -usb -vnc 0:0,password -vga cirrus -incoming tcp:0.0.0.0:49177
kvm: unhandled exit 80000021
kvm_run returned -22
Domain id=45 is tainted: custom-monitor
qemu: terminating on signal 15 from pid 2078
2011-12-15 13:17:23.921: shutting down
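
The value in "kvm: unhandled exit 80000021" can be decoded, assuming it is the raw Intel VMX exit reason reported by the kernel. In that encoding, bit 31 marks a VM-entry failure and the low 16 bits hold the basic exit reason. The sketch below is illustrative; the function name and the small lookup table are not taken from qemu-kvm.

```python
VM_ENTRY_FAILURE_BIT = 1 << 31
BASIC_REASON_MASK = 0xFFFF

# Only the VM-entry failure reasons; every other basic reason maps to "unknown" here.
BASIC_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(raw):
    """Split a raw VMX exit reason into its entry-failure bit and basic reason."""
    basic = raw & BASIC_REASON_MASK
    return {
        "entry_failure": bool(raw & VM_ENTRY_FAILURE_BIT),
        "basic_reason": basic,
        "description": BASIC_REASONS.get(basic, "unknown"),
    }


print(decode_exit_reason(0x80000021))
```

Under that assumption, 0x80000021 is basic reason 33 with the entry-failure bit set: the destination CPU rejected the guest state that was loaded after migration.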

Comment 2 Saveliev Peter 2011-12-16 13:18:39 UTC
Host 1:

# uname -r
2.6.32-207.el6.x86_64

Host 2:

# uname -r
2.6.32-220.el6.x86_64

Comment 3 Dor Laor 2012-01-02 08:48:07 UTC
Please try with the exact same host to see if it works.
Also, you SHOULD only test SUPPORTED models, not unsupported ones. Among all the models above, Penryn is the only supported one you tried.

Please run the qemu command line directly and use -cpu Penryn,enforce

It will make sure the Penryn flags exist on the host (I do think they are there).
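
With `enforce`, QEMU refuses to start if the host lacks a feature that the requested CPU model needs. One way to prepare such a run is to take the logged command line as an argument list and append `,enforce` to the `-cpu` value. This is a minimal sketch; the helper name is hypothetical and the argument list is abbreviated from the log in the description.

```python
def with_cpu_enforce(argv):
    """Return a copy of argv whose -cpu value ends with ',enforce'."""
    out = list(argv)
    for i, arg in enumerate(out[:-1]):
        if arg == "-cpu":
            # Avoid adding the option twice if it is already present.
            if "enforce" not in out[i + 1].split(","):
                out[i + 1] += ",enforce"
            return out
    raise ValueError("no -cpu option found")


# Abbreviated argument list based on the logged command line.
logged = ["/usr/libexec/qemu-kvm", "-S", "-M", "rhel6.2.0",
          "-cpu", "Penryn", "-enable-kvm", "-m", "256"]
print(" ".join(with_cpu_enforce(logged)))
```

Working on an argument list rather than a shell string avoids quoting problems with values such as `manufacturer=Red Hat` in the full command line.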

Comment 4 Orit Wasserman 2012-03-06 08:10:06 UTC
Hi,
Any progress on using -cpu Penryn,enforce?

Comment 5 Orit Wasserman 2012-03-07 12:27:45 UTC
I am not sure this is the same error; I only succeeded in reproducing it with upstream qemu.
Avi and Gleb think it may depend on the BIOS version.

1) Migrate from Westmere to an older machine.
2) The guest needs to be in real mode. I simulated this by using an empty
image so that the guest can't boot.

The migration fails on the destination: 

KVM: entry failed, hardware error 0x80000021

If you're runnning a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=000004a9 EBX=00035df0 ECX=000f4240 EDX=00009af2
ESI=00000000 EDI=00000000 EBP=037f0000 ESP=00000040
EIP=0000ff53 EFL=fffbfcff [DOSZAPC] CPL=3 II=1 A20=0 SMM=1 HLT=0
ES =0000 00000000 0000ffff 0000f300
CS =f000 000f0000 0000ffff 0000f300
SS =0000 00000000 0000ffff 0000f300
DS =0000 00000000 0000ffff 0000f300
FS =0000 00000000 0000ffff 0000f300
GS =0000 00000000 0000ffff 0000f300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
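
The register dump above can be checked against the real-mode claim: CR0 bit 0 (PE) is clear in real mode, and EFER bit 10 (LMA) is set in long mode. The following sketch reads both values from a dump in this upper-case `CR0=` / `EFER=` format; the function name and the mode labels are illustrative.

```python
import re

CR0_PE = 1 << 0     # protection enable
CR0_PG = 1 << 31    # paging
EFER_LMA = 1 << 10  # long mode active


def cpu_mode(dump_text):
    """Classify the CPU mode from the CR0 and EFER values in a register dump."""
    cr0 = int(re.search(r"\bCR0=([0-9a-fA-F]+)", dump_text).group(1), 16)
    efer_match = re.search(r"\bEFER=([0-9a-fA-F]+)", dump_text)
    efer = int(efer_match.group(1), 16) if efer_match else 0
    if not cr0 & CR0_PE:
        return "real mode"
    if efer & EFER_LMA:
        return "long mode"
    if cr0 & CR0_PG:
        return "protected mode with paging"
    return "protected mode"


dump = """CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
EFER=0000000000000000"""
print(cpu_mode(dump))
```

For CR0=60000010 the PE bit is clear, which is consistent with a guest that is still in real mode because it could not boot.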

Comment 7 RHEL Program Management 2012-07-10 08:15:25 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2012-07-11 02:01:58 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 9 Alexandros Gkesos 2015-11-24 14:05:19 UTC
Hello,

Reopening this bug, as there was no erratum and a customer hit it again with newer RHEL versions.

The only difference I have seen in my case is that the CPUs on both hosts are exactly the same.



2015-11-17 15:47:41.200+0000: starting up
...
qemu parameters
...
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 1.629000 ms, bitrate 94482376 bps (90.105415 Mbps)
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
KVM: entry failed, hardware error 0x80000021
kvm_run returned -22
rax 0000000000000000 rbx ffffffff8006bdfb rcx 0000000000000000 rdx 0000000000000000
rsi 0000000000000001 rdi ffffffff8031f7b8 rsp ffffffff80467f90 rbp 000000000008fc00
r8  ffffffff80466000 r9  0000000000000038 r10 ffff81081fdf0158 r11 7ffffffffffffcd8
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip ffffffff8006be24 rflags 00000246
cs 0010 (00000000/ffffffff p 1 dpl 0 db 0 s 1 type b l 1 g 1 avl 0)
ds 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
es 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
ss 0018 (00000000/ffffffff p 1 dpl 0 db 1 s 1 type 3 l 0 g 1 avl 0)
fs 0000 (00000000/ffffffff p 0 dpl 0 db 1 s 0 type 1 l 0 g 1 avl 0)
gs 0000 (ffffffff80436000/ffffffff p 0 dpl 0 db 1 s 0 type 1 l 0 g 1 avl 0)
tr 0040 (ffff810001001000/0000206f p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/ffffffff p 0 dpl 0 db 1 s 0 type 0 l 0 g 1 avl 0)
gdt ffffffff80468000/80
idt ffffffff804bc000/fff
cr0 8005003b cr2 2b93cb0fe000 cr3 799521000 cr4 6a0 cr8 0 efer d01
red_channel_client_disconnect: rcc=0x7f144c57a8c0 (channel=0x7f144c766e10 type=3 id=0)
red_channel_client_disconnect: rcc=0x7f0c04243920 (channel=0x7f0c0421f350 type=2 id=0)
red_channel_client_disconnect: rcc=0x7f144c53e210 (channel=0x7f144c75bd50 type=1 id=0)
main_channel_client_on_disconnect: rcc=0x7f144c53e210
red_channel_client_disconnect: rcc=0x7f0c042a0be0 (channel=0x7f0c0421f920 type=4 id=0)
red_client_destroy: destroy client 0x7f1456c07c10 with #channels=4
red_dispatcher_disconnect_cursor_peer: 
red_dispatcher_disconnect_display_peer: 
qemu: terminating on signal 15 from pid 12178
2015-11-17 17:08:54.400+0000: shutting down
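
The control-register line in the dump above ("cr0 8005003b ... efer d01") can be decoded to see which mode the guest was in when entry failed. This is a small sketch with hypothetical helper names; the bit names follow the x86 architecture definitions, and only CR0 and EFER are covered.

```python
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def parse_register_line(line):
    """Parse 'name hexvalue name hexvalue ...' into a dict of integers."""
    tokens = line.split()
    return {name: int(value, 16) for name, value in zip(tokens[0::2], tokens[1::2])}


def set_bits(value, names):
    """Return the names of the bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


regs = parse_register_line(
    "cr0 8005003b cr2 2b93cb0fe000 cr3 799521000 cr4 6a0 cr8 0 efer d01")
print(set_bits(regs["cr0"], CR0_BITS))
print(set_bits(regs["efer"], EFER_BITS))
```

Here PE and PG are set in CR0 and LMA is set in EFER, so this guest was running in 64-bit long mode with paging enabled, unlike the real-mode guest used for the reproduction in comment 5.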



Destination Host
 
	 Release    :  Red Hat Enterprise Linux Server release 6.7 (Santiago)
	 Kernel     :  2.6.32-573.7.1.el6.x86_64

	 vdsm	    : 4.16.26-1.el6ev        	 libvirt     : 0.10.2-54.el6         
	 qemu-img   : 0.12.1.2-2.479.el6_7.1 	 qemu-kvm    : 0.12.1.2-2.479.el6_7.1
	 SPICE	    : 0.12.4-12.el6_7.3      	 RHEV Tools  : 0.12.1.2-2.479.el6_7.1

CPU
  80 logical processors (40 CPU cores)
  4 Intel Xeon CPU E7- 4850 @ 2.00GHz (flags: aes,constant_tsc,ht,lm,pae,vmx) 
  └─20 threads / 10 cores each


Source Host

	 Release    :  Red Hat Enterprise Linux Server release 6.7 (Santiago)
	 Kernel     :  2.6.32-573.7.1.el6.x86_64

	 vdsm	    : 4.16.27-1.el6ev        	 libvirt     : 0.10.2-54.el6         
	 qemu-img   : 0.12.1.2-2.479.el6_7.2 	 qemu-kvm    : 0.12.1.2-2.479.el6_7.2
	 SPICE	    : 0.12.4-12.el6_7.3      	 RHEV Tools  : 0.12.1.2-2.479.el6_7.2

CPU
  80 logical processors (40 CPU cores)
  4 Intel Xeon CPU E7- 4850 @ 2.00GHz (flags: aes,constant_tsc,ht,lm,pae,vmx) 
  └─20 threads / 10 cores each
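
Although the CPUs are identical, the package versions listed for the two hosts are not. A quick way to spot such differences is to compare the two listings field by field. The sketch below copies the values from the tables above; the function name is hypothetical, and whether any of these differences is related to the failure is not established here.

```python
def version_mismatches(source, destination):
    """Return {package: (source_version, destination_version)} for differing entries."""
    names = sorted(set(source) | set(destination))
    return {
        name: (source.get(name), destination.get(name))
        for name in names
        if source.get(name) != destination.get(name)
    }


# Values copied from the Source Host and Destination Host listings.
source_host = {
    "vdsm": "4.16.27-1.el6ev", "libvirt": "0.10.2-54.el6",
    "qemu-img": "0.12.1.2-2.479.el6_7.2", "qemu-kvm": "0.12.1.2-2.479.el6_7.2",
    "SPICE": "0.12.4-12.el6_7.3", "RHEV Tools": "0.12.1.2-2.479.el6_7.2",
}
destination_host = {
    "vdsm": "4.16.26-1.el6ev", "libvirt": "0.10.2-54.el6",
    "qemu-img": "0.12.1.2-2.479.el6_7.1", "qemu-kvm": "0.12.1.2-2.479.el6_7.1",
    "SPICE": "0.12.4-12.el6_7.3", "RHEV Tools": "0.12.1.2-2.479.el6_7.1",
}

for name, (src, dst) in version_mismatches(source_host, destination_host).items():
    print("%s: source %s, destination %s" % (name, src, dst))
```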



Please let me know what other logs are needed.

Thank you.