Bug 1250357

Summary: KVM: entry failed, hardware error 0x80000021 after migration from SandyBridge to Penryn host.
Product: Red Hat Enterprise Linux 7 Reporter: Qian Guo <qiguo>
Component: qemu-kvm-rhev Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2 CC: amit.shah, bdas, dgilbert, juzhang, knoel, michen, qiguo, quintela, rkrcmar, virt-maint, weliao
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1254480 Environment:
Last Closed: 2015-09-10 14:59:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1254480    

Comment 2 Qian Guo 2015-08-18 08:34:04 UTC
Retested with the 2 hosts with unrestricted_guest disabled on both; the issue still occurs.


I also saw that ept differs between the 2 hosts, so I disabled it on both hosts as well.
Still hit the issue.
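
To confirm that both hosts really run with the same settings before a retest, the current values can be read back from sysfs. The following is a minimal sketch, assuming the kvm_intel module is loaded and exposes its parameters under /sys/module/kvm_intel/parameters; the function name and the `base` argument are illustrative only and not part of any tool mentioned in this bug.

```python
from pathlib import Path


def read_kvm_intel_params(names=("ept", "unrestricted_guest"),
                          base="/sys/module/kvm_intel/parameters"):
    """Return {name: value} for the given kvm_intel module parameters.

    A value is the stripped file content (typically "Y" or "N"), or None
    when the file cannot be read (module not loaded, parameter absent).
    """
    result = {}
    for name in names:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            result[name] = None
    return result
```

Running `read_kvm_intel_params()` on each host and comparing the two dictionaries shows whether ept and unrestricted_guest match on both sides.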


I will change this bug's subject, since it seems it is not completely related to unrestricted_guest being enabled.

I will paste the cpuinfo of the 2 different hosts:


1. The host with ept and unrestricted_guest disabled by default:
# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 2826.223
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 5652.44
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 2826.223
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 5652.44
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 2826.223
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 5652.44
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9500  @ 2.83GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 2826.223
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips	: 5652.44
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:




2. The host with ept and unrestricted_guest enabled by default:
# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 2506.437
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 1990.593
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 1
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 1721.648
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 2
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 3822.210
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 4
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 3764.570
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 5
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 3452.726
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 1
cpu cores	: 4
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 6
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 3102.632
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 2
cpu cores	: 4
apicid		: 5
initial apicid	: 5
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping	: 9
microcode	: 0x1b
cpu MHz		: 3721.140
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 3
cpu cores	: 4
apicid		: 7
initial apicid	: 7
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bogomips	: 6784.38
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:
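
The two flag lists above are long, so the difference is easier to see mechanically. The sketch below assumes each host's `cat /proc/cpuinfo` output has been saved as text; the helper names are hypothetical. In the output pasted above, fsgsbase appears only in the flags of host 2.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set of the first processor entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_only_on(source_text, dest_text):
    """Flags the source host advertises that the destination host lacks."""
    return sorted(cpu_flags(source_text) - cpu_flags(dest_text))
```

Calling `flags_only_on(host2_text, host1_text)` lists the features a guest could have seen on host 2 that host 1 cannot provide.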

Comment 3 Qian Guo 2015-08-18 08:39:55 UTC
Forgot to say that if I migrate from the SandyBridge host to the Penryn host, migration works smoothly.

Comment 9 Bandan Das 2015-08-28 18:50:22 UTC
Since Radim is offline and far away in an unknown land, I took a quick look.

So, it seems that by specifying "-cpu Penryn,+fsgsbase", the guest (Windows 10) assumes cpuid[FEAT_7_0_EBX] is valid and tries to use unsupported instructions. Actually, my guess is it might be trying to use clflushopt, just like in bug 1223317.

Does this happen only when migrating? It seems just running a Windows 10 guest on the Penryn host with "-cpu Penryn,+fsgsbase" should be enough to hit this. Qian, can you please confirm?
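
For reference, FEAT_7_0_EBX corresponds to the EBX register returned by CPUID leaf 7, subleaf 0, where fsgsbase is bit 0 and clflushopt is bit 23. The following is a small sketch that decodes such a value; it covers only four bits, the table and function names are illustrative, and it does not execute CPUID itself.

```python
# Bit positions in CPUID.(EAX=07H, ECX=0):EBX (subset only).
LEAF7_EBX_BITS = {
    "fsgsbase": 0,
    "smep": 7,
    "erms": 9,
    "clflushopt": 23,
}


def decode_leaf7_ebx(ebx):
    """Map each known feature name to whether its bit is set in ebx."""
    return {name: bool((ebx >> bit) & 1)
            for name, bit in LEAF7_EBX_BITS.items()}
```

A guest that sees bit 0 set may enable and use the FS/GS base instructions, which is only safe if every host it runs on actually implements them.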

Comment 10 Qian Guo 2015-08-31 02:47:58 UTC
(In reply to Bandan Das from comment #9)
> Since Radim is offline and far away in an unknown land, I took a quick look.
> 
> So, it seems that by specifying "-cpu Penryn,+fsgsbase", the guest
> (Windows10) assumes  cpuid[FEAT_7_0_EBX] is valid and trying to use
> unsupported instructions. Actually, my guess is it might be trying to use
> clflushopt just like in bug 1223317. 
> 
> Does this happen only when migrating ? It seems just running a Windows 10
> guest on the Penryn host with "-cpu Penryn,+fsgsbase" should be enough to
> hit this. Qian, can you please confirm ?

Hi, Bandan

Windows 10 works well on the Penryn host when booted with "-cpu Penryn,+fsgsbase", and migration from Penryn to SandyBridge works well.

Thanks,
Qian

Comment 11 Bandan Das 2015-09-02 17:20:54 UTC
(In reply to Qian Guo from comment #10)
> (In reply to Bandan Das from comment #9)
> > Since Radim is offline and far away in an unknown land, I took a quick look.
> > 
> > So, it seems that by specifying "-cpu Penryn,+fsgsbase", the guest
> > (Windows10) assumes  cpuid[FEAT_7_0_EBX] is valid and trying to use
> > unsupported instructions. Actually, my guess is it might be trying to use
> > clflushopt just like in bug 1223317. 
> > 
> > Does this happen only when migrating ? It seems just running a Windows 10
> > guest on the Penryn host with "-cpu Penryn,+fsgsbase" should be enough to
> > hit this. Qian, can you please confirm ?
> 
> Hi, Bandan
> 
> Windows 10 works well in Penryn host and boot with "-cpu Penryn,+fsgsbase",
> and migration from Penryn to SandyBridge works well.
> 
> Thanks,
> Qian

OK, so it's basically what Dave said in comment 6. Since fsgsbase is valid on SandyBridge, Windows 10 assumes certain instructions are supported and blows up on the Penryn host. Can you try one more test: migrate from Penryn to SandyBridge and then back. On the return path, the migration from SandyBridge to the Penryn host should succeed.

Comment 12 Qian Guo 2015-09-06 01:56:03 UTC
(In reply to Bandan Das from comment #11)
> (In reply to Qian Guo from comment #10)
> > (In reply to Bandan Das from comment #9)
> > > Since Radim is offline and far away in an unknown land, I took a quick look.
> > > 
> > > So, it seems that by specifying "-cpu Penryn,+fsgsbase", the guest
> > > (Windows10) assumes  cpuid[FEAT_7_0_EBX] is valid and trying to use
> > > unsupported instructions. Actually, my guess is it might be trying to use
> > > clflushopt just like in bug 1223317. 
> > > 
> > > Does this happen only when migrating ? It seems just running a Windows 10
> > > guest on the Penryn host with "-cpu Penryn,+fsgsbase" should be enough to
> > > hit this. Qian, can you please confirm ?
> > 
> > Hi, Bandan
> > 
> > Windows 10 works well in Penryn host and boot with "-cpu Penryn,+fsgsbase",
> > and migration from Penryn to SandyBridge works well.
> > 
> > Thanks,
> > Qian
> 
> Ok, so it's basically what Dave said in comment 6. Since fsgsbase is valid
> on SandyBridge, Windows 10 assumes certain instructions as supported and
> blows out on the Penryn host. Can you try one more test -Migrate from Penryn
> to SandyBridge and then back. On the return path, the migration from
> SandyBridge to the Penryn host should succeed.

Hi, Bandan

When I reported this bug, I was doing exactly this ping-pong migration: it failed to migrate from SandyBridge back to Penryn even after first migrating from Penryn to SandyBridge.

Thanks,
Qian

Comment 13 Bandan Das 2015-09-06 02:16:09 UTC
(In reply to Qian Guo from comment #12)
> Hi, Bandan
> 
> When I reported this bug, I was doing this ping-pong migration, so it failed
> to migrate from SandyBridge to Penryn even first from Penryn to SandyBridge.

I am not sure I understand. Are you saying it fails migrating from Penryn to SandyBridge too ? 

Please confirm if the test I mentioned in comment 11 fails with a recent build that contains the fix for bug 1223317.

> Thanks,
> Qian

Comment 14 Qian Guo 2015-09-06 03:01:06 UTC
(In reply to Bandan Das from comment #13)
> (In reply to Qian Guo from comment #12)
> > Hi, Bandan
> > 
> > When I reported this bug, I was doing this ping-pong migration, so it failed
> > to migrate from SandyBridge to Penryn even first from Penryn to SandyBridge.
> 
> I am not sure I understand. Are you saying it fails migrating from Penryn to
> SandyBridge too ? 

No, I mean that after migration from Penryn to SandyBridge (which works well), I hit the issue once migrating back.
> 
> Please confirm if the test I mentioned in comment 11 fails with a recent
> build that contains the fix for bug 1223317.
> 

Will try again and update here.

> > Thanks,
> > Qian

Comment 15 Qian Guo 2015-09-10 10:31:04 UTC
Hi, Bandan

Sorry for the slow response; the hosts were busy with other tests.

Retested with the latest build, qemu-kvm-rhev-2.3.0-22.el7.x86_64, and yes, you are right: with the latest build, if I first migrate from Penryn to SandyBridge (works well) and then migrate back, the guest works well. I tested ping-pong 10 times and the issue is gone.

And if the first migration is from SandyBridge to Penryn, it crashes.


The CLI is the same as in the comments above, with +fsgsbase.


Thanks,
Qian.

Comment 16 Dr. David Alan Gilbert 2015-09-10 14:59:59 UTC
OK, thanks for confirming.

This isn't a bug, because:
   a) Penryn doesn't support fsgsbase, so exposing that feature to a guest on a CPU that doesn't have it may break if the OS tries to use it.
   b) 'enforce' correctly stops you from doing (a)
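
The idea behind (b) can be illustrated with a simplified model. This is only a sketch of the concept and not QEMU's implementation: the function name is hypothetical, and it assumes the option is given on the command line in a form such as "-cpu Penryn,+fsgsbase,enforce".

```python
def check_cpu_model(requested, host, enforce=False):
    """Return the requested features the host lacks.

    With enforce=True, refuse to continue if anything is missing,
    instead of letting the guest start with an unusable feature.
    """
    missing = sorted(set(requested) - set(host))
    if missing and enforce:
        raise ValueError("host lacks requested features: " + ", ".join(missing))
    return missing
```

In this model, requesting fsgsbase on a host whose flags lack it raises an error up front when enforce is set, rather than failing later when the guest uses the feature.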

Comment 17 Hai Huang 2015-09-16 13:44:44 UTC
*** Bug 1254480 has been marked as a duplicate of this bug. ***