Bug 1069089
| Summary: | L2 guest will restart automatically(continuous) when booting L2 guest with "-cpu Haswell" and "-smp >1" | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | FuXiangChun <xfu> | ||||||
| Component: | kernel | Assignee: | Marcelo Tosatti <mtosatti> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.0 | CC: | acathrow, bdas, choma, hhuang, jarod, juzhang, knoel, michen, mtosatti, pbonzini, svenkatr, virt-maint, xfu | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | kernel-3.10.0-105.el7 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2014-06-13 12:50:40 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Description
FuXiangChun
2014-02-24 06:59:30 UTC
Created attachment 866877 [details]
guest console log when smp=2
Created attachment 866878 [details]
guest console log when smp=1
Patch(es) available on kernel-3.10.0-105.el7

Re-tested this issue with 3.10.0-110.el7.x86_64 and qemu-kvm-1.5.3-52.el7.x86_64.
Tested 4 scenarios.
S1. Boot L2 guest with "-cpu host" and "-smp 1,sockets=1,cores=1,threads=1" (hit bug 1038427)
S2. Boot L2 guest with "-cpu Haswell" and "-smp 1,sockets=1,cores=1,threads=1"
S3. Boot L2 guest with "-cpu host" and "-smp 4,sockets=2,cores=2,threads=1"
S4. Boot L2 guest with "-cpu Haswell" and "-smp 4,sockets=2,cores=2,threads=1"
Result:
1. Guest hangs.
2. Cannot get any message from the guest console.
3. Guest shows a black screen.
4. qemu-kvm monitor outputs an error message:
(qemu) KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c1
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 66 90 66 90 66 90 90 <2e> 66 83 3e 74 d1 00 0f 85 03 e5 31 c0 8e d0 66 bc 00 70 00 00 66 ba d5 41 0f 00 e9 7f e3
5. Host dmesg outputs an error message:
[270848.121071] nested_vmx_exit_handled failed vm entry 7
Based on this test result, from the QE point of view none of the scenarios works normally, so I am re-assigning this bug.
Jarod, if other scenarios need testing, or if you have any suggestions, please update the bz.
Additional info:
1. Check L0 host parameter values
# cat /sys/module/kvm_intel/parameters/nested
Y
# cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
Y
# cat /sys/module/kvm_intel/parameters/enable_apicv
Y
# cat /sys/module/kvm_intel/parameters/ept
Y
2. Host cpuinfo:
# cat /proc/cpuinfo
......
processor : 55
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Genuine Intel(R) CPU @ 2.20GHz
stepping : 1
microcode : 0x80000013
cpu MHz : 2147.921
cache size : 35840 KB
physical id : 1
siblings : 28
core id : 14
cpu cores : 14
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 4394.54
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Update from Bandan:
From comment 1 of the bug 1069089, I tried the following on dell-pet20-01.ml3.eng.bos.redhat.com (Haswell host):
Q1. Booting L2 guest with "-cpu host"
Reproduced Bug 1069089
Q2. Booting L2 guest with "-cpu Haswell" & -smp 1
Cannot reproduce, guest boots up fine.
Q3. Booting L2 guest with "-cpu Haswell" & -smp >1
Cannot reproduce, guest boots up fine.
I am not sure if I am missing something from QE's setup, my cmdline is exactly from the bug report.
Bandan

FuXiang,
I'm not concerned with -cpu host at level 2 right now. It would be nice to come up with a -cpu model that is known to work or a statement that specifying the same CPU model as the host should work. For example -cpu haswell on a haswell host.
What is different between QE's setup and Bandan's?

(In reply to Karen Noel from comment #12)
> FuXiang,
>
> I'm not concerned with -cpu host at level 2 right now. It would be nice to
> come up with a -cpu model that is known to work or a statement that
> specifying the same CPU model as the host should work. For example -cpu
> haswell on a haswell host.
>
> What is different between QE's setup and Bandan's?
Additionally, I would also suggest that QE retest with a newer kernel, and if possible, on a different host so that if there's any (potential) machine specific strangeness going on, that can be isolated.

Re-tested with the latest kernel-3.10.0-115.el7.x86_64 (guest and host) and qemu-kvm-1.5.3-58.el7.x86_64. QE tested 2 Haswell hosts.
1st host. intel-wildcatpass-04.khw.lab.eng.bos.redhat.com
Tested 4 scenarios
S1. Booting L2 guest with "-cpu host" & -smp 1
S2. Booting L2 guest with "-cpu host" & -smp >1
S3. Booting L2 guest with "-cpu Haswell" & -smp 1
S4. Booting L2 guest with "-cpu Haswell" & -smp >1
All 4 scenarios above give the same result:
(qemu) KVM: entry failed, hardware error 0x0
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306f1
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 66 90 66 90 66 90 90 <2e> 66 83 3e 74 d1 00 0f 85 03 e5 31 c0 8e d0 66 bc 00 70 00 00 66 ba d5 41 0f 00 e9 7f e3
(qemu) info status
VM status: paused (internal-error)
(qemu) c
Resetting the Virtual Machine is required
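For readers comparing the dumps: on x86, EDX carries the processor signature at reset, and the register state here (CS=f000, CR0=60000010) looks like early firmware code, so EDX may still hold that value. The sketch below decodes such a signature using the CPUID leaf 1 field layout. The function name `decode_cpu_signature` is made up for this note, and treating EDX as an untouched signature is an assumption, not something the report confirms.

```python
def decode_cpu_signature(sig):
    """Split a CPUID leaf-1 style signature into family, model, stepping."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model only applies to base families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# EDX values copied from the two "hardware error 0x0" dumps in this report.
for label, sig in (("EDX=000306c1", 0x000306C1), ("EDX=000306f1", 0x000306F1)):
    print(label, decode_cpu_signature(sig))
```

Under that assumption, 000306f1 decodes to family 6, model 63, stepping 1, which agrees with the wildcatpass /proc/cpuinfo output, while 000306c1 decodes to family 6, model 60, stepping 1.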
2nd host: intel-sharkbay-dh-06.lab.bos.redhat.com
S1. Booting L2 guest with "-cpu host" & -smp 1
S2. Booting L2 guest with "-cpu host" & -smp >1
S3. Booting L2 guest with "-cpu Haswell" & -smp 1
S4. Booting L2 guest with "-cpu Haswell" & -smp >1
Result:
S1 and S2 give the same result, shown below: the guest hangs (black screen).
(qemu) KVM: entry failed, hardware error 0x7
RAX=00000000000000ff RBX=ffff88007fc0c9a0 RCX=000000000000038f RDX=0000000000000007
RSI=00000000000000ff RDI=000000000000038f RBP=ffff88007c049ae0 RSP=ffff88007c049ae0
R8 =0000000000000006 R9 =ffff88007c049920 R10=ffff88007c049868 R11=0000000000000009
R12=ffff88007fc0ccc8 R13=ffff88007fc0c9a0 R14=ffff88007fc0cbc4 R15=0000000000000001
RIP=ffffffff8104620a RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 000fffff 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 000fffff 00000000
FS =0000 0000000000000000 000fffff 00000000
GS =0000 ffff88007fc00000 000fffff 00000000
LDT=0000 0000000000000000 000fffff 00000000
TR =0040 ffff88007fc11940 00002087 00008b00 DPL=0 TSS64-busy
GDT= ffff88007fc0a000 0000007f
IDT= ffffffffff529000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=00000000018de000 CR4=001407f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48 09 c2 48 89 d0 5d c3 66 2e
(qemu) info status
VM status: paused (internal-error)
(qemu) c
Resetting the Virtual Machine is required
(qemu) info status
VM status: paused (internal-error)
(qemu) q
S3 and S4 give the same result: with -cpu Haswell, the L2 guest and qemu-kvm work well.
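The two kinds of register dump above fail in very different CPU states, which is easier to see once CR0 and EFER are decoded. The following is a minimal sketch using the architectural bit positions for CR0 and IA32_EFER; the helper name `decode_bits` and the two tables are illustrative and are not taken from QEMU or the kernel.

```python
# Architectural flag positions (bit number -> name).
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
EFER_BITS = {0: "SCE", 8: "LME", 10: "LMA", 11: "NXE"}


def decode_bits(value, table):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


# Values copied from the dumps in this report.
print("wildcatpass CR0 :", decode_bits(0x60000010, CR0_BITS))
print("wildcatpass EFER:", decode_bits(0x0, EFER_BITS))
print("sharkbay CR0    :", decode_bits(0x80050033, CR0_BITS))
print("sharkbay EFER   :", decode_bits(0xD01, EFER_BITS))
```

Read this way, the wildcatpass dump has protection and paging off (no PE, no PG) with caching disabled, which suggests the L2 guest had not left early firmware code, while the sharkbay "-cpu host" dump has PG and LMA set, so that guest was already running 64-bit kernel code when the entry failed.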
Host detailed info:
1st host
1.cat /proc/cpuinfo
....
processor : 55
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Genuine Intel(R) CPU @ 2.20GHz
stepping : 1
microcode : 0x80000013
cpu MHz : 2103.492
cache size : 35840 KB
physical id : 1
siblings : 28
core id : 14
cpu cores : 14
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
2.# free -g
total used free shared buffers cached
Mem: 31 10 20 0 0 9
-/+ buffers/cache: 0 30
Swap: 15 0 15
2nd host detailed info:
1.cat /proc/cpuinfo
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
stepping : 3
microcode : 0x17
cpu MHz : 2899.976
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips : 4589.03
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
2.# free -g
total used free shared buffers cached
Mem: 7 1 6 0 0 0
-/+ buffers/cache: 1 6
Swap: 7 0 7
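Since the two hosts behave differently, it can help to diff their /proc/cpuinfo flags lines rather than compare them by eye. Below is a small sketch of such a diff; `flag_set` and `flag_diff` are hypothetical helper names, and the two sample strings are shortened excerpts of the flags lines above, not the full lines.

```python
def flag_set(flags_line):
    """Turn a /proc/cpuinfo 'flags : a b c' line into a set of flag names."""
    value = flags_line.split(":", 1)[-1]
    return set(value.split())


def flag_diff(line_a, line_b):
    """Return (flags only in a, flags only in b), each sorted."""
    a, b = flag_set(line_a), flag_set(line_b)
    return sorted(a - b), sorted(b - a)


# Shortened excerpts of the two flags lines quoted in this report.
host1 = "flags : pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx"
host2 = "flags : pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx"

only_host1, only_host2 = flag_diff(host1, host2)
print("only on 1st host:", only_host1)
print("only on 2nd host:", only_host2)
```

On these excerpts the first host alone reports dca and the second host alone reports aes. Whether either difference is related to the nested failure is not established anywhere in this report.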
In short, for this bug, testing on different hosts gives different results.
Following the explanation in comment 18 and comment 19, I have just released both intel-wildcatpass-02 and intel-wildcatpass-04.
Rich Freiss, QE plans to re-test this bug after you have updated to the next firmware version. Please let me know when you are done, and I will send a ticket to reserve this machine again. Thanks!

If we leave aside the nested failure caused by the intel-wildcatpass host and "-cpu host", then according to the test result in comment 16 this bug is fixed. If the intel-wildcatpass host still causes the nested failure after the next firmware update, QE will file a new bug to track it.

Please file a new 7.1 bug for wildcatpass, and verify this one.

According to comment20, comment21 and comment22, set this issue as verified.

Hi Xiangchun,
According to comment21 and comment22, please file a new bz and propose it for rhel7.1 if needed.
Best Regards,
Junyi

I re-tested this issue on the wildcatpass Haswell host and can still reproduce it there, as in comment16. I have filed a new bug, bug 1086058, to track it, and proposed it for rhel7.1.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.