Bug 1069089
Summary: | L2 guest will restart automatically(continuous) when booting L2 guest with "-cpu Haswell" and "-smp >1"
---|---
Product: | Red Hat Enterprise Linux 7
Component: | kernel
Version: | 7.0
Hardware: | x86_64
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | medium
Target Milestone: | rc
Fixed In Version: | kernel-3.10.0-105.el7
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2014-06-13 12:50:40 UTC
Reporter: | FuXiangChun <xfu>
Assignee: | Marcelo Tosatti <mtosatti>
QA Contact: | Virtualization Bugs <virt-bugs>
CC: | acathrow, bdas, choma, hhuang, jarod, juzhang, knoel, michen, mtosatti, pbonzini, svenkatr, virt-maint, xfu
Description
FuXiangChun
2014-02-24 06:59:30 UTC
Created attachment 866877: guest console log when smp=2
Created attachment 866878: guest console log when smp=1
Patch(es) available on kernel-3.10.0-105.el7

Re-tested this issue with 3.10.0-110.el7.x86_64 and qemu-kvm-1.5.3-52.el7.x86_64.

Tested 4 scenarios.
S1. Boot L2 guest with "-cpu host" and "-smp 1,sockets=1,cores=1,threads=1" (hit bug 1038427)
S2. Boot L2 guest with "-cpu Haswell" and "-smp 1,sockets=1,cores=1,threads=1"
S3. Boot L2 guest with "-cpu host" and "-smp 4,sockets=2,cores=2,threads=1"
S4. Boot L2 guest with "-cpu Haswell" and "-smp 4,sockets=2,cores=2,threads=1"

Result:
1. Guest hangs.
2. Can't get any message from the guest console.
3. Guest shows a black screen.
4. qemu-kvm monitor outputs an error message:

    (qemu) KVM: entry failed, hardware error 0x0
    EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306c1
    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
    EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0000 00000000 0000ffff 00009300
    CS =f000 000f0000 0000ffff 00009b00
    SS =0000 00000000 0000ffff 00009300
    DS =0000 00000000 0000ffff 00009300
    FS =0000 00000000 0000ffff 00009300
    GS =0000 00000000 0000ffff 00009300
    LDT=0000 00000000 0000ffff 00008200
    TR =0000 00000000 0000ffff 00008b00
    GDT= 00000000 0000ffff
    IDT= 00000000 0000ffff
    CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=00 00 00 00 00 00 00 00 00 00 00 00 00 66 90 66 90 66 90 90 <2e> 66 83 3e 74 d1 00 0f 85 03 e5 31 c0 8e d0 66 bc 00 70 00 00 66 ba d5 41 0f 00 e9 7f e3

5. Host dmesg outputs an error message:

    [270848.121071] nested_vmx_exit_handled failed vm entry 7

Based on this test result, from the QE point of view none of the scenarios works normally, so I am re-assigning this bug. Jarod, if other scenarios need to be tested, or if you have any suggestions, please update the bz.

Additional info:
1. Check L0 host parameter values:

    # cat /sys/module/kvm_intel/parameters/nested
    Y
    # cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
    Y
    # cat /sys/module/kvm_intel/parameters/enable_apicv
    Y
    # cat /sys/module/kvm_intel/parameters/ept
    Y

2. Host cpuinfo:

    # cat /proc/cpuinfo
    ......
    processor : 55
    vendor_id : GenuineIntel
    cpu family : 6
    model : 63
    model name : Genuine Intel(R) CPU @ 2.20GHz
    stepping : 1
    microcode : 0x80000013
    cpu MHz : 2147.921
    cache size : 35840 KB
    physical id : 1
    siblings : 28
    core id : 14
    cpu cores : 14
    apicid : 61
    initial apicid : 61
    fpu : yes
    fpu_exception : yes
    cpuid level : 15
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
    bogomips : 4394.54
    clflush size : 64
    cache_alignment : 64
    address sizes : 46 bits physical, 48 bits virtual
    power management:

Update from Bandan:

From comment 1 of the bug 1069089, I tried the following on dell-pet20-01.ml3.eng.bos.redhat.com (Haswell host):

Q1. Booting L2 guest with "-cpu host"
Reproduced Bug 1069089.
Q2. Booting L2 guest with "-cpu Haswell" & -smp 1
Cannot reproduce, guest boots up fine.
Q3. Booting L2 guest with "-cpu Haswell" & -smp >1
Cannot reproduce, guest boots up fine.

I am not sure if I am missing something from QE's setup; my cmdline is exactly from the bug report.

Bandan

FuXiang,

I'm not concerned with -cpu host at level 2 right now. It would be nice to come up with a -cpu model that is known to work or a statement that specifying the same CPU model as the host should work. For example -cpu haswell on a haswell host.

What is different between QE's setup and Bandan's?

(In reply to Karen Noel from comment #12)
> FuXiang,
>
> I'm not concerned with -cpu host at level 2 right now. It would be nice to
> come up with a -cpu model that is known to work or a statement that
> specifying the same CPU model as the host should work. For example -cpu
> haswell on a haswell host.
>
> What is different between QE's setup and Bandan's?

Additionally, I would also suggest that QE retest with a newer kernel, and if possible, on a different host so that if there's any (potential) machine specific strangeness going on, that can be isolated.

Re-tested with the latest kernel-3.10.0-115.el7.x86_64 (guest and host) and qemu-kvm-1.5.3-58.el7.x86_64. QE tested on 2 Haswell hosts.

1st host: intel-wildcatpass-04.khw.lab.eng.bos.redhat.com

Tested 4 scenarios.
S1. Booting L2 guest with "-cpu host" & -smp 1
S2. Booting L2 guest with "-cpu host" & -smp >1
S3. Booting L2 guest with "-cpu Haswell" & -smp 1
S4. Booting L2 guest with "-cpu Haswell" & -smp >1

All 4 scenarios above give the same result, as follows:

    (qemu) KVM: entry failed, hardware error 0x0
    EAX=00000000 EBX=00000000 ECX=00000000 EDX=000306f1
    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
    EIP=0000e05b EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0000 00000000 0000ffff 00009300
    CS =f000 000f0000 0000ffff 00009b00
    SS =0000 00000000 0000ffff 00009300
    DS =0000 00000000 0000ffff 00009300
    FS =0000 00000000 0000ffff 00009300
    GS =0000 00000000 0000ffff 00009300
    LDT=0000 00000000 0000ffff 00008200
    TR =0000 00000000 0000ffff 00008b00
    GDT= 00000000 0000ffff
    IDT= 00000000 0000ffff
    CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=00 00 00 00 00 00 00 00 00 00 00 00 00 66 90 66 90 66 90 90 <2e> 66 83 3e 74 d1 00 0f 85 03 e5 31 c0 8e d0 66 bc 00 70 00 00 66 ba d5 41 0f 00 e9 7f e3
    (qemu) info status
    VM status: paused (internal-error)
    (qemu) c
    Resetting the Virtual Machine is required

2nd host: intel-sharkbay-dh-06.lab.bos.redhat.com

S1. Booting L2 guest with "-cpu host" & -smp 1
S2. Booting L2 guest with "-cpu host" & -smp >1
S3. Booting L2 guest with "-cpu Haswell" & -smp 1
S4. Booting L2 guest with "-cpu Haswell" & -smp >1

Result:
S1 and S2 give the same result, as follows. The guest hangs (black screen):

    (qemu) KVM: entry failed, hardware error 0x7
    RAX=00000000000000ff RBX=ffff88007fc0c9a0 RCX=000000000000038f RDX=0000000000000007
    RSI=00000000000000ff RDI=000000000000038f RBP=ffff88007c049ae0 RSP=ffff88007c049ae0
    R8 =0000000000000006 R9 =ffff88007c049920 R10=ffff88007c049868 R11=0000000000000009
    R12=ffff88007fc0ccc8 R13=ffff88007fc0c9a0 R14=ffff88007fc0cbc4 R15=0000000000000001
    RIP=ffffffff8104620a RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0000 0000000000000000 000fffff 00000000
    CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
    SS =0000 0000000000000000 ffffffff 00c00000
    DS =0000 0000000000000000 000fffff 00000000
    FS =0000 0000000000000000 000fffff 00000000
    GS =0000 ffff88007fc00000 000fffff 00000000
    LDT=0000 0000000000000000 000fffff 00000000
    TR =0040 ffff88007fc11940 00002087 00008b00 DPL=0 TSS64-busy
    GDT= ffff88007fc0a000 0000007f
    IDT= ffffffffff529000 00000fff
    CR0=80050033 CR2=00000000ffffffff CR3=00000000018de000 CR4=001407f0
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000d01
    Code=89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48 09 c2 48 89 d0 5d c3 66 2e
    (qemu) info status
    VM status: paused (internal-error)
    (qemu) c
    Resetting the Virtual Machine is required
    (qemu) info status
    VM status: paused (internal-error)
    (qemu) q

S3 and S4 give the same result (using -cpu Haswell): the L2 guest and qemu-kvm work well.

Host detailed info:

1st host
1. cat /proc/cpuinfo

    ....
    processor : 55
    vendor_id : GenuineIntel
    cpu family : 6
    model : 63
    model name : Genuine Intel(R) CPU @ 2.20GHz
    stepping : 1
    microcode : 0x80000013
    cpu MHz : 2103.492
    cache size : 35840 KB
    physical id : 1
    siblings : 28
    core id : 14
    cpu cores : 14
    apicid : 61
    initial apicid : 61
    fpu : yes
    fpu_exception : yes
    cpuid level : 15
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm

2. # free -g

                 total       used       free     shared    buffers     cached
    Mem:            31         10         20          0          0          9
    -/+ buffers/cache:          0         30
    Swap:           15          0         15

2nd host
1. cat /proc/cpuinfo

    processor : 3
    vendor_id : GenuineIntel
    cpu family : 6
    model : 60
    model name : Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
    stepping : 3
    microcode : 0x17
    cpu MHz : 2899.976
    cache size : 6144 KB
    physical id : 0
    siblings : 4
    core id : 3
    cpu cores : 4
    apicid : 6
    initial apicid : 6
    fpu : yes
    fpu_exception : yes
    cpuid level : 13
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
    bogomips : 4589.03
    clflush size : 64
    cache_alignment : 64
    address sizes : 39 bits physical, 48 bits virtual
    power management:

2. # free -g

                 total       used       free     shared    buffers     cached
    Mem:             7          1          6          0          0          0
    -/+ buffers/cache:          1          6
    Swap:            7          0          7

In any case, for this bug, testing on different hosts gives different results.

According to the explanation in comment 18 & comment 19, I have just released the two hosts intel-wildcatpass-02 and intel-wildcatpass-04. Rich Freiss, QE plans to re-test this bug after you have updated to the next firmware version. Please let me know when you are done. I will send a ticket to reserve this machine again. Thanks!

Leaving aside the nested failures caused by the intel-wildcatpass host and by "-cpu host", this bug is fixed according to the test result in comment 16. If the intel-wildcatpass host still causes nested failures after the update to the next firmware version, QE will file a new bug to track it.

Please file a new 7.1 bug for wildcatpass, and verify this one.

According to comment 20, comment 21 and comment 22, setting this issue as verified.

Hi Xiangchun,

According to comment 21 and comment 22, please file a new bz and propose it for rhel7.1 if needed.

Best Regards,
Junyi

I re-tested this issue on the wildcatpass Haswell host. It can still be reproduced there, as in comment 16. I have filed a new bug, bug 1086058, to track it and proposed it for rhel7.1.

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
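The L0 host parameter check listed under "additional info" (nested, enable_shadow_vmcs, enable_apicv, ept) can be scripted so that two hosts are compared the same way. The following is only a minimal sketch: the function names `read_kvm_intel_params` and `disabled_params` are hypothetical, the parameter list is just the four files quoted in the report, and treating "1" as enabled alongside "Y" is an assumption for kernels that expose a parameter as an integer.

```python
from pathlib import Path

# The four kvm_intel parameters checked in the report's "additional info".
NESTED_PARAMS = ("nested", "enable_shadow_vmcs", "enable_apicv", "ept")

# Directory read in the report; passed as an argument so it can be overridden.
DEFAULT_PARAM_DIR = "/sys/module/kvm_intel/parameters"


def read_kvm_intel_params(param_dir=DEFAULT_PARAM_DIR, names=NESTED_PARAMS):
    """Return {name: value}, with None for a parameter file that cannot be read
    (for example when kvm_intel is not loaded)."""
    base = Path(param_dir)
    values = {}
    for name in names:
        try:
            values[name] = (base / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


def disabled_params(values):
    """Return the names whose value is not an enabled marker.

    Assumption: "Y" (as shown in the report) or "1" means enabled;
    anything else, including a missing file, counts as not enabled.
    """
    return [name for name, value in values.items() if value not in ("Y", "1")]
```

On an L0 host, `disabled_params(read_kvm_intel_params())` returning an empty list corresponds to the all-`Y` output quoted in the report; any names it returns are parameters worth comparing against a host where the L2 guest boots.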