Bug 1178472

Summary: fail to boot win2012r2 guest with hv_relaxed&hv_vapic&hv_spinlocks=0x1fff&hv_time & -smp 80,cores=2,threads=1,sockets=40
Product: Red Hat Enterprise Linux 7
Reporter: FuXiangChun <xfu>
Component: qemu-kvm-rhev
Assignee: Vadim Rozenfeld <vrozenfe>
Status: CLOSED ERRATA
QA Contact: FuXiangChun <xfu>
Severity: high
Docs Contact:
Priority: high
Version: 7.1
CC: chayang, juzhang, michen, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Windows
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 00:09:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description FuXiangChun 2015-01-04 06:37:42 UTC
Description of problem:
Booting a win2012r2 guest with hv_relaxed, hv_vapic, hv_spinlocks=0x1fff, hv_time and -smp 80,cores=2,threads=1,sockets=40 makes the guest reboot automatically during boot.

With "-smp 40" the guest works well. QE also tested qemu-kvm-1.5.3-84.el7, qemu-kvm-1.5.3-60.el7 and qemu-kvm-rhev-2.1.2-17.el7. All of them hit this issue, so this bug is not a regression.

In addition, if either the "hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time" options or the "cores=2,threads=1,sockets=40" topology is removed from the qemu command line, the guest works well.
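For reference, the numbers in the failing configuration can be sanity-checked with a small script. This is only a sketch: parse_smp and topology_product are hypothetical helpers written for this note, not part of QEMU, and the parser handles only the simple "N,key=value" form used in this report. It shows that the failing topology is self-consistent (40 sockets x 2 cores x 1 thread = 80 vCPUs) and that hv_spinlocks=0x1fff is 8191 in decimal, which per the QEMU Hyper-V enlightenment documentation is the number of spinlock acquisition attempts before the hypervisor is notified.

```python
# Hypothetical helpers for illustration only; they are not part of QEMU.

def parse_smp(value):
    """Parse a QEMU -smp value such as '80,cores=2,threads=1,sockets=40'."""
    fields = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if sep:
            fields[key] = int(val)
        else:
            # A bare number is taken as the boot-time vCPU count.
            fields["cpus"] = int(item)
    return fields


def topology_product(fields):
    """Return sockets * cores * threads, or None if any of them is missing.

    QEMU derives missing values itself; this sketch does not try to
    reproduce that logic.
    """
    keys = ("sockets", "cores", "threads")
    if any(k not in fields for k in keys):
        return None
    return fields["sockets"] * fields["cores"] * fields["threads"]


failing = parse_smp("80,cores=2,threads=1,sockets=40")
working = parse_smp("40")

print(failing["cpus"], topology_product(failing))  # 80 80
print(working["cpus"], topology_product(working))  # 40 None
print(int("0x1fff", 16))                           # 8191
```

The check only confirms that the topology given on the command line is valid; it does not explain the guest failure.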



Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-84.el7 
qemu-kvm-1.5.3-60.el7 
qemu-kvm-rhev-2.1.2-17.el7

host kernel:
3.10.0-220.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. /usr/libexec/qemu-kvm -name win2012R2-64 -M pc-i440fx-rhel7.1.0 -m 8G -smp 80,cores=2,threads=1,sockets=40,maxcpus=240 \
-cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_time \
-drive file=/home/win2012r2-64.qcow2-back,if=none,id=drive-blk0-0-0,format=qcow2,cache=none -device ide-drive,drive=drive-blk0-0-0,id=blk0-0-0,bootindex=1 \

Actual results:
The guest fails to boot.

Expected results:
The guest boots and works well.

Additional info:
A Linux guest does not hit this bug.

host info:
# lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                160
On-line CPU(s) list:   0-159
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz
Stepping:              2
CPU MHz:               2260.818
BogoMIPS:              4521.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0-9,80-89
NUMA node1 CPU(s):     10-19,90-99
NUMA node2 CPU(s):     20-29,100-109
NUMA node3 CPU(s):     30-39,110-119
NUMA node4 CPU(s):     40-49,120-129
NUMA node5 CPU(s):     50-59,130-139
NUMA node6 CPU(s):     60-69,140-149
NUMA node7 CPU(s):     70-79,150-159

#cat /proc/cpuinfo
.......
processor	: 159
vendor_id	: GenuineIntel
cpu family	: 6
model		: 47
model name	: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz
stepping	: 2
microcode	: 0x37
cpu MHz		: 2260.818
cache size	: 24576 KB
physical id	: 7
siblings	: 20
core id		: 9
cpu cores	: 10
apicid		: 243
initial apicid	: 243
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4521.89
clflush size	: 64
cache_alignment	: 64
address sizes	: 44 bits physical, 48 bits virtual
power management:

Comment 1 Vadim Rozenfeld 2015-01-08 08:51:39 UTC
Does it fail with BSOD, black screen, or just frozen?

Thanks,
Vadim.

Comment 2 FuXiangChun 2015-01-09 02:17:53 UTC
(In reply to Vadim Rozenfeld from comment #1)
> Does it fail with BSOD, black screen, or just frozen?
> 
> Thanks,
> Vadim.

The guest reboots automatically once the guest UI is loaded. This bug was found on a large Intel machine (160 logical CPUs), which has since been returned. If other scenarios need to be debugged, QE will reserve it from beaker again as soon as possible.

QE also tested this bug on a large AMD machine (48 cores) with the same steps as comment 0. There the win2012 guest froze during boot. In that case the vCPUs were over-committed by 32 (80 vCPUs on 48 host cores).
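The over-commit figure can be reproduced with simple arithmetic. The sketch below assumes that "over-commit 32" means 80 guest vCPUs minus 48 host cores. That reading is an interpretation of the comment, and the overcommit helper is hypothetical. The Intel host numbers are taken from the lscpu output in the description.

```python
# Guest vCPU count from "-smp 80"; host figures as stated in this report.
GUEST_VCPUS = 80


def overcommit(vcpus, host_cpus):
    """Number of vCPUs beyond the host CPU count (0 if not over-committed)."""
    return max(0, vcpus - host_cpus)


# lscpu on the Intel host: 8 sockets * 10 cores per socket * 2 threads per core.
intel_host_cpus = 8 * 10 * 2
# "48 cores" per comment 2; the thread count of the AMD host is not given.
amd_host_cpus = 48

print(intel_host_cpus)                            # 160
print(overcommit(GUEST_VCPUS, intel_host_cpus))   # 0
print(overcommit(GUEST_VCPUS, amd_host_cpus))     # 32
```

So the Intel reproduction did not involve vCPU over-commit, while the AMD one did.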

Comment 6 Vadim Rozenfeld 2017-08-08 03:54:08 UTC
This seems to be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1451959

For more details, please see the following comment:
https://bugzilla.redhat.com/show_bug.cgi?id=1451959#c7

Comment 7 Vadim Rozenfeld 2017-11-14 11:08:47 UTC
can QE try reproducing this issue on the following build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14535540 ?

Thanks,
Vadim.

Comment 8 huiqingding 2017-11-15 07:00:57 UTC
(In reply to Vadim Rozenfeld from comment #7)
> can QE try reproducing this issue on the following build
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14535540 ?
> 
> Thanks,
> Vadim.

Tested a win2012r2 guest with the above build; the guest boots normally.
The command line is as follows:
/usr/libexec/qemu-kvm -name win2012 -m 30G -machine pc -S -cpu Haswell-noTSX,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,enforce -vnc :6 -monitor stdio -device VGA -serial unix:/tmp/console,server,nowait -drive file=/mnt/stable_guest_abi/win2012-64-virtio.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-scsi-disk0 -netdev tap,id=idinWyYp,vhost=on -device e1000,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp -smp 80,cores=2,threads=1,sockets=40,maxcpus=240

Comment 9 Vadim Rozenfeld 2017-11-15 07:12:50 UTC
(In reply to huiqingding from comment #8)
> (In reply to Vadim Rozenfeld from comment #7)
> > can QE try reproducing this issue on the following build
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14535540 ?
> > 
> > Thanks,
> > Vadim.
> 
> Test win2012r2 guest with the above build, the guest can boot normally.
> The command line is as following:
> /usr/libexec/qemu-kvm -name win2012 -m 30G -machine pc -S -cpu
> Haswell-noTSX,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,enforce -vnc :6
> -monitor stdio -device VGA -serial unix:/tmp/console,server,nowait -drive
> file=/mnt/stable_guest_abi/win2012-64-virtio.qcow2,if=none,id=drive-scsi-
> disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device
> ide-drive,drive=drive-scsi-disk0 -netdev tap,id=idinWyYp,vhost=on -device
> e1000,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp -smp
> 80,cores=2,threads=1,sockets=40,maxcpus=240

Thank you.
Vadim.

Comment 10 Vadim Rozenfeld 2017-11-21 09:35:28 UTC
Moving to POST, the same as https://bugzilla.redhat.com/show_bug.cgi?id=1451959

Comment 11 Vadim Rozenfeld 2017-11-28 07:57:57 UTC
Fix included in qemu-kvm-rhev-2.10.0-8.el7

Comment 13 huiqingding 2017-12-07 03:16:57 UTC
Tested a win2012r2 guest; the guest boots normally.

Package version:
kernel-3.10.0-799.el7.x86_64
qemu-kvm-rhev-2.10.0-10.el7.x86_64

The command line is as follows:
/usr/libexec/qemu-kvm -name win2012 -m 30G -machine pc -S -cpu SandyBridge,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,enforce -vnc :6 -monitor stdio -device VGA -serial unix:/tmp/console,server,nowait -drive file=/mnt/stable_guest_abi/win2012-64r2-virtio.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-scsi-disk0 -netdev tap,id=idinWyYp,vhost=on -device e1000,mac=42:ce:a9:d2:4d:d7,id=idlbq7eA,netdev=idinWyYp -smp 80,cores=2,threads=1,sockets=40,maxcpus=240

Comment 14 huiqingding 2017-12-07 03:17:30 UTC
Based on comment #13, setting this bug to VERIFIED.

Comment 17 errata-xmlrpc 2018-04-11 00:09:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104