Bug 873613

Summary: Windows server 2012 guest hang when start guest with -m 128GB -smp 48 on AMD host
Product: Red Hat Enterprise Linux 6
Reporter: Mike Cao <bcao>
Component: qemu-kvm
Assignee: Gleb Natapov <gleb>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.4
CC: acathrow, areis, bcao, bsarathy, cpelland, juzhang, knoel, mazhang, michen, mkenneth, qzhang, rhod, tburke, virt-bugs, virt-maint, xfu
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Flags: xfu: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-06 12:58:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ftrace none

Description Mike Cao 2012-11-06 10:19:16 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.32-279.11.1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64


How reproducible:
80%

Steps to Reproduce:
1. Start Windows Server 2012 with 128GB memory
CLI:
/usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=windows_server_2012_max_amd,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device usb-ehci,id=ehci0 -drive file=usb_storage_max,format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 -chardev socket,id=111a,path=/tmp/amd-max-sut,server,nowait -mon chardev=111a,mode=readline -name amd-max-sut -vnc :0 -drive file=en_windows_server_2012_x64_dvd_915478.iso,id=drive-cdrom,format=raw,if=none,werror=stop,rerror=stop,media=cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -vga std -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

  
Actual results:
The guest hangs.

Expected results:
The guest does not hang.

Additional info:

Comment 1 Mike Cao 2012-11-06 10:20:48 UTC
Created attachment 639242 [details]
ftrace

Comment 3 Qunfang Zhang 2012-11-13 09:42:27 UTC
Some update:
CLI:
# /usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=/opt/win2k8-r2-qzhang.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device usb-ehci,id=ehci0 -drive format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 -name amd-max-sut -vnc :0 -drive file=/root/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso,id=drive-cdrom,format=raw,if=none,werror=stop,rerror=stop,media=cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -vga std -boot c -monitor stdio

1. Cannot reproduce with a *20G virtual size* Windows 2012 guest image with the "-m 256G -smp 48" configuration.

2. Can reproduce on a Windows 2012 guest with a large image (360G virtual size):
# qemu-img info win2012-64-qzhang.raw 
image: win2012-64-qzhang.raw
file format: raw
virtual size: 360G (386547056640 bytes)
disk size: 8.1G

I also found that the guest hangs easily after logging in, when the win-2012 "Server Manager" opens by default. At that point there is no response to mouse clicks or keyboard input. If I do not log in, the guest does not hang.
After logging in to the guest, the qemu-kvm process %CPU on the host increases from several hundred percent (for example 550%) to more than 1600%.

1) -m 256G -smp 48: guest hangs *after logging in*
2) -m 128G -smp 48: guest hangs *after logging in*
3) -m 128G -smp 24: passed.

top info during guest hang:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                          
 6726 root      20   0  261g 249g 4740 R 1641.8 49.5  16:26.66 qemu-kvm   

3. Cannot reproduce with a win2k8r2 guest with the "-m 256G -smp 48" configuration.
The guest becomes very slow but doesn't hang, even though the qemu-kvm process consumes nearly 4800% CPU.

4. Tested on both rhel6.3-z and rhel6.4 hosts, with the same results.
rhel6.3-z:
kernel-2.6.32-279.11.1.el6.x86_64
qemu-kvm-0.12.1.2-2.295.el6_3.5.x86_64

rhel6.4:
kernel-2.6.32-341.el6.x86_64
qemu-kvm-0.12.1.2-2.334.el6.x86_64
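
As a rough aid for reading the %CPU figures in this comment, the sketch below converts a top reading into busy host cores and into a share of the guest's vCPUs. It assumes top's default scale, where 100% corresponds to one fully busy core; the function names are made up for illustration and are not part of any tool mentioned in this bug.

```python
def busy_cores(top_cpu_percent):
    """Convert a top %CPU reading into the number of fully busy host cores.

    Assumes top's default scale, where 100% equals one core.
    """
    return top_cpu_percent / 100.0


def share_of_vcpus(top_cpu_percent, vcpus):
    """Fraction of the guest's vCPUs that the qemu-kvm process keeps busy."""
    return busy_cores(top_cpu_percent) / vcpus


# Readings taken from this comment: win2012 during the hang, win2k8r2 when slow.
for label, percent in (("win2012", 1641.8), ("win2k8r2", 4800.0)):
    print(label, round(busy_cores(percent), 1), "cores,",
          round(share_of_vcpus(percent, vcpus=48), 2), "of 48 vCPUs")
```

Under that assumption, the hung win2012 guest keeps roughly a third of its 48 vCPUs busy, while the slow win2k8r2 guest keeps all of them busy.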

Comment 4 Ademar Reis 2012-11-14 02:33:06 UTC
Looks like a dupe of Bug 820112, Gleb should be able to confirm it.

Comment 5 Karen Noel 2012-11-14 12:46:32 UTC

*** This bug has been marked as a duplicate of bug 820112 ***

Comment 6 Qunfang Zhang 2012-11-15 03:14:24 UTC
Hi, Ademar and Karen

This bug was closed as a duplicate of the win2k8-r2 issue in bug 820112, but there are some differences between the win2k8-r2 and win2012 guests.

When I boot the two guests separately with the same command line "-m 128G -smp 48" on a large AMD host (512G mem and 48 cpus):

1. For win2012: qemu-kvm process consumes about 1600% cpu.
   For win2k8-r2: qemu-kvm process consumes about 4800% cpu.

2. For win2012: No response to mouse clicks or keyboard input.
   For win2k8-r2: Mouse and keyboard work, but the response is a little slow. (2~5 seconds latency)

So, the win2012 guest gets the worse result, right? Could you help confirm whether this is a different issue?

Thanks,
Qunfang

Comment 7 Gleb Natapov 2012-11-15 05:06:10 UTC
(In reply to comment #6)
> Hi, Ademar and Karen
> 
> This bug is closed as a duplicated with the win2k8-r2 issue in bug 820112.
> But there's some difference between win2k8-r2 and win2012 guests.
> 
> When I boot the two guests separately with same command line "-m 128G -smp
> 48" on an AMD large host (512G mem and 48 cpus):
> 
> 1. For win2012: qemu-kvm process consumes about 1600% cpu.
>    For win2k8-r2: qemu-kvm process consumes about 4800% cpus.
> 
> 2. For win2012: No response if I click mouse or keyboard. 
>    For win2k8-r2: Mouse and keyboard works, but with a little slow response.
> (2~5 seconds latency)
> 
> So, the win2012 guest gets worse result, right? Could you guys help confirm
> whether this is another issue?
> 
Run win2012 with a numa config. If the problem is gone, this is exactly the same issue.

Comment 8 Qunfang Zhang 2012-11-15 05:17:33 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Hi, Ademar and Karen
> > 
> > This bug is closed as a duplicated with the win2k8-r2 issue in bug 820112.
> > But there's some difference between win2k8-r2 and win2012 guests.
> > 
> > When I boot the two guests separately with same command line "-m 128G -smp
> > 48" on an AMD large host (512G mem and 48 cpus):
> > 
> > 1. For win2012: qemu-kvm process consumes about 1600% cpu.
> >    For win2k8-r2: qemu-kvm process consumes about 4800% cpus.
> > 
> > 2. For win2012: No response if I click mouse or keyboard. 
> >    For win2k8-r2: Mouse and keyboard works, but with a little slow response.
> > (2~5 seconds latency)
> > 
> > So, the win2012 guest gets worse result, right? Could you guys help confirm
> > whether this is another issue?
> > 
> Run win2012 with numa config. If problem is gone this is exactly same issue.

I ran win2012 with a numa config, but the guest is always killed and cannot boot up.
Please refer to:
Bug 872524 - windows server 2012 guest w/ 256GB memory always be killed only when numad is enabled on host(w/ 512GB memory)

Comment 9 Mike Cao 2012-11-15 05:20:41 UTC
Referring to comment #0: we still hit the issue when -numa is added to the qemu-kvm command line.

Re-opening this bug.

Comment 10 Dor Laor 2012-11-15 12:39:52 UTC
(In reply to comment #9)
> Referring to comment #0 .still hit the issue when add -numa in qemu-kvm
> commandline 
> 
> Re-open this bug

Gleb?

Comment 11 Gleb Natapov 2012-11-15 13:32:58 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Referring to comment #0 .still hit the issue when add -numa in qemu-kvm
> > commandline 
> > 
> > Re-open this bug
> 
> Gleb?

Probably something else then.

Comment 12 Ronen Hod 2012-11-27 17:29:49 UTC
On 16 Nov Andy Cathrow wrote (regarding downgrading to -smp 32 in RHEL6.3.z):
"Downgrading CPUs to get 2012 is acceptable."

To be realistic, insisting on 48 CPUs will cost us too much, so I suggest that we certify 2012 guest with 32 CPUs for now.
Any objection?

Comment 13 Bhavna Sarathy 2012-11-27 19:31:36 UTC
We are downgrading the number of CPUs to 32 to pass Win Server 2012 certification. This bug is not blocking Win Server 2012 certification, and can be fixed in RHEL6.5.  A backport to RHEL6.4.z would be preferable.

Comment 15 Gleb Natapov 2013-05-21 06:52:48 UTC
The numa configuration that was used for this bug is incorrect.
-smp 48,cores=48,sockets=1,threads=1 means that there is only one socket, and such a HW configuration cannot be NUMA. It still looks like a dup of #820112.
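
The point above can be checked mechanically. The following is a minimal sketch, not part of qemu-kvm: it parses the -numa options in the form they appear in this bug and reports unassigned or repeated vCPUs, a memory total that differs from -m, and the single-socket case described in this comment. The function names and the regular expression are illustrative assumptions, and the parser only understands the exact option layout used here.

```python
import re

# -numa options as given in this bug's command lines (8 nodes, 32G each).
NUMA_ARGS = (
    "-numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 "
    "-numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 "
    "-numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 "
    "-numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 "
    "-numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 "
    "-numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 "
    "-numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 "
    "-numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7"
)


def parse_numa_nodes(cmdline):
    """Return {nodeid: (mem_in_GiB, [cpu, ...])} for each -numa node option."""
    pattern = r"-numa node,mem=(\d+)G,cpus=([\d,]+),nodeid=(\d+)"
    nodes = {}
    for mem, cpus, nodeid in re.findall(pattern, cmdline):
        nodes[int(nodeid)] = (int(mem), [int(c) for c in cpus.split(",")])
    return nodes


def check_numa(nodes, vcpus, total_mem_gib, sockets):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    assigned = [cpu for _, cpus in nodes.values() for cpu in cpus]
    missing = sorted(set(range(vcpus)) - set(assigned))
    repeated = sorted({cpu for cpu in assigned if assigned.count(cpu) > 1})
    if missing:
        problems.append(f"vCPUs not assigned to any node: {missing}")
    if repeated:
        problems.append(f"vCPUs assigned more than once: {repeated}")
    node_mem = sum(mem for mem, _ in nodes.values())
    if node_mem != total_mem_gib:
        problems.append(f"node memory is {node_mem}G but -m is {total_mem_gib}G")
    # The rule stated in this comment: a single socket cannot be NUMA.
    if sockets == 1 and len(nodes) > 1:
        problems.append("one socket cannot be split into several NUMA nodes")
    return problems


nodes = parse_numa_nodes(NUMA_ARGS)
for problem in check_numa(nodes, vcpus=48, total_mem_gib=256, sockets=1):
    print(problem)
```

With sockets=1 the only problem reported is the single-socket one; the vCPU assignment and the memory total are otherwise consistent with -m 256G -smp 48.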

Comment 16 Qunfang Zhang 2013-08-08 09:18:29 UTC
Hi Ronen
You set needinfo, do you want QE to test with correct numa configuration? BTW, same configuration for 24 v-cpu passed, see comment 3.

Comment 17 Ronen Hod 2013-08-08 10:49:19 UTC
Qunfang,

I wanted to bring it to your attention and get your opinion. Yes, it looks as if testing it with the correct NUMA configuration will be the next step. At least we will know if it is a duplicate of bug 820112

Thanks, Ronen.

Comment 18 Qunfang Zhang 2013-08-09 07:46:58 UTC
(In reply to Ronen Hod from comment #17)
> Qunfang,
> 
> I wanted to bring it to your attention and get your opinion. Yes, it looks
> as if testing it with the correct NUMA configuration will be the next step.
> At least we will know if it is a duplicate of bug 820112
> 
> Thanks, Ronen.

Ronen

Ok, got it. We will track it and test it when we get the large host.

Comment 19 Qunfang Zhang 2013-08-15 09:53:36 UTC
The host is in hot demand and still used by other people now. xfu is waiting for the large host to verify another bug, and will take care of this bug together. Thanks xfu.

Comment 20 mazhang 2013-09-29 08:36:47 UTC
Cannot reproduce this bug on RHEL6.5.

host:
RHEL6.5-20130925.2
qemu-kvm-0.12.1.2-2.407.el6.x86_64
kernel-2.6.32-420.el6.x86_64
numactl-2.0.7-8.el6.x86_64

guest:
en_windows_server_2012_x64_dvd_915478

cli:
/usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=/home/win2012-64.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device usb-ehci,id=ehci0 -drive format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 -name amd-max-sut -vnc :0 -vga std -boot c -monitor stdio -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

image:
[root@amd-6172-512-2 home]# qemu-img info win2012-64.raw
image: win2012-64.raw
file format: raw
virtual size: 360G (386547056640 bytes)
disk size: 8.2G

Result:
The guest works well with and without the numa config.
The guest opens "Server Manager" automatically, which makes the guest mouse very slow. After closing "Server Manager", the mouse becomes smooth again.

Comment 21 mazhang 2013-09-29 09:06:26 UTC
Correction to the command line: all tests should use 48 sockets, "-m 256G -smp 48,cores=1,sockets=48,threads=1".
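
To make the difference between the two -smp values explicit, here is a small sketch that parses an -smp string of the form used in this bug and compares the topologies. It is an illustration only; the helper names are assumptions and the parser handles just the key=value layout shown here.

```python
def parse_smp(value):
    """Parse an -smp value such as '48,cores=1,sockets=48,threads=1'."""
    first, *rest = value.split(",")
    topology = {"cpus": int(first)}
    for item in rest:
        key, _, number = item.partition("=")
        topology[key] = int(number)
    return topology


def topology_is_consistent(topology):
    """True when sockets * cores * threads equals the total vCPU count."""
    product = topology["sockets"] * topology["cores"] * topology["threads"]
    return product == topology["cpus"]


original = parse_smp("48,cores=48,sockets=1,threads=1")   # comment 0 and comment 20
corrected = parse_smp("48,cores=1,sockets=48,threads=1")  # this comment

for name, topology in (("original", original), ("corrected", corrected)):
    print(name, topology, "consistent:", topology_is_consistent(topology))
```

Both values describe 48 vCPUs consistently; they differ only in exposing 1 socket versus 48 sockets to the guest, which is the part that matters for the numa options.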

Comment 22 Mike Cao 2013-11-07 02:45:47 UTC
I agree to close this one, as we opened a new bug https://bugzilla.redhat.com/show_bug.cgi?id=1024754 to track it.