Bug 873613
Summary: | Windows Server 2012 guest hangs when started with -m 128GB -smp 48 on AMD host | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Mike Cao <bcao> | ||||
Component: | qemu-kvm | Assignee: | Gleb Natapov <gleb> | ||||
Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 6.4 | CC: | acathrow, areis, bcao, bsarathy, cpelland, juzhang, knoel, mazhang, michen, mkenneth, qzhang, rhod, tburke, virt-bugs, virt-maint, xfu | ||||
Target Milestone: | rc | Keywords: | Reopened | ||||
Target Release: | --- | Flags: | xfu: needinfo- | ||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-11-06 12:58:40 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Description
Mike Cao
2012-11-06 10:19:16 UTC
Created attachment 639242 (ftrace)
Some update:

CLI:

# /usr/libexec/qemu-kvm -boot menu=on -m 256G \
    -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf \
    -drive file=/opt/win2k8-r2-qzhang.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop \
    -device ide-drive,drive=drive-ide0,id=ide0 \
    -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no \
    -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 \
    -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 \
    -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection \
    -device usb-ehci,id=ehci0 \
    -drive format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop \
    -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 \
    -name amd-max-sut -vnc :0 \
    -drive file=/root/en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_with_sp1_x64_dvd_617601.iso,id=drive-cdrom,format=raw,if=none,werror=stop,rerror=stop,media=cdrom \
    -device ide-drive,drive=drive-cdrom,id=cdrom \
    -vga std -boot c -monitor stdio

1. Cannot reproduce with a *20G virtual size* Windows 2012 guest image with the "-m 256G -smp 48" configuration.

2. Can reproduce on a Windows 2012 guest with a large image (360G virtual size):

   # qemu-img info win2012-64-qzhang.raw
   image: win2012-64-qzhang.raw
   file format: raw
   virtual size: 360G (386547056640 bytes)
   disk size: 8.1G

   The guest hangs easily after logging in, when the Windows 2012 "Server Manager" opens by default; mouse clicks and keystrokes then get no response. If nobody logs in, the guest does not hang. After login, the host %CPU of the qemu-kvm process rises from several hundred percent (for example 550%) to more than 1600%.

   1) -m 256G -smp 48: guest hangs *after login*
   2) -m 128G -smp 48: guest hangs *after login*
   3) -m 128G -smp 24: passed

   top output during the guest hang:

     PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM     TIME+ COMMAND
    6726 root      20   0  261g 249g 4740 R 1641.8 49.5  16:26.66 qemu-kvm

3. Cannot reproduce with a win2k8r2 guest with the "-m 256G -smp 48" configuration. The guest becomes very slow but does not hang, even though the qemu-kvm process consumes nearly 4800% CPU.

4. Tested on both RHEL 6.3.z and RHEL 6.4 hosts, with the same results:
   rhel6.3-z: kernel-2.6.32-279.11.1.el6.x86_64, qemu-kvm-0.12.1.2-2.295.el6_3.5.x86_64
   rhel6.4: kernel-2.6.32-341.el6.x86_64, qemu-kvm-0.12.1.2-2.334.el6.x86_64

Looks like a dupe of Bug 820112; Gleb should be able to confirm it.

*** This bug has been marked as a duplicate of bug 820112 ***

Hi, Ademar and Karen

This bug was closed as a duplicate of the win2k8-r2 issue in bug 820112, but there are some differences between win2k8-r2 and win2012 guests. When I boot the two guests separately with the same command line, "-m 128G -smp 48", on a large AMD host (512G memory and 48 CPUs):

1. For win2012, the qemu-kvm process consumes about 1600% CPU; for win2k8-r2, it consumes about 4800% CPU.
2. For win2012, there is no response to mouse clicks or keystrokes; for win2k8-r2, the mouse and keyboard work, but respond slowly (2~5 seconds of latency).

So the win2012 guest gets a worse result, right? Could you help confirm whether this is a separate issue?

Thanks,
Qunfang

(In reply to comment #6)
> So, the win2012 guest gets worse result, right? Could you guys help confirm
> whether this is another issue?

Run win2012 with a NUMA config. If the problem goes away, this is exactly the same issue.

(In reply to comment #7)
> Run win2012 with numa config. If problem is gone this is exactly same issue.

Ran win2012 with a NUMA config, but the guest always gets killed and cannot boot. Please refer to:
Bug 872524 - windows server 2012 guest w/ 256GB memory always be killed only when numad is enabled on host(w/ 512GB memory)

Referring to comment #0: still hit the issue when adding -numa to the qemu-kvm command line. Re-opening this bug.

(In reply to comment #9)
> Referring to comment #0 .still hit the issue when add -numa in qemu-kvm
> commandline
>
> Re-open this bug

Gleb?

(In reply to comment #10)
> Gleb?

Probably something else, then.

On 16 Nov, Andy Cathraw wrote (regarding downgrading to -smp 32 in RHEL6.3.z): "Downgrading CPUs to get 2012 is acceptable." To be realistic, insisting on 48 CPUs will cost us too much, so I suggest that we certify the 2012 guest with 32 CPUs for now. Any objection?

We are downgrading the number of CPUs to 32 to pass Win Server 2012 certification.
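The %CPU figures compared in comment #6 (about 1600% for win2012 versus about 4800% for win2k8-r2, and 1641.8% in the top output from the description) are easier to compare as a count of busy host CPUs. The sketch below assumes top is in its default "Irix mode", where a process's %CPU is summed over all of its threads, so 100% corresponds to one fully busy host CPU; the function names are illustrative only.

```python
# Reading top's %CPU column for the qemu-kvm process.
# Assumption: top runs in its default "Irix mode", where a process's %CPU
# is summed over all its threads, so 100% equals one fully busy host CPU.


def busy_host_cpus(percent_cpu: float) -> float:
    """Convert a per-process %CPU value into an equivalent number of busy host CPUs."""
    return percent_cpu / 100.0


def summarize(label: str, percent_cpu: float, vcpus: int) -> str:
    """Describe how many host CPUs are busy relative to the guest's vCPU count."""
    busy = busy_host_cpus(percent_cpu)
    return f"{label}: ~{busy:.1f} host CPUs busy for {vcpus} vCPUs ({busy / vcpus:.0%})"


if __name__ == "__main__":
    # Figures quoted in this report for -smp 48 guests.
    for label, pct in [("win2012 (top)", 1641.8), ("win2012", 1600), ("win2k8-r2", 4800)]:
        print(summarize(label, pct, 48))
```

Read this way, the win2012 guest keeps roughly 16 of its 48 vCPU threads busy on the host, while the win2k8-r2 guest keeps all 48 busy.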
This bug is not blocking Win Server 2012 certification and can be fixed in RHEL 6.5. A backport to RHEL 6.4.z would be preferable.

The NUMA configuration that was used for this bug is incorrect: -smp 48,cores=48,sockets=1,threads=1 means that there is only one socket, and such a hardware configuration cannot be NUMA. It still looks like a dup of bug 820112.

Hi Ronen,

You set needinfo; do you want QE to test with a correct NUMA configuration? BTW, the same configuration with 24 vCPUs passed; see comment 3.

Qunfang,

I wanted to bring it to your attention and get your opinion. Yes, it looks as if testing it with the correct NUMA configuration will be the next step. At least we will know whether it is a duplicate of bug 820112.

Thanks, Ronen.

(In reply to Ronen Hod from comment #17)
> Yes, it looks
> as if testing it with the correct NUMA configuration will be the next step.

Ronen,

OK, got it. We will track it and test it when we get the large host. The host is in high demand and is still being used by other people. xfu is waiting for the large host to verify another bug and will take care of this bug at the same time.

Thanks, xfu.

Cannot reproduce this bug on RHEL 6.5.
Host:
RHEL6.5-20130925.2
qemu-kvm-0.12.1.2-2.407.el6.x86_64
kernel-2.6.32-420.el6.x86_64
numactl-2.0.7-8.el6.x86_64

Guest: en_windows_server_2012_x64_dvd_915478

CLI:
/usr/libexec/qemu-kvm -boot menu=on -m 256G \
    -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf \
    -drive file=/home/win2012-64.raw,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop \
    -device ide-drive,drive=drive-ide0,id=ide0 \
    -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no \
    -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 \
    -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 \
    -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection \
    -device usb-ehci,id=ehci0 \
    -drive format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop \
    -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 \
    -name amd-max-sut -vnc :0 -vga std -boot c -monitor stdio \
    -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 \
    -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 \
    -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 \
    -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 \
    -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 \
    -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 \
    -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 \
    -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

Image:
[root@amd-6172-512-2 home]# qemu-img info win2012-64.raw
image: win2012-64.raw
file format: raw
virtual size: 360G (386547056640 bytes)
disk size: 8.2G

Result: The guest works well both with and without the NUMA config. The guest opens "Server Manager" automatically, which makes the guest mouse very slow; after "Server Manager" is closed, the mouse becomes smooth again.

Correction to the command line: all tests should use 48 sockets, i.e. "-m 256G -smp 48,cores=1,sockets=48,threads=1".

I agree with closing this one, as we have opened a new bug, https://bugzilla.redhat.com/show_bug.cgi?id=1024754, to track it.
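Gleb's point that "-smp 48,cores=48,sockets=1,threads=1" cannot be NUMA, and the corrected "-smp 48,cores=1,sockets=48,threads=1", can be checked mechanically against the 8-node layout used in the RHEL 6.5 retest. The sketch below is a hypothetical helper (`check_topology` and `NUMA_NODES` are illustrative names, not part of qemu-kvm). It assumes vCPU indices are numbered threads first, then cores, then sockets, so that socket = cpu // (cores * threads), and it only checks the arithmetic; it does not model how qemu-kvm itself parses or validates `-numa` options.

```python
"""Hypothetical consistency check for an -smp / -numa combination."""
from collections import defaultdict

# The 8-node layout from the RHEL 6.5 retest: 32G and 6 vCPUs per node.
NUMA_NODES = {
    0: [0, 4, 8, 12, 16, 20],
    1: [24, 28, 32, 36, 40, 44],
    2: [3, 7, 11, 15, 19, 23],
    3: [27, 31, 35, 39, 43, 47],
    4: [2, 6, 10, 14, 18, 22],
    5: [26, 30, 34, 38, 42, 46],
    6: [1, 5, 9, 13, 17, 21],
    7: [25, 29, 33, 37, 41, 45],
}
NODE_MEM_GB = 32


def check_topology(smp, cores, sockets, threads, mem_gb, nodes, node_mem_gb):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    if cores * sockets * threads != smp:
        problems.append(f"cores*sockets*threads = {cores * sockets * threads} != smp {smp}")
    # Every vCPU must belong to exactly one node.
    assigned = sorted(cpu for cpus in nodes.values() for cpu in cpus)
    if assigned != list(range(smp)):
        problems.append("each vCPU must be listed in exactly one node")
    # Per-node memory must add up to -m.
    if node_mem_gb * len(nodes) != mem_gb:
        problems.append(f"node memory {node_mem_gb * len(nodes)}G != -m {mem_gb}G")
    # Assumed linear numbering: a socket holds cores * threads consecutive vCPUs,
    # and a physical socket cannot be split across NUMA nodes.
    socket_nodes = defaultdict(set)
    for node, cpus in nodes.items():
        for cpu in cpus:
            socket_nodes[cpu // (cores * threads)].add(node)
    for socket, owners in sorted(socket_nodes.items()):
        if len(owners) > 1:
            problems.append(f"socket {socket} spans nodes {sorted(owners)}")
    return problems


if __name__ == "__main__":
    for sockets, cores in [(1, 48), (48, 1)]:
        issues = check_topology(48, cores, sockets, 1, 256, NUMA_NODES, NODE_MEM_GB)
        print(f"-smp 48,cores={cores},sockets={sockets},threads=1:", issues or "OK")
```

With the single-socket topology, the only problem reported is that socket 0 spans all eight nodes; with 48 single-core sockets, the list is empty, matching the corrected command line.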