Bug 820112
Summary: | usage of cpu is 100% after boot up win2k8R2 guest with -smp 48 and -m 256GB ,sometimes guest BSOD during bootup on AMD host | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Mike Cao <bcao> | ||||
Component: | qemu-kvm | Assignee: | Gleb Natapov <gleb> | ||||
Status: | CLOSED CANTFIX | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 6.3 | CC: | acathrow, areis, bcao, bsarathy, dyasny, fstrauss, gleb, gnatapov, juzhang, knoel, leiwang, ltroan, michen, mkenneth, mtosatti, qguan, qzhang, rhod, tburke, virt-maint, vrozenfe, xfu, yuzhou | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-11-14 12:45:05 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 851382 | ||||||
Attachments: |
|
Description
Mike Cao
2012-05-09 08:04:41 UTC
Created attachment 583182
ftrace
This bug occurs only on AMD hosts; it did not occur on an Intel host.

Hi, Vadim

I tried the upstream qemu-kvm.
After guest bootup, the "System" process consumes all the CPU resources for more than 15 mins. The guest is very slow to respond.

I did not reproduce the graphics driver's crash during boot (comment #5) w/ upstream qemu-kvm and upstream kernel.

Best Regards,
Mike

(In reply to comment #21)
OK, now please try running the VM with the following cpu options:
",+x2apic,hv_spinlocks=1000,hv_relaxed,hv_vapic"

Cheers,
Vadim.

(In reply to comment #22)
Tried upstream qemu-kvm w/ -cpu host,,+x2apic,hv_spinlocks=1000,hv_relaxed,hv_vapic on the AMD host; still hit the same issue.

*note* I did not hit this issue on an Intel host.

Re-tested this bug w/ the following scenarios:

1. Start guest w/ -smp 48 -m 256G -rtc base=localtime,clock=host,driftfix=none (-numa node) *12
Actual Results:
Guest always BSODs during boot, referring to comment #5.
After guest bootup, the CPU usage is 2%.

2. Start guest w/ -smp 48 -m 256G -rtc base=localtime,clock=host,driftfix=slew (-numa node) *12
Actual Results:
Guest always BSODs during boot, referring to comment #5.
After guest bootup, the CPU usage is 2%.
Guest will BSOD; dumps are similar to Bug 801196.

3. Start guest w/ -smp 48 -m 256G -rtc base=localtime,clock=host,driftfix=none w/o numa
Actual Results:
Guest always BSODs during boot, referring to comment #5.
After guest bootup, the CPU usage is 100%.

4. Start guest w/ -smp 32 -m 256G -rtc base=localtime,clock=host,driftfix=none (-numa node) *12
Actual Results:
After guest bootup, the CPU usage is 2%.

5. Start guest w/ -smp 32 -m 256G -rtc base=localtime,clock=host,driftfix=slew (-numa node) *12
Actual Results:
After guest bootup, the CPU usage is 2%.
Guest will BSOD; dumps are similar to Bug 801196.

Mike

(In reply to comment #28)
> 1. Start guest w/ -smp 48 -m 256G -rtc base=localtime,clock=host,driftfix=none
> (-numa node) *12

What's "*12"? Gleb, care to provide recommended params for it?

> Actual Results:
> Guest always BSODs during boot, referring to comment #5.

What about using the qxl driver instead?

> After guest bootup, the CPU usage is 2%.

Does this mean the guest managed to survive the BSOD?

(In reply to comment #29)
> What's "*12"?

</usr/libexec/qemu-kvm XXXX > -numa node -numa node -numa node -numa node -numa node -numa node -numa node -numa node -numa node -numa node -numa node -numa node

> Gleb, care to provide recommended params for it?
>
> What about using the qxl driver instead?

xfu tried this and did not hit the BSOD related to the graphics driver.

> Does this mean the guest managed to survive the BSOD?

There are 2 kinds of BSOD I hit during this bug:
one is related to the graphics driver (referring to comment #5). This kind of BSOD occurs 6/8 times during guest bootup when I use -smp 48 and -vnc.
The other is related to CLOCK_WATCHDOG_TIMEOUT (referring to Bug 801196). This kind of BSOD occurs almost 100% of the time after I log in to the guest when using -rtc driftfix=slew and -numa node.

Too late for RHEL6.3, postponing to 6.4 (ask for the z-stream if necessary).

*** Bug 823839 has been marked as a duplicate of this bug. ***

Might be the same issue as Bug 821377

We suspect that it might be related to bug 842211 (in POST).
Gleb tested that bug fix using the brew build
https://brewweb.devel.redhat.com/taskinfo?taskID=4639681
so until we have an official 6.4 kernel build, can you please try it.

Thanks, Ronen.

(In reply to comment #48)
The host used the kernel below to re-test this bug; it still has the same issue.
https://brewweb.devel.redhat.com/taskinfo?taskID=4639681

(In reply to comment #49)
> The host used the kernel below to re-test this bug; it still has the same issue.

Which issue exactly? There are a lot of issues discussed throughout this BZ.
What I am interested in checking is to run qemu with 12 numa nodes specified and see if we get a BSOD like in comment #5.
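For anyone re-running the 12-node configuration: the "(-numa node) *12" shorthand used above stands for twelve repeated -numa options. The following is only a minimal sketch that generates such a command line with explicit nodeid values. The helper names numa_args and build_command are made up for illustration, and only -m, -smp and -numa are included; the other options from the real test command lines are left out.

```python
def numa_args(node_count):
    """Return a flat list of '-numa node,nodeid=N' arguments for N in [0, node_count)."""
    args = []
    for node_id in range(node_count):
        args += ["-numa", f"node,nodeid={node_id}"]
    return args


def build_command(smp=48, mem="256G", node_count=12):
    """Assemble a partial qemu-kvm command line (hypothetical helper, not a full test command)."""
    # The binary path and the -m / -smp values are the ones quoted in this bug.
    cmd = ["/usr/libexec/qemu-kvm", "-m", mem, "-smp", str(smp)]
    return cmd + numa_args(node_count)


if __name__ == "__main__":
    # Print the command so it can be reviewed before any remaining options are added by hand.
    print(" ".join(build_command()))
```

Passing node_count=0 gives the "w/o numa" variant of the same command, which makes it easier to compare the two cases with otherwise identical options.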
(In reply to comment #50)
Tested two scenarios with this kernel:
https://brewweb.devel.redhat.com/taskinfo?taskID=4639681

1. Boot guest without numa
/usr/libexec/qemu-kvm -m 256G -smp 48,cores=48,sockets=1,threads=1

Testing result:
The guest doesn't show BSOD, but CPU utilization is still 100%.

2. Boot guest with numa
/usr/libexec/qemu-kvm -m 256G -smp 48,cores=48,sockets=1,threads=1 ..... -numa node,nodeid=0 -numa node,nodeid=1 -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 -numa node,nodeid=9 -numa node,nodeid=10 -numa node,nodeid=11

Testing result:
The guest works well.

Host info:
cpu: 24 cores, AMD Opteron(tm) Processor 6168
memory size: 264516632 kB
Numa number:
# cat /proc/buddyinfo
Node 0, zone      DMA      0      1      3      2      0      0      1      1      0      1      3
Node 0, zone    DMA32     89     50     47     16      8      7     74     77     36      2     36
Node 0, zone   Normal    186     43     21      9     24     16     14     14     21      1      2
Node 1, zone   Normal    105    193    122     42     21     12      8      9     30      2      1
Node 2, zone   Normal     57     23     20      7     10     12     10      7      5     14      2
Node 3, zone   Normal     10     15     41     19     11     10      4      4     33      0      1

(In reply to comment #51)
I don't think we used the right environment to verify this bug.
Pls find a host w/ 48 cores and 512 GB memory and re-test it.

Mike

I will re-test it when I reserve a big machine.

Gleb,
The Brew build is closed. I want to use it to re-test this bug on another big machine. Could you provide it for me again?
https://brewweb.devel.redhat.com/taskinfo?taskID=4639681

(In reply to comment #54)
rhel6 kernels starting from 2.6.32-290.el6 include the patch already.

Verified this bug with 2.6.32-293.el6.x86_64:

1. If the guest is booted with NUMA, the guest works well (CPU usage is about 1%~2%).
2. If the guest is booted without NUMA, the guest still shows BSOD.
cli:
/usr/libexec/qemu-kvm -m 256G -smp 48,cores=48,sockets=1,threads=1 -name win2k8r2 -rtc base=localtime,clock=host,driftfix=slew -drive file=/root/win2k8r2.qcow2,if=none,id=virtio0,format=qcow2,cache=none -device ide-drive,drive=virtio0,id=virtio0-device -monitor stdio -vnc :1 -k en-us -numa node,nodeid=0 -numa node,nodeid=1 -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 -numa node,nodeid=9 -numa node,nodeid=10 -numa node,nodeid=11

host:
Mem: 512G
cpu: 48 cores, amd-6172

# cat /proc/buddyinfo
Node 0, zone      DMA      1      2      2      2      2      1      2      1      1      0      3
Node 0, zone    DMA32     15      4      7      6      6     16      7      8      6      4    599
Node 0, zone   Normal    541    590    324    121     49     25      6      5      3     12  15281
Node 1, zone   Normal   1054    768    427    208    121     64     53     37     20     34  15946
Node 2, zone   Normal   1074    539    369    179     76     35     21     14     11     27  15176
Node 3, zone   Normal   1257    604    297    152     71     38     26     24     15     31  15937
Node 4, zone   Normal   1611   1054    563    321    171    119     55     45     35     56  15895
Node 5, zone   Normal   1538    949    509    287    188     79     45     29     14     27  15946
Node 6, zone   Normal    880   1016    494    255    145     90     77     54     43     55  15893
Node 7, zone   Normal   1598    782    334    162     89     33     24     22     19     33  15942

(In reply to comment #56)
> 1. If the guest is booted with NUMA, the guest works well (CPU usage is about 1%~2%).

So you are no longer seeing BSODs in this config during boot with vnc? This is good.

(In reply to comment #57)
> So you are no longer seeing BSODs in this config during boot with vnc? This
> is good.

Yes, I always boot the guest with vnc.

If the guest is booted with spicec and qxl, it doesn't show BSOD but CPU usage is 100%.

Summary:

1. With vnc
Result: BSOD

2. With vnc and numa
Result: guest works well (and CPU usage is normal)

3. With spice qxl
Result: guest doesn't show BSOD but CPU usage is 100% (guest response is very slow)

(In reply to comment #58)
> 2. With vnc and numa
> Result: guest works well (and CPU usage is normal)

If we do not hit BSODs w/ VNC + numa, then it is not a test blocker for SVVP.

Since I was asked to run the svvp test over the Windows Server 2012 platform on a RHEL6.3.z host, it is a test blocker to me.

(In reply to comment #62)
Sorry for updating the wrong bug; please ignore this comment.

(In reply to comment #63)
I'm assuming adding the TestBlocker flag was a mistake as well, and therefore I'm removing it. Please re-add it if it's indeed a test blocker to you.

*** Bug 873613 has been marked as a duplicate of this bug. ***
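A note for later re-tests: the host descriptions in this thread rely on /proc/buddyinfo to show how many NUMA nodes each host has. As a rough sketch only, the node count can be extracted from that output like this. The function name count_numa_nodes is made up for illustration, and the sample text is the listing posted for the 24-core Opteron 6168 host.

```python
import re


def count_numa_nodes(buddyinfo_text):
    """Count distinct 'Node N' ids in /proc/buddyinfo-style text."""
    node_ids = set()
    for line in buddyinfo_text.splitlines():
        # Each data line starts with "Node <id>, zone <name>".
        match = re.match(r"\s*Node\s+(\d+),\s+zone\s+\S+", line)
        if match:
            node_ids.add(int(match.group(1)))
    return len(node_ids)


# Listing copied from the 24-core host described in this bug.
SAMPLE = """\
Node 0, zone DMA 0 1 3 2 0 0 1 1 0 1 3
Node 0, zone DMA32 89 50 47 16 8 7 74 77 36 2 36
Node 0, zone Normal 186 43 21 9 24 16 14 14 21 1 2
Node 1, zone Normal 105 193 122 42 21 12 8 9 30 2 1
Node 2, zone Normal 57 23 20 7 10 12 10 7 5 14 2
Node 3, zone Normal 10 15 41 19 11 10 4 4 33 0 1
"""

if __name__ == "__main__":
    print(count_numa_nodes(SAMPLE))  # prints 4
```

On a live host the same function can be fed the contents of /proc/buddyinfo instead of the sample string. Comparing the result with the number of -numa node options on the qemu-kvm command line shows whether the guest topology has more nodes than the host, as with 12 guest nodes on the 4-node host.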