Bug 872524
Summary: Windows Server 2012 guest w/ 256GB memory is always killed, but only when numad is enabled on the host (w/ 512GB memory)
Product: Red Hat Enterprise Linux 6
Reporter: Mike Cao <bcao>
Component: numad
Assignee: Bill Gray <bgray>
Status: CLOSED ERRATA
QA Contact: Jakub Prokes <jprokes>
Severity: urgent
Priority: urgent
Version: 6.4
CC: andebjor, areis, bcao, bgray, bsarathy, cpelland, ddumas, drjones, jherrman, jprokes, jsynacek, juzhang, leiwang, lijin, lnovich, michen, mkenneth, nobody, perfbz, psklenar, qe-baseos-daemons, qzhang, rbalakri, sradvan, virt-maint, xfu
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Previously, running the numad daemon on a system executing a process with very large resident memory (such as a Windows Server 2012 guest) could cause memory swapping. As a consequence, significant latencies under some circumstances occurred on the system, which could in turn lead to other processes (such as qemu-kvm) becoming unresponsive. With this update, numad no longer causes memory swapping in the above scenario, and the consequent latencies and hangs no longer occur.
Story Points: ---
Clone Of:
Clones: 1112280 (view as bug list)
Environment:
Last Closed: 2014-10-14 08:21:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 871829, 883516, 957226, 1002699, 1112280
Attachments:
Description
Mike Cao
2012-11-02 10:42:31 UTC
Created attachment 637035 [details]
dmesg
processor       : 47
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 9
model name      : AMD Opteron(tm) Processor 6172
stepping        : 1
cpu MHz         : 2100.142
cache size      : 512 KB
physical id     : 1
siblings        : 12
core id         : 5
cpu cores       : 12
apicid          : 27
initial apicid  : 27
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
bogomips        : 4200.41
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

# cat /proc/meminfo
MemTotal:       529297552 kB
MemFree:        514979580 kB
Buffers:        27648 kB
Cached:         3732784 kB
SwapCached:     3792 kB
Active:         7912824 kB
Inactive:       129600 kB
Active(anon):   4275460 kB
Inactive(anon): 10096 kB
Active(file):   3637364 kB
Inactive(file): 119504 kB
Unevictable:    34992 kB
Mlocked:        10464 kB
SwapTotal:      4194296 kB
SwapFree:       4168956 kB
Dirty:          40 kB
Writeback:      0 kB
AnonPages:      4331988 kB
Mapped:         26516 kB
Shmem:          924 kB
Slab:           1124940 kB
SReclaimable:   122964 kB
SUnreclaim:     1001976 kB
KernelStack:    10736 kB
PageTables:     20564 kB
NFS_Unstable:   0 kB
Bounce:         0 kB
WritebackTmp:   0 kB
CommitLimit:    268843072 kB
Committed_AS:   4643860 kB
VmallocTotal:   34359738367 kB
VmallocUsed:    894832 kB
VmallocChunk:   33888258768 kB
HardwareCorrupted: 0 kB
AnonHugePages:  886784 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize:   2048 kB
DirectMap4k:    6756 kB
DirectMap2M:    3129344 kB
DirectMap1G:    533725184 kB
[root@dell-per815-01 cgroup]#

Since I was asked to run the SVVP test over the Windows Server 2012 platform on a RHEL 6.3.z host,
this issue is a test blocker for me. This issue only occurs when numad is enabled on the host:

# mount cgroup -t cgroup -o cpuset /cgroup
# numad -D /cgroup

After removing it, this issue is gone, but the guest always hangs at a blank screen. I will report a new bug to track that.

CLI:
/usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=windows_server_2012_max_amd,format=raw,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -device usb-ehci,id=ehci0 -drive file=usb_storage_max,format=raw,if=none,id=drive-usb0,cache=none,werror=stop,rerror=stop -device usb-storage,drive=drive-usb0,removable=on,bus=ehci0.0 -chardev socket,id=111a,path=/tmp/amd-max-sut,server,nowait -mon chardev=111a,mode=readline -name amd-max-sut -vnc :0 -drive file=en_windows_server_2012_x64_dvd_915478.iso,id=drive-cdrom,format=raw,if=none,werror=stop,rerror=stop,media=cdrom -device ide-drive,drive=drive-cdrom,id=cdrom -vga std -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

Reproduced this issue with the same command line, and:
1) This issue only happens for the win2012 guest. I tried win2k8r2 and win7 and had no problem.
2) Tested with "-m 256G", assigning each NUMA node 32G of memory. Failed (5/5 times). Tested with "-m 192G" / "-m 184G" / "-m 144G"; all failed.
Tested with "-m 128G", assigning each NUMA node 16G of memory (8 NUMA nodes in total): passed. The guest boots up successfully (3/3 times). Tested with "-m 120G", assigning each NUMA node 15G of memory: 1/4 failed, 3/4 passed.
3) Stopped the numad service and re-tested; this issue does not appear.

This issue also happens on the latest RHEL 6.4 host:
kernel-2.6.32-341.el6.x86_64
qemu-kvm-0.12.1.2-2.334.el6.x86_64

I'm assuming numad is the culprit here. Please investigate and let us know if you believe qemu-kvm is the faulty component instead.

Disable KSM and see if that helps. Bill Gray, any other suggestions/input?

(In reply to comment #7)
> this issue only occurs when enable numad on the host
> #mount cgroup -t cgroup -o cpuset /cgroup
> #numad -D /cgroup
>
> After remove it ,this issue has gone ,but guest always hang at a blank screen
> Will report a new bug to track it .

What state are the guest and its vcpus in at this point? Is the guest still "running"? What state are the vcpus in? (Check 'ps -eLo pid,comm,s | grep qemu' frequently - or maybe just watch top.)

(In reply to comment #10)
> I'm assuming numad is the culprit here. Please investigate and let us know
> if you believe qemu-kvm is the faulty component instead.

I'm guessing it's not numad's fault, but rather that the state of the qemu threads is causing cgroups to choke when attempting to add them - which leads to them getting killed somehow (or "cleaned up"). We should focus on the hang without numad first. I assume the bug opened for it is bug 874406?

Try reproducing this without any guests using e1000 NICs (I see e1000 in the cmdline in comment 7). If this issue still reproduces then it's separate; otherwise we can dup this bug to bug 874406.
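The 'ps -eLo pid,comm,s | grep qemu' check suggested above can be scripted. A minimal sketch (the helper name and the sample output below are illustrative, not from this report) that flags qemu threads stuck in uninterruptible sleep (state D), the symptom seen later in this thread:

```python
# Hypothetical helper: given `ps -eLo pid,comm,s` output, return the
# threads matching a name filter that are in D (uninterruptible sleep).
def stuck_threads(ps_output, name_filter="qemu"):
    """Return (pid, comm) tuples for D-state threads in ps output."""
    stuck = []
    for line in ps_output.strip().splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        # pid is first column, state is last, comm is everything between
        pid, comm, state = parts[0], " ".join(parts[1:-1]), parts[-1]
        if name_filter in comm and state == "D":
            stuck.append((pid, comm))
    return stuck

sample = """\
4023 qemu-kvm D
4024 qemu-kvm S
4025 qemu-kvm D
1234 bash S"""
print(stuck_threads(sample))  # -> [('4023', 'qemu-kvm'), ('4025', 'qemu-kvm')]
```

In practice the input would come from `subprocess.check_output(["ps", "-eLo", "pid,comm,s"])`; a non-empty result on a hung guest points at blocked vcpu threads rather than a dead process.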
(In reply to comment #12)
> What state is the guest and its vcpus in at this point? Guest is still
> "running"? vcpus are ?? (check 'ps -eLo pid,comm,s | grep qemu' frequently -
> or maybe just watch top)

Details are in https://bugzilla.redhat.com/show_bug.cgi?id=873613

> I'm guessing it's not numad's fault, but rather the state the qemu threads
> are causing cgroups to choke when attempting to add them - which leads to
> them getting killed somehow (or "cleaned up"). We should focus on the hang
> without numad first. I assume the bug opened for it is bug 874406?

No, it should be https://bugzilla.redhat.com/show_bug.cgi?id=873613

(In reply to comment #13)
> Try reproducing this without any guests using e1000 nics (I see e1000 in the
> cmdline in comment 7). If this issue still reproduces then it's separate,
> otherwise we can dup this bug to bug 874406.

I can still reproduce this issue with only virtio-net-pci and rtl8139 emulated NICs.

Mike

(In reply to comment #15)
> No.
> it should be https://bugzilla.redhat.com/show_bug.cgi?id=873613

That bug says that > 32 vcpus won't work; the cmdline in comment 7 has 48. Did you reproduce this issue with only 32 vcpus and no e1000 NICs? We need to remove all known issues in order to see if there is anything left.

I suggest using a known-good config, but with 256G of memory.
a) test without numad - make sure it works
b) test with numad - see what happens

(In reply to comment #17)
> That bug says that > 32 vcpus won't work, the cmdline in comment 7 has 48.
> Did you reproduce this issue with only 32 vcpus? and no e1000 nics? We need
> to remove all known issues in order to see if there is anything left.

I am using -smp 48 to test this issue; I did not try vcpu=32. Do we need to?

> I suggest using a known-good config, but with 256G of memory.
> a) test without numad - make sure it works

-smp 48 + w/o numad ---> guest always hangs

> b) test with numad - see what happens

-smp 48 + w/ numad ---> qemu-kvm process killed

Mike

(In reply to comment #18)
> I am using -smp 48 to test this issue, I did not try vcpu=32.
> Do we need it?

Yes, based on bug 873613 comment 3, I would say so. In general, we need to eliminate all config options that have found other bugs (and other bugs have already been opened for them). This bug is to address a possible problem with 256G configs and numad. So the config for the test guest should be a known-good config (i.e. one that works) plus 256G. Then, the numad variable should be toggled, as I outlined.

> -smp 48 + w/o numad ---> guest always hangs
> -smp 48 + w/ numad ---> qemu-kvm process killed

The problem with debugging like this is that both of these symptoms can be from the same problem. As I wrote at the bottom of comment 12, the process getting killed is likely just a result of numad trying to manage a broken guest.
To debug we need to compare working vs. not-working - not not-working-one-way vs. not-working-another-way.

(In reply to comment #19)
> Yes, based on bug 873613 comment 3, I would say so. In general, we need to
> eliminate all config options that have found other bugs (and other bugs have
> already been opened for them). This bug is to address a possible problem
> with 256G configs and numad. So the config for the test guest should be a
> known-good config (i.e. one that works) plus 256G. Then, the numad variable
> should be toggled, as I outlined.

Will try with -smp 32 with numad enabled.

> The problem with debugging like this is that both of these symptoms can be
> from the same problem. As I wrote at the bottom of comment 12, the process
> getting killed is likely just a result of numad trying to manage a broken
> guest. To debug we need to compare working vs. not-working.

Hi, Andrew,

What does "broken guest" mean? The image I am using is an image I use for the SVVP test now, so it should not be a broken guest. After reproducing this bug, I am still using the image for the SVVP test, and it works fine.
For this bug, when the numad service is running, the qemu-kvm process gets killed; the qemu-kvm process survives when numad is stopped. Any idea why this may be a dup of 873613?

Thanks,
Mike

(In reply to comment #20)
> Will try w/ -smp 32 with numad enabled .

AND with it disabled first to make sure that works. I.e. get a clean baseline FIRST.

> What's the "broken guest" mean ? the image I am using is a image which I
> used for SVVP test now ,it should not be a broken guest .after reproducing
> this bug ,the image I am still using it for SVVP test ,and it works fine .

Not a broken image, but a broken config. If there's an existing bug that says win2012 guests don't work with >32 vcpus, then why are we still creating configs with >32 vcpus?

> For this bug ,when numad service running,qemu-kvm process has been killed
> ,qemu-kvm process will exist when numad stopped .why idea why this may dup
> of 873613?

If testing a 256G guest with numad doesn't produce any problems, then this bug could be closed as a dup, or just NOTABUG.

(In reply to comment #21)
> AND with it disabled first to make sure that works. I.e. get a clean
> baseline FIRST.
e1000 + -smp 32 + 256GB + without numad service ---> guest works
e1000 + -smp 32 + 256GB + with numad service ---> the terminal which runs the qemu-kvm process froze; I used vncviewer to track the guest and found the guest freezes during boot.

Will try the test w/o e1000 later.

(In reply to comment #22)
> e1000+ -smp32 + 256GB +without numa service-->guest works
> e1000+ -smp32 + 256GB +with numa service --->the terminal which runs
> qemu-kvm process freezed ,I use vncviewer to track guest ,find guest freeze
> during boot
>
> Will try test w/o e1000 later

Need to mention I waited for more than 1 hour. I tried ps -eaf|grep qemu while numad was running, but the output hangs. Then I stopped the numad service; the terminal running the ps process works, and it shows:

[qemu-kvm] <defunct> grep qemu

Re-tested this issue on kernel-351 & qemu-kvm-rhev-348 without e1000.

Results:
rtl8139 + smp32 + 256GB + w/o numad ---> guest works fine
rtl8139 + smp32 + 256GB + w/ numad ---> guest has been killed

cat /var/log/numad
PID 4023 moved to node(s) 1 in 103.2 seconds
Removing obsolete cpuset: /cgroup/numad.4023

(In reply to comment #24)
> Results:
>
> rtl8139+ smp32 + 256GB + w/o numad ---> guest works fine
> rtl8139+ smp32 + 256GB + w numad ---> guest has been killed
>
> cat /var/log/numad
> PID 4023 moved to node(s) 1 in 103.2 seconds
> Removing obsolete cpuset : /cgroup/numad.4023

OK, this data is starting to point at numad/cgroups. We now have a clean baseline (the guest works without numad), and we have logs from numad stating a migration took 103 seconds. If the guest was blocked for 103 seconds, then it wouldn't be too surprising that it died.

I tried to reproduce this on my small NUMA system with no luck. Likely the large guest memory configuration is required - which would also make for longer migration times. Jan, I suggest we find a system where we can reproduce this, and then experiment with rate-limiting the migrations, or just avoiding large migrations altogether. Yes, we might need to release note this, since clearly identifying the problem is unlikely in time for 6.4.

Numad does nothing different for various types of guests, yet per comment 8, other varieties of Windows guests seem to work fine with numad.
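The /var/log/numad lines quoted above can be scanned mechanically for slow migrations like the 103.2-second one. A hedged sketch (the function name and the 10-second threshold are my own choices, not part of numad) that parses log lines in the quoted format:

```python
import re

# Sketch: extract migration durations from numad log lines of the form
#   "PID 4023 moved to node(s) 1 in 103.2 seconds"
# (format as quoted in this thread) and flag the slow ones.
MOVE_RE = re.compile(r"PID (\d+) moved to node\(s\) ([\d,]+) in ([\d.]+) seconds")

def slow_migrations(log_text, threshold_secs=10.0):
    """Return (pid, nodes, seconds) for migrations exceeding the threshold."""
    slow = []
    for m in MOVE_RE.finditer(log_text):
        pid, nodes, secs = int(m.group(1)), m.group(2), float(m.group(3))
        if secs > threshold_secs:
            slow.append((pid, nodes, secs))
    return slow

log = ("PID 4023 moved to node(s) 1 in 103.2 seconds\n"
       "PID 5000 moved to node(s) 2 in 0.4 seconds\n")
print(slow_migrations(log))  # -> [(4023, '1', 103.2)]
```

A migration that takes minutes rather than seconds is exactly the condition that, per the comments above, can block the guest long enough for it to appear dead.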
It might also be relevant that the Windows 2012 guest seems to hang without numad (if I read the comments correctly). Though maybe the more recent Windows 2012 guest tests without numad are working OK? Thanks very much for running so many tests! One more I might suggest would be using less than 30% of the system resources for the guest: so about 160GB RAM and about 30 vCPUs -- just to be sure 2x the guest fits well within the system resources. Assuming this also fails when numad is running, please start numad with the -l7 option before starting the guest, to capture more detailed debugging information. Thanks!

Added a DocText. The additional information about swapping leading to latencies comes from my experimentation on amd-dinar-08.lab.bos.redhat.com (32G 4 node system running 28G win2012 guest). I saw with top that shortly after numad kicked in a bunch of migration threads, we got this:

13190 root 20 0 18592 528 388 D 3.3 0.0 5:46.18 numad
  339 root 20 0 0 0 0 D 1.7 0.0 0:26.51 kswapd1
  341 root 20 0 0 0 0 D 1.7 0.0 0:28.07 kswapd3
  340 root 20 0 0 0 0 D 1.3 0.0 0:30.57 kswapd2
22808 root 20 0 15436 1720 944 R 0.7 0.0 0:01.29 top
  141 root 20 0 0 0 0 S 0.3 0.0 0:10.12 events/10
 1933 root 20 0 0 0 0 S 0.3 0.0 10:51.94 kondemand/0
 1964 root 20 0 0 0 0 S 0.3 0.0 0:45.07 kondemand/31
    1 root 20 0 19352 1152 984 S 0.0 0.0 0:02.87 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
    3 root RT 0 0 0 0 S 0.0 0.0 0:12.61 migration/0
    4 root 20 0 0 0 0 S 0.0 0.0 0:03.73 ksoftirqd/0
    5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
    6 root RT 0 0 0 0 S 0.0 0.0 0:00.84 watchdog/0
    7 root RT 0 0 0 0 S 0.0 0.0 0:17.21 migration/1
    8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
    9 root 20 0 0 0 0 S 0.0 0.0 0:05.04 ksoftirqd/1
   10 root RT 0 0 0 0 S 0.0 0.0 0:00.53 watchdog/1
   11 root RT 0 0 0 0 S 0.0 0.0 0:06.48 migration/2
   12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
   13 root 20 0 0 0 0 S 0.0 0.0 0:00.78 ksoftirqd/2
   14 root RT 0 0 0 0 S 0.0 0.0 0:00.61 watchdog/2
   15 root RT 0 0 0 0 S 0.0 0.0 0:05.61 migration/3
   16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/3
   17 root 20 0 0 0 0 S 0.0 0.0 0:01.30 ksoftirqd/3
   18 root RT 0 0 0 0 S 0.0 0.0 0:00.55 watchdog/3
   19 root RT 0 0 0 0 S 0.0 0.0 0:07.01 migration/4
   20 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/4
   21 root 20 0 0 0 0 S 0.0 0.0 0:00.58 ksoftirqd/4
   22 root RT 0 0 0 0 S 0.0 0.0 0:00.74 watchdog/4
   23 root RT 0 0 0 0 S 0.0 0.0 0:08.23 migration/5
   24 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/5
   25 root 20 0 0 0 0 S 0.0 0.0 0:01.51 ksoftirqd/5
   26 root RT 0 0 0 0 S 0.0 0.0 0:00.54 watchdog/5
   27 root RT 0 0 0 0 S 0.0 0.0 0:05.08 migration/6
   28 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/6
   29 root 20 0 0 0 0 S 0.0 0.0 0:01.36 ksoftirqd/6
   30 root RT 0 0 0 0 S 0.0 0.0 0:00.56 watchdog/6
   31 root RT 0 0 0 0 S 0.0 0.0 0:04.36 migration/7
   32 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/7
   33 root 20 0 0 0 0 S 0.0 0.0 0:01.08 ksoftirqd/7
   34 root RT 0 0 0 0 S 0.0 0.0 0:00.54 watchdog/7
   35 root RT 0 0 0 0 S 0.0 0.0 0:04.84 migration/8
   36 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/8
   37 root 20 0 0 0 0 S 0.0 0.0 0:03.47 ksoftirqd/8

(In reply to comment #31)
> Added a DocText. The additional information about swapping leading to
> latencies comes from my experimentation on amd-dinar-08.lab.bos.redhat.com
> (32G 4 node system running 28G win2012 guest).

Changed the doctext from "Bug Fix" to "Known Issue", as there's no fix at the moment.

(In reply to comment #32)
> Changed the doctext from "Bug Fix" to "Known Issue", as there's no fix at
> the moment.

Had to change the text as well, otherwise bugzilla reverts it back to "Bug Fix".

Thanks for adding the DocText. Yes, it is probably true that the system will be forced to swap when trying to use too much memory.
However, per comment 8 ("This issue only happens for win2012 guest. I tried win2k8r2 and win7, have no problem."), the DocText should indicate this appears to cause trouble only with very large Windows 2012 guests.

(In reply to comment #34)
> However, per comment 8 ("This issue only happens for win2012 guest. I tried
> win2k8r2 and win7, have no problem.") the DocText should indicate this
> appears to cause trouble only with very large Windows 2012 guests.

The latencies will occur for any huge task being migrated, which would be all Windows VMs that have large amounts of memory allocated to them in their configs - due to their page zeroing. Maybe only 2012 was overly sensitive to that latency, though? I'm not opposed to adding 2012 to the DocText, but it probably wouldn't hurt to keep it more general.

There's really no point in using numad in these scenarios anyway (moving a 4 node guest to 3 nodes). The potential for problems seems to outweigh the benefit.

Thanks. Since there appears to be such a clear Windows 2012 specific component here, it would be odd not to mention it. I agree there is marginal (if any) benefit from moving a huge guest from 100% of the system to 75% of the system -- but there might be benefit in moving to 50% of the system nodes, depending on the workload.

While we know moving huge amounts of memory takes time, and we suspect Windows 2012 is more sensitive to it than other guest types, we don't yet clearly know the root cause of the problem. It might be most accurate to explicitly communicate the facts as we know them: Windows 2008r2 and Windows 7 guests as large as 256GB appear to work correctly in a numad environment, but Windows 2012 guests as small as 120GB sometimes seem to hang in a numad environment, depending on the system memory quantity and configuration.
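The sizing rule of thumb discussed in this thread (a guest numad can safely manage should fit inside a single NUMA node) can be sketched as a quick calculation. This illustrates the guidance in the comments, not a tool from this bug; the 90% headroom factor is an assumption:

```python
# Sketch of the "guest should fit in one node" rule of thumb from this
# thread. The headroom factor is an assumed safety margin for host
# overhead, not a value from the bug report.
def max_safe_guest_kb(host_mem_kb, num_nodes, headroom=0.9):
    """Largest guest size (kB) that fits within a single NUMA node,
    assuming memory is evenly distributed across nodes."""
    per_node = host_mem_kb // num_nodes
    return int(per_node * headroom)

# Using the MemTotal from /proc/meminfo quoted earlier (~512GB host),
# treated as a typical 4-node system per the workaround text above.
host_kb = 529297552
print(max_safe_guest_kb(host_kb, 4) // (1024 * 1024), "GB")  # -> 113 GB
```

By this estimate, a 256G guest on the reporter's host exceeds a single node several times over, which is exactly the configuration that triggered the long numad migrations.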
We have reproduced this issue and learned more about it. The issue is specific to Windows 2012 guests that use more memory than exists in a single node. Windows 2012 guests appear to allocate memory more gradually than other Windows guest types, which triggers the issue. Note that other varieties of Windows guests do not seem to experience this problem.

You can work around this problem by: (1) limiting Windows 2012 guests to less memory than exists in a given node -- so on a typical 4 node system with even memory distribution, the guest would need less than the total amount of system memory divided by 4; or (2) allowing the Windows 2012 guest to finish allocating all of its memory before allowing numad to run. Numad will handle extremely huge Windows 2012 guests correctly after allowing a few minutes for the guest to finish allocating all of its memory. We will work on a general fix to handle all Windows 2012 guests, for some subsequent release.

(In reply to comment #42)
> (In reply to comment #40)
> > (In reply to comment #39)
> > > The testing environment for this bug is um.. nontrivial (both hw and sw).
> > > Mike, could we reuse your setup for testing the fix during rhel6.5 testing
> > > phase? Or would you retest it yourself once the fixed packages are available?
> > >
> > > Thanks in advance

I tested this issue with the same machine as comment 2, and the latest kernel and qemu on the host:
kernel: 2.6.32-376.el6.x86_64
qemu-kvm: qemu-kvm-0.12.1.2-2.369.el6.x86_64 and qemu-kvm-rhev-0.12.1.2-2.369.el6.x86_64

command line:
/usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=/home/win2012-64-virtio.qcow2,format=qcow2,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/amd-max-sut,server,nowait -mon chardev=111a,mode=readline -name amd-max-sut -vnc :0 -vga std -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

Test result (with or without numa): guest works well; don't hit this issue.

Ales and bcao, if I have anything wrong, please correct me.

Pay attention to the bug component. This bug is a numad issue, but I did not see you start the numad service on the host. Please retest with the numad service running.

Re-tested this issue with 2.6.32-376.el6.x86_64 and qemu-kvm-rhev-0.12.1.2-2.369.el6.x86_64. If the numad service is started on the host, this bug is reproduced.

1. # mount cgroup -t cgroup -o cpuset /cgroup
   # numad -D /cgroup
   # /etc/rc.d/init.d/numad status
   numad (pid 13514) is running...

2.
Boot the guest as in comment 42:
# /usr/libexec/qemu-kvm -boot menu=on -m 256G -smp 48,cores=48,sockets=1,threads=1 -cpu Opteron_G3,family=0xf -drive file=/home/win2012-64-virtio.qcow2,format=qcow2,if=none,id=drive-ide0,cache=none,werror=stop,rerror=stop -device ide-drive,drive=drive-ide0,id=ide0,bootindex=1 -netdev tap,sndbuf=0,id=hostnet0,script=/etc/qemu-ifup,downscript=no -device e1000,netdev=hostnet0,mac=00:52:1a:21:62:01,bus=pci.0,addr=0x4,id=virtio-net-pci0 -uuid ac64c74a-a8d5-4c24-9839-fcc491439493 -rtc base=localtime,clock=host,driftfix=slew -no-kvm-pit-reinjection -chardev socket,id=111a,path=/tmp/amd-max-sut,server,nowait -mon chardev=111a,mode=readline -name amd-max-sut -vnc :0 -vga std -numa node,mem=32G,cpus=0,4,8,12,16,20,nodeid=0 -numa node,mem=32G,cpus=24,28,32,36,40,44,nodeid=1 -numa node,mem=32G,cpus=3,7,11,15,19,23,nodeid=2 -numa node,mem=32G,cpus=27,31,35,39,43,47,nodeid=3 -numa node,mem=32G,cpus=2,6,10,14,18,22,nodeid=4 -numa node,mem=32G,cpus=26,30,34,38,42,46,nodeid=5 -numa node,mem=32G,cpus=1,5,9,13,17,21,nodeid=6 -numa node,mem=32G,cpus=25,29,33,37,41,45,nodeid=7

Killed

numad version:
# rpm -qa | grep numa
numad-0.5-8.20121015git.el6.x86_64
numactl-2.0.7-6.el6.x86_64
numactl-devel-2.0.7-6.el6.x86_64

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

wrt comment 52 asking if this can be resolved in 6.5 at all: Yes, I will try to get the fixed version into 6.5.z shortly after the verified fix is in an early 6.6 base level. Attempting to get it into 6.6 ...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1594.html