Bug 833425
Summary: 3.1.z - vdsm cpuCores shows the wrong number of cores on multi node systems - AMD (Magny-Cours 61XX)

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Component | vdsm |
| Version | 6.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | urgent |
| Reporter | Amador Pahim <asegundo> |
| Assignee | Douglas Schilling Landgraf <dougsland> |
| QA Contact | Ido Begun <ibegun> |
| CC | abaron, aburden, adevolder, bazulay, chetan, danken, dconsoli, dfediuck, htaira, iheim, ilvovsky, jbiddle, leiwang, marcelo.barbosa, mburns, oramraz, pkrempa, pmdyermms, qguan, rvaknin, sgordon, sgrinber, ykaul |
| Keywords | Patch, ZStream |
| Whiteboard | infra sla |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | vdsm-4.9.6-39.0 |
| Doc Type | Release Note |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| Last Closed | 2012-12-04 19:00:11 UTC |
| Bug Depends On | 825095, 864543, 874050, 877024 |

Doc Text: On systems with AMD Magny-Cours and Bulldozer CPUs, the number of CPU cores reported always includes hyperthreads. This allows virtual machines running on the host to use up to double the recommended number of virtual CPUs. Additionally, this issue may bias scheduling in favor of affected hosts over others in the cluster if not all hosts have the same number and type of CPUs.
Description
Amador Pahim
2012-06-19 12:58:50 UTC
Patch: http://gerrit.ovirt.org/5481

Created attachment 593015 [details]
Affected system /proc/cpuinfo file

The same problem was found on an AMD Opteron(tm) Processor 6172, which has 48 cores (4 sockets * 12 cores per socket) but is also recognized as only 24 cores by vdsm.

Created attachment 593196 [details]
amd-6172-cpuinfo
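To illustrate the failure mode, here is a simplified sketch (not the actual vdsm code): if the core count is derived from /proc/cpuinfo by counting distinct (physical id, core id) pairs, the two NUMA nodes inside each Magny-Cours package reuse the same core ids, so distinct physical cores collapse into a single entry and the count comes out at half the real value.

```python
# Simplified, illustrative sketch of a /proc/cpuinfo-based core count.
def count_cores(cpuinfo_text):
    cores = set()
    # /proc/cpuinfo has one "key : value" block per logical CPU,
    # with blocks separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        # Naive assumption: (physical id, core id) uniquely names a core.
        cores.add((fields.get("physical id"), fields.get("core id")))
    return len(cores)

with open("/proc/cpuinfo") as f:
    print(count_cores(f.read()))
# On a 4-socket Opteron 6172 (48 physical cores) this prints 24, because
# cores in the two NUMA nodes of one package share the same (physical id,
# core id) combination, exactly as described in this report.
```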
(In reply to comment #0)
> Multi node systems has the same combination of "physical id" and "core id"
> to different physical cores.

Does "node" above refer to a NUMA node?

Yes. lscpu from an AMD Opteron Processor 6164 HE:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    1
Core(s) per socket:    12
CPU socket(s):         4
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 9
Stepping:              1
CPU MHz:               1700.038
BogoMIPS:              3400.05
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              5118K
NUMA node0 CPU(s):     0,4,8,12,16,20
NUMA node1 CPU(s):     24,28,32,36,40,44
NUMA node2 CPU(s):     1,5,9,13,17,21
NUMA node3 CPU(s):     25,29,33,37,41,45
NUMA node4 CPU(s):     2,6,10,14,18,22
NUMA node5 CPU(s):     26,30,34,38,42,46
NUMA node6 CPU(s):     3,7,11,15,19,23
NUMA node7 CPU(s):     27,31,35,39,43,47

The kernel people had a good discussion here: http://kerneltrap.org/mailarchive/linux-kernel/2010/8/12/4606365

I think it's fixed in libvirt already (BZ#836919). It may be a duplicate.

No, it isn't. vdsm cpuCores does not come from libvirt.

Hi Dan, since we don't have a needinfo flag in gerrit, can you please share your thoughts about my comment in http://gerrit.ovirt.org/#/c/5481/ ? Thanks, Douglas

Moving the bugzilla to POST since we have a patch available.

New patch from Dan available, review in progress: http://gerrit.ovirt.org/#/c/7097/

Hi, just to share that Amador's patch resolves the wrong number of cores on multi node systems, and Dan's patch resolves the counting of threads as cores again. Review of Amador's patch is in progress. Cheers, Douglas

For reference only, the downstream patch: http://gerrit.usersys.redhat.com/#change,1397 Thanks, Douglas

The current patches seem to resolve ad-hoc config, which is relevant to Intel hyperthreading. What about the AMD multi node case, which isn't about hyperthreading?

> the current patches seem to resolve ad-hoc config, which is relevant to intel
> hyperthreading.
> what about the amd multi node case which isn't about hyperthreading?

Yes, it works. Amador shared a few versions with the same behavior (patchset 1 is simple; other versions include the libvirt approach and reading the data from the filesystem as well).
The proposed patch http://gerrit.ovirt.org/#/c/5481/ solves the multi NUMA issue regardless of vendor and HT availability.

Hi Douglas,

In order to go forth with this bz: checking the libvirt 0.9 series, it has the same issue probing CPU topology as vdsm does. But the 0.10 series is OK.

Tested system: AMD Opteron(tm) Processor 6134
- 4 Sockets
- 8 Cores per Socket
- 1 Thread per Core
- Total processors = 32

Instead of being one CPU with 8 cores, this AMD Magny-Cours is actually two 4-core "Bulldozer" CPUs combined into one "package". Each package has two NUMA nodes, and the two NUMA nodes share the same core ID set. So, the expected result in libvirt "nodeinfo" AND "capabilities" would be like this:
- NUMA cells = 8
- Sockets per NUMA cell = 1
- Cores per socket = 4
- Threads per core = 1
- Total CPUs = 32

----------------------------------------
# rpm -qa libvirt
libvirt-0.9.11.5-3.fc17.x86_64

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              32
CPU frequency:       2300 MHz
CPU socket(s):       4
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         65964292 kB

# virsh capabilities
snip...
<topology sockets='4' cores='4' threads='1'/>
snip...
<cells num='8'>
snip...
----------------------------------------

As we can see, nodeinfo and capabilities are completely out of sync with each other. But newer libvirt seems to solve the issue:

----------------------------------------
# rpm -qa libvirt
libvirt-0.10.2-3.fc17.x86_64

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              32
CPU frequency:       2300 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        8
Memory size:         65964292 KiB

# virsh capabilities
snip...
<topology sockets='1' cores='4' threads='1'/>
snip...
<cells num='8'>
snip...
----------------------------------------

Now libvirt is coherent, showing 8 NUMA cells (two per package), with one quad-core socket each. As the upstream .spec requires libvirt >= 0.10.1-1, I think we are now safe to use the libvirt API and solve this BZ.

New Patch Set sent, now using libvirt "capabilities()", as pointed out by http://www.redhat.com/archives/libvir-list/2010-November/msg01093.html

Regards, Amador Pahim

Hi Amador,

> In order to go forth with this bz, checking libvirt 0.9 series, it has the
> same issue probing cpu topology as in vdsm. But 0.10 series is ok.

Right. As we talked previously, it looks like the libvirt folks improved this CPU/NUMA area a lot in the last versions.

> Now libvirt is coherent, showing 8 NUMA cells (two per package), with 1 quad
> core socket each.

That makes sense, thanks for sending the new version/tests. At this point, let's wait for more people to review your upstream patch. After that, if merged, we will need to handle it downstream with the libvirt guys too. Cheers, Douglas

*** Bug 860507 has been marked as a duplicate of this bug. ***
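For illustration, here is a rough sketch of the capabilities-based approach described above. This is not the merged vdsm patch; it assumes an open libvirt connection and the standard capabilities XML layout shown in this bug.

```python
import libvirt
from xml.dom import minidom

def cpu_topology(conn):
    """Illustrative sketch: derive CPU topology from libvirt capabilities."""
    caps = minidom.parseString(conn.getCapabilities())
    # The first <topology> element is the one under <host><cpu>; it describes
    # a single socket (sockets/cores/threads).
    topology = caps.getElementsByTagName('topology')[0]
    cores = int(topology.getAttribute('cores'))
    threads = int(topology.getAttribute('threads'))
    # <host><topology><cells> lists every online logical CPU grouped by NUMA cell.
    cells = caps.getElementsByTagName('cells')[0]
    online_cpus = len(cells.getElementsByTagName('cpu'))
    return {'cores': cores, 'threads': threads, 'onlineCpus': online_cpus}

conn = libvirt.openReadOnly(None)
print(cpu_topology(conn))
```

Note that on some older libvirt builds the `<topology>` element can be missing from `<cpu>` (as the traceback later in this bug shows), so real code would need to guard the `[0]` lookup instead of assuming the element is present.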
Merged upstream: http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=90b3392ccc88bf9c9537947def58ac3542911364

Hi, I think I might have reproduced the bug with:
libvirt-0.9.10-21.el6_3.5.x86_64
vdsm change id: I1ebd7c424e03942d6a13fa1c993dec3e3e78c9ed

Oct 11 10:00:49 dev-09 vdsm vds ERROR Exception raised (most recent call last):
  File "/usr/share/vdsm/vdsm", line 82, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 50, in serve_clients
    cif = clientIF.getInstance(log)
  File "/usr/share/vdsm/clientIF.py", line 126, in getInstance
    cls._instance = clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 93, in __init__
    caps.CpuTopology().cores())
  File "/usr/share/vdsm/caps.py", line 88, in __init__
    self._topology = _getCpuTopology(capabilities)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 822, in __call__
    value = self.func(*args)
  File "/usr/share/vdsm/caps.py", line 116, in _getCpuTopology
    'sockets': int(cpu.getElementsByTagName('topology')[0].
IndexError: list index out of range
Oct 11 10:00:53 dev-09 respawn: slave '/usr/share/vdsm/vdsm' died too quickly for more than 30 seconds, master sleeping for 900 seconds

(In reply to comment #33)
> Hi, I think I might have reproduced the bug with:

Toni, it would be more exact to say that the patch introduced a regression on your hardware, which for some reason is missing an element like <topology sockets='1' cores='4' threads='2'/> in <capabilities><host><cpu>. We should either handle the missing element somehow, or require a newer libvirt that reports it for your hardware. Could you post some information about your /proc/cpuinfo (here, or in a libvirt bz)?

Dan, I just sorted it out. The culprit was a missing /usr/share/libvirt/cpu_map.xml; reinstalling libvirt-client fixed the issue for me.

*** Bug 866708 has been marked as a duplicate of this bug. ***

Tested this on SI24.1, on a host with 2 AMD Opteron(TM) Processor 6272 CPUs (8 cores each, supports hyperthreading). cpuinfo lists 32 CPUs as expected. I got these results on libvirt-0.9.10-21.el6_3.5:
With report_host_threads_as_cores=false (default), vdsClient reported 64 CPU cores (instead of 16).
With report_host_threads_as_cores=true, vdsClient reported 32 cores (as expected).
However, when testing on libvirt-0.9.10-21.el6_3.6 (considering https://bugzilla.redhat.com/show_bug.cgi?id=869723), vdsClient reported 32 CPU cores in both cases. Seeing as the values are still off, moving this back to ASSIGNED.

The node info detection as of libvirt-0.9.10-21.el6_3.6 still doesn't work 100% OK on some machines (especially AMD Bulldozer, which has "modules" that count both as cores _and_ threads). That throws off the detection so that we report the machine as having 2x the number of CPUs. This issue is already fixed upstream: https://bugzilla.redhat.com/show_bug.cgi?id=874050 With the fix, libvirt checks whether the detected topology is compatible with the number of CPUs the system has. If it isn't (which happens on AMD Bulldozer), libvirt returns a compatibility fallback topology, and the actual NUMA topology has to be determined from the capabilities XML.

The above scratch build seems to have a regression in comparison to libvirt-0.9.10-21.el6_3.6.x86_64, tested on an AMD Opteron(TM) Processor 6272.
Original issue reproduced with libvirt-0.9.10-21.el6_3.6.x86_64; the amount of cpuCores reported by vdsm is:
report_host_threads_as_cores = false - 32 (should be 16)
report_host_threads_as_cores = true - 32

When using libvirt-0.9.10-21.el6_3.6bulldozer.x86_64, the amount of cpuCores reported by vdsm is:
report_host_threads_as_cores = false - 128 (instead of 16!!!)
report_host_threads_as_cores = true - 32

# virsh -r capabilities
<capabilities>
  <host>
    <uuid>74cabcdd-43c9-44ec-8452-d50d61ff028a</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G4</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='32' threads='1'/>
      <feature name='nodeid_msr'/>
      <feature name='wdt'/>
      <feature name='skinit'/>
      <feature name='ibs'/>
      <feature name='osvw'/>
      <feature name='cr8legacy'/>
      <feature name='extapic'/>
      <feature name='cmp_legacy'/>
      <feature name='fxsr_opt'/>
      <feature name='mmxext'/>
      <feature name='osxsave'/>
      <feature name='monitor'/>
      <feature name='ht'/>
      <feature name='vme'/>
    </cpu>
    <power_management>
      <suspend_disk/>
    </power_management>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='4'>
        <cell id='0'>
          <cpus num='8'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='8'>
            <cpu id='8'/>
            <cpu id='9'/>
            <cpu id='10'/>
            <cpu id='11'/>
            <cpu id='12'/>
            <cpu id='13'/>
            <cpu id='14'/>
            <cpu id='15'/>
          </cpus>
        </cell>
        <cell id='2'>
          <cpus num='8'>
            <cpu id='16'/>
            <cpu id='17'/>
            <cpu id='18'/>
            <cpu id='19'/>
            <cpu id='20'/>
            <cpu id='21'/>
            <cpu id='22'/>
            <cpu id='23'/>
          </cpus>
        </cell>
        <cell id='3'>
          <cpus num='8'>
            <cpu id='24'/>
            <cpu id='25'/>
            <cpu id='26'/>
            <cpu id='27'/>
            <cpu id='28'/>
            <cpu id='29'/>
            <cpu id='30'/>
            <cpu id='31'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <pae/>
      <nonpae/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine>rhel6.3.0</machine>
      <machine canonical='rhel6.3.0'>pc</machine>
      <machine>rhel6.2.0</machine>
      <machine>rhel6.1.0</machine>
      <machine>rhel6.0.0</machine>
      <machine>rhel5.5.0</machine>
      <machine>rhel5.4.4</machine>
      <machine>rhel5.4.0</machine>
      <domain type='qemu'>
      </domain>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='no'/>
    </features>
  </guest>
</capabilities>

The customer is currently using this workaround:
ps -L -C qemu-kvm | awk '/qemu-kvm/ {print $2}' | xargs -n1 taskset -c -p 0-7
This changes the affinity of every thread id belonging to qemu-kvm processes to use all 8 cores. Is this actually relevant to this BZ?

Hi Peter, could you please check Rami's comment #51?
https://bugzilla.redhat.com/show_bug.cgi?id=833425#c51
Should we share the comment into https://bugzilla.redhat.com/show_bug.cgi?id=877024 as well? Thanks, Douglas

In the case of AMD Magny Cours (AMD Piledriver), the output of nodeinfo/capabilities is correct. The problem arises on AMD Bulldozer and its new core/thread topology, where one Bulldozer "module" is reported as both separate cores and separate threads. This confuses libvirt, so the nodeinfo output (previous to my patch) actually implied twice the number of processors (when counted by multiplying all nodeinfo fields).

I fixed this for Bulldozer and many other strange architectures so that libvirt reports a synthetic (or maybe it would be better to call it "made up") topology that has 1 NUMA node, 1 socket, 1 thread, and a number of cores equal to the number of CPUs in the host. This is done as a final solution for every possible NUMA machine we might come across (hopefully). In this case, the only sane way to detect the actual topology is to use the output of capabilities, where the CPUs are grouped into NUMA nodes by the actual topology. Unfortunately, due to historic reasons, we can't actually change nodeinfo to report better results. The output of capabilities contains valuable information that can be used to assign CPUs to guests in a way that won't hurt performance. This kind of information cannot be acquired from nodeinfo. Nodeinfo isn't really useful with modern machines.

The data shown in comment #51 are the result of the synthetic topology reported on the AMD Bulldozer machine. The machine (previous to that patch) would report a topology of 4 nodes, 1 socket (per node), 8 cores (per socket) and 2 threads (per core). Multiplying those fields would yield 64, which wouldn't be correct. The corrected topology, on the other hand, is 1 node, 1 socket, 32 cores and 1 thread, yielding the correct result of 32 CPUs.

Hi Peter, thanks a lot for the clarification. We have two different reports here: one using the AMD 61XX processor family, which is what this bugzilla describes (tests in comment #54), and another about AMD Bulldozer (comment #51). I believe we should move the AMD Bulldozer tests to https://bugzilla.redhat.com/show_bug.cgi?id=877024. Agreed? Thanks, Douglas

Hi, just to clarify this long bugzilla: comments #42 and #51 are tests based on a processor from the 62XX family (Bulldozer) [1]. The original report from Amador is based on the 61XX family (Magny-Cours), and comment #54 shows that we fixed the original report. I have changed the bugzilla subject to track both cases.

Peter, could you please give additional info about the comment in #51? The capabilities output in #51 shows:
================================================
<topology sockets='1' cores='32' threads='1'/>
...
<cells num='4'>
==================================================

Shouldn't it be:
=================================================
<topology sockets='1' cores='4' threads='2'/>
...
<cells num='4'>
==================================================

Where:
- NUMA cells = 4
- sockets per NUMA cell = 1
- cores per NUMA cell = 4
- threads per core = 2 (1 physical core + 1 thread)

4 NUMA cells * 1 socket per cell * 4 cores per cell * 2 threads = 32

Based on the above, here is how VDSM applies report_host_threads_as_cores: if disabled, it calculates cells * sockets * cores = 4 * 1 * 4 = 16. On the other hand, if enabled, it takes cells.getElementsByTagName('cpu').length. Thanks, Douglas

[1] http://www.amd.com/us/press-releases/Pages/amd-opteron-6200-series-processor-family-wins-2012jan23.aspx

We are used to having one or more sockets inside one NUMA cell. But according to [1] (and confirmed by /sys fs), the AMD 6200 series has two NUMA cells inside the same socket. As libvirt shows the sockets per NUMA cell instead of total sockets, the count here will be difficult. I think having libvirt show 0.5 sockets per NUMA cell is not reasonable. So the statement in #57 seems the best way to represent the 6200 topology:

<topology sockets='1' cores='4' threads='2'/>
...
<cells num='4'>

I drew an image (attached) based on [1], [2] and "lscpu" info to clarify the 6272 architecture.

[1] - http://www.redhat.com/archives/libvir-list/2012-May/msg00663.html
[2] - http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

Created attachment 657675 [details]
6272 arch
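To make the report_host_threads_as_cores arithmetic described a couple of comments above concrete, here is a small illustrative sketch. It is not the actual vdsm code; the function name and structure are made up, and it simply applies the two counting rules mentioned in the discussion to a capabilities document.

```python
from xml.dom import minidom

def cpu_cores(capabilities_xml, report_host_threads_as_cores):
    """Illustrative sketch of the two counting modes discussed above."""
    caps = minidom.parseString(capabilities_xml)
    topology = caps.getElementsByTagName('topology')[0]  # <host><cpu><topology>
    sockets = int(topology.getAttribute('sockets'))
    cores = int(topology.getAttribute('cores'))
    cells = caps.getElementsByTagName('cells')[0]
    num_cells = int(cells.getAttribute('num'))
    if report_host_threads_as_cores:
        # Count every logical CPU listed under the NUMA cells.
        return len(cells.getElementsByTagName('cpu'))
    # Otherwise multiply cells * sockets-per-cell * cores-per-socket.
    return num_cells * sockets * cores
```

With the synthetic Bulldozer topology shown earlier (sockets='1', cores='32', 4 cells), the disabled branch gives 4 * 1 * 32 = 128, matching the value reported with the scratch build, while the topology Douglas asks about (sockets='1', cores='4', threads='2') would give 4 * 1 * 4 = 16.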
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1508.html

The picture is correct, although each of the "threads" in the picture has a separate core_ID, so they technically also count as cores. From a management app point of view, the physical topology of the machine is irrelevant. What counts is the NUMA topology, as that is what limits memory bandwidth. Guests should be scheduled on cores within one NUMA node. The nodeinfo output in libvirt is limited due to historic reasons, and it's basically usable just for determining the maximum number of CPUs in a system.

Hi Peter, thanks for your feedback. From comment #62:

> The picture is correct, although each of the "threads" on the picture has
> separate core_IDs, so they technically count also as cores.

Understood, so it will be like:
<topology sockets='1' cores='32' threads='1'/>

That's OK, but other system resources report a different breakdown. From /proc/cpuinfo on amd-dinar-07.lab.bos.redhat.com (Bulldozer machine) we have the split:
==========================================================
<snip>
cpu cores : 8   (number of cores per CPU package)
siblings  : 16  (HT per CPU package) * (number of cores per CPU package)
</snip>

Sockets:
=====================
# cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2

Also from lscpu:
=========================
<snip>
Thread(s) per core:    2  (core + thread)
Core(s) per socket:    8
CPU socket(s):         2
On-line CPU(s) list:   0-31
NUMA node(s):          4
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
</snip>

From comment #62:

> output in libvirt is limited due to historic reasons and it's basicaly usable
> just for determining the maximum number of CPUS in a system.

To avoid changing the historic output from libvirt, what about adding the 'total CPU sockets' to the XML output, like lscpu and /proc/cpuinfo do, and leaving the current libvirt output as:

<topology sockets='1' cores='8' threads='2'/>  (as in upstream libvirt-1.0.0 too)
...
<cells num='4'>
==================================================

It would become:

<topology totalsockets='2' sockets='1' cores='8' threads='2'/>
...
<cells num='4'>
==================================================

This would show totalSockets = 2 and sockets per NUMA cell = 1 (as libvirt already shows).

Just to clarify our needs: vdsm gets the total sockets and total cores (without threads) from libvirt. For this system, for example, we are looking for a way to get 2 sockets and 16 total cores from libvirt. With that, we would report the vdsm field 'report_host_threads_as_cores' as:

if enabled:
================
(cores = 8) * (threads = 2) * (new_libvirt_field_total_sockets = 2) = 32 total cores

if disabled:
===================
(cores = 8) * (new_libvirt_field_total_sockets = 2) = 16 total cores

Thanks
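As a closing illustration, the arithmetic in the proposal above written out as a tiny sketch. Note that totalsockets is only being proposed in that comment; it is not an existing libvirt attribute, and the function below is purely hypothetical.

```python
def total_cores(cores, threads, total_sockets, report_host_threads_as_cores):
    """Hypothetical sketch of the counting rule proposed above.

    total_sockets stands in for the proposed (non-existent) totalsockets
    capabilities attribute; for the Bulldozer example: cores=8, threads=2,
    total_sockets=2.
    """
    if report_host_threads_as_cores:
        return cores * threads * total_sockets   # 8 * 2 * 2 = 32
    return cores * total_sockets                 # 8 * 2 = 16
```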