+++ This bug was initially created as a clone of Bug #714249 +++ Description of problem: matahari host properties cpu count wrong Version-Release number of selected component (if applicable): [root@hp-dl360g5-02 matahari]# rpm -qa | grep matahari matahari-lib-0.4.0-5.el6.x86_64 matahari-service-0.4.0-5.el6.x86_64 matahari-host-0.4.0-5.el6.x86_64 matahari-net-0.4.0-5.el6.x86_64 matahari-broker-0.4.0-5.el6.x86_64 matahari-agent-lib-0.4.0-5.el6.x86_64 matahari-0.4.0-5.el6.x86_64 [root@hp-dl360g5-02 matahari]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.1 (Santiago) [root@hp-dl360g5-02 matahari]# How reproducible: 100% >>> host1.getProperties() (hostname, 'hp-z200-07.rhts.eng.bos.redhat.com'), (cpu_count, 4), (cpu_cores, 16), (cpu_model, 'Core(TM) i3 CPU 540 @ 3.07GHz'), [root@hp-z200-07 init.d]# cat /proc/cpuinfo | egrep 'processor|cores' processor : 0 cpu cores : 2 processor : 1 cpu cores : 2 processor : 2 4 cpus x 2 cores = 8, not 16 as above cpu cores : 2 processor : 3 cpu cores : 2 [root@hp-z200-07 init.d]# >>> host2.getProperties() (hostname, 'hp-dl360g5-02.rhts.eng.bos.redhat.com'), (cpu_count, 8), (cpu_cores, 64), (cpu_model, 'Xeon'), [root@hp-dl360g5-02 matahari]# cat /proc/cpuinfo | egrep 'processor|cores' processor : 0 cpu cores : 2 processor : 1 cpu cores : 2 processor : 2 cpu cores : 2 processor : 3 cpu cores : 2 8 cpus x 2 cores = 16 cores, not 64 as above processor : 4 cpu cores : 2 processor : 5 cpu cores : 2 processor : 6 cpu cores : 2 processor : 7 cpu cores : 2 [root@hp-dl360g5-02 matahari]# >>> host3.getProperties() (hostname, 'dell-pe2850-01.rhts.eng.bos.redhat.com'), (cpu_count, 4), (cpu_cores, 16), (cpu_model, 'Xeon'), [root@dell-pe2850-01 ~]# cat /proc/cpuinfo | egrep 'processor|cores' processor : 0 cpu cores : 1 processor : 1 cpu cores : 1 processor : 2 cpu cores : 1 processor : 3 cpu cores : 1 --- Additional comment from pm-rhel on 2011-06-17 13:22:37 EDT --- Since this issue was entered in bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release. --- Additional comment from abeekhof on 2011-07-07 23:44:36 EDT --- The sigar API appears to store "the total number of cores in the system" in a "per cpu" struct. Highly intuitive. The correct logic would appear to be: diff --git a/src/lib/host.c b/src/lib/host.c index 311149f..2259e2d 100644 --- a/src/lib/host.c +++ b/src/lib/host.c @@ -244,12 +244,10 @@ host_get_cpu_details(void) sigar_cpu_info_list_get(host_init.sigar, &cpus); cpuinfo.cpus = cpus.number; - for (lpc = 0; lpc < cpus.number; lpc++) { + if(cpus.number > 0) { sigar_cpu_info_t *cpu = (sigar_cpu_info_t *) cpus.data; - if (cpuinfo.model == NULL) { - cpuinfo.model = g_strdup(cpu->model); - } - cpuinfo.cores += cpu->total_cores; + cpuinfo.model = g_strdup(cpu->model); + cpuinfo.cores = cpu->total_cores; } sigar_cpu_info_list_destroy(host_init.sigar, &cpus); Adam: can you test and confirm please? --- Additional comment from astokes on 2011-07-12 10:24:02 EDT --- Here is test output from my machine with applied patch: akimbo, /tmp/matahari $ cat /proc/cpuinfo | egrep 'processor|cores' processor : 0 cpu cores : 2 processor : 1 cpu cores : 2 processor : 2 cpu cores : 2 processor : 3 cpu cores : 2 So this looks like 4 cpus x 2 = 8 cores My test output is as follows In MhApiHostSuite::testCpuCount: mh_api_host.h:111: Trace: "Verify cpu count: 4" . In MhApiHostSuite::testCpuNumberOfCores: /mh_api_host.h:119: Trace: "Verify cpu num of cores: 4" . In MhApiHostSuite::testCpuWordSize: mh_api_host.h:127: Trace: "Verify cpu wordsize: 64" So our cpu cores output still seems to be incorrect --- Additional comment from abeekhof on 2011-07-13 23:25:34 EDT --- Adam, you might need to dig into sigar, perhaps their API is misbehaving. --- Additional comment from astokes on 2011-08-09 17:01:15 EDT --- Opened a thread here: http://forums.hyperic.com/jiveforums/thread.jspa?threadID=42917 --- Additional comment from zbitter on 2011-08-11 07:02:49 EDT --- OK, actually this is pretty simple and the problem here is hyperthreads. There are 3 different levels of things we're counting: 1) Sockets (sockets) -> actually pretty irrelevant; we don't really care how the cores are distributed between sockets. 2) Cores (cores) -> cpu_cores 3) Hyperthreads (processors) -> cpu_count Remember, to the kernel a "processor" is just "a thing that will run stuff for me" and that means a hyperthread, not a physical core. So, in the examples given: >>> host1.getProperties() (hostname, 'hp-z200-07.rhts.eng.bos.redhat.com'), (cpu_count, 4), (cpu_cores, 16), (cpu_model, 'Core(TM) i3 CPU 540 @ 3.07GHz'), i3 is a dual-core CPU with 2 hyperthreads per core. Dave suggested that cpu_cores should be 8, not 16; actually it should be 2. >>> host2.getProperties() (hostname, 'hp-dl360g5-02.rhts.eng.bos.redhat.com'), (cpu_count, 8), (cpu_cores, 64), (cpu_model, 'Xeon'), This is a Xeon box with 2 sockets, each containing a dual-core CPU, each core having 2 hyperthreads. So the correct cpu_cores is 4, not 16 or 64. >>> host3.getProperties() (hostname, 'dell-pe2850-01.rhts.eng.bos.redhat.com'), (cpu_count, 4), (cpu_cores, 16), (cpu_model, 'Xeon'), And this is 2 single-core Xeons with 2 hyperthreads each. So the correct cpu_cores here is 2. Finally, it looks like Adam has a dual-core CPU with 2 hyperthreads per core, so it should be reporting 2 cores, not 4 or 8. Although a lot of this is unintuitive, I should mention that sigar is being completely consistent with the kernel here; the output is exactly the same as what you see in /proc/cpuinfo. (Having said that, the lscpu output is generally more helpful.) --- Additional comment from zbitter on 2011-08-11 08:14:48 EDT --- Actually, I take that last bit back. On my machine with 1 socket 4 cores 8 processors sigar reports: 8 cpus 8 total_sockets 16 cores_per_socket 8 total_cores Report 8 *sockets* is completely insane, and there's no obvious way of telling the actual number of cores. I'll have to dig into their code and find out what they're doing. --- Additional comment from zbitter on 2011-08-11 09:43:28 EDT --- The sigar implementation was presumably written by someone with an AMD (or otherwise non-hyperthreaded) box. total_cores = sysconf(_SC_NPROCESSORS) That's wrong, because it's giving the number of processors, not the number of cores. Clearly. It's right there in the name. cores_per_socket = logical processor count from cpuid instruction According to AMD, "LogicalProcessorCount is the number of cores per processor." Unfortunately that's only true for AMD, since they don't have hyperthreading. In the presence of hyperthreading, LogicalProcessorCount is actually the logical processor count. (Funnily enough, that one is right there in the name too.) How it's coming up with 16 on my quad-core/8-thread box is something of a mystery. total_sockets = (total_cores < cores_per_socket) ? total_cores : total_cores / cores_per_socket Which would actually be correct, were all the input data not complete rubbish. Options: 1) Write it ourselves 2) Report something other than the number of cores as the number of cores 3) Fix sigar All of which are horrible. --- Additional comment from zbitter on 2011-08-12 13:30:12 EDT --- Created attachment 518067 [details] sigar (trunk) patch Attaching a (completely untested) patch for sigar (trunk). I will attempt to test this next week, assuming I can find a way to get sigar compiling on Fedora. This is relying on /proc/cpuinfo for all of its information, since the kernel appears to be the only reliable source (even the hardware CPUINFO instruction lies, apparently). However, as a result, this may well not work with older versions of the kernel. Even if it does, verifying it on every kernel on every imaginable piece of hardware will be a nightmare. So it may be very difficult to get this patch upstream. Backporting to sigar 1.6 is also not straightforward. --- Additional comment from zbitter on 2011-08-12 13:34:27 EDT --- I've posted a patch very similar to Andrew's (above) to the mailing list. This at least will get us reporting the number of logical cpus as the number of cores (rather than the square of that number). I've also attached an as-yet untested patch to get sigar to tell us the actual number of cores (see previous comment). --- Additional comment from abeekhof on 2011-08-15 23:53:06 EDT --- (In reply to comment #6) > So, in the examples given: > > >>> host1.getProperties() > (hostname, 'hp-z200-07.rhts.eng.bos.redhat.com'), > (cpu_count, 4), > (cpu_cores, 16), > (cpu_model, 'Core(TM) i3 CPU 540 @ 3.07GHz'), > > i3 is a dual-core CPU with 2 hyperthreads per core. Dave suggested that > cpu_cores should be 8, not 16; actually it should be 2. I think we want to report the total number of cores rather than the cores/cpu metric that sigar is giving us (if that means we have to do the multiplication ourselves, I'm fine with that). So I'd expect 8 in this case too. --- Additional comment from errata-xmlrpc on 2011-08-16 01:45:11 EDT --- Bug report changed to ON_QA status by Errata System. A QE request has been submitted for advisory RHBA-2011:11386-01 http://errata.devel.redhat.com/errata/show/11386 --- Additional comment from zbitter on 2011-08-16 04:04:13 EDT --- (In reply to comment #11) > (In reply to comment #6) > > So, in the examples given: > > > > >>> host1.getProperties() > > (hostname, 'hp-z200-07.rhts.eng.bos.redhat.com'), > > (cpu_count, 4), > > (cpu_cores, 16), > > (cpu_model, 'Core(TM) i3 CPU 540 @ 3.07GHz'), > > > > i3 is a dual-core CPU with 2 hyperthreads per core. Dave suggested that > > cpu_cores should be 8, not 16; actually it should be 2. > > I think we want to report the total number of cores rather than the cores/cpu > metric that sigar is giving us (if that means we have to do the multiplication > ourselves, I'm fine with that). > > So I'd expect 8 in this case too. The total number of cores is 2. There is only one socket. If you want to count threads instead of cores, there's 4 (but note that the cpu_count is already reporting that). There is in no way, shape or form 8 of anything in this box. --- Additional comment from zbitter on 2011-08-16 04:24:52 EDT --- I am confused as to how this is ON_QA when there are no patches upstream or in distcvs to address it? --- Additional comment from dajohnso on 2011-08-17 10:49:57 EDT --- Still seeing this as an issue in v0.4.2-6 --- Additional comment from zbitter on 2011-08-17 11:51:47 EDT --- I have now delivered the matahari patch I sent to the ML to matahari upstream. (However it is not in the v0.4.2-6 build.) I have also posted a tested version of the sigar patch to the Sigar development forum, as a reply to the thread that Adam started: http://forums.hyperic.com/jiveforums/thread.jspa?threadID=42917 A version that will apply cleanly to the current version of sigar in Fedora is here: https://github.com/zaneb/sigar/commit/cae27197baf656ae0d7734913d8714995a787ee4 --- Additional comment from rbryant on 2011-09-02 09:55:33 EDT --- The matahari patch is included in the latest build, but I'm moving this back to ASSIGNED since we should probably hold off until the sigar patch is available in a build as well. --- Additional comment from errata-xmlrpc on 2011-09-07 16:57:40 EDT --- Bug report changed from MODIFIED to ON_QA status by the Errata System: Advisory RHBA-2011:11386-01: http://errata.devel.redhat.com/errata/stateview/11386 --- Additional comment from dajohnso on 2011-09-09 14:42:00 EDT --- Still looks like this is broke. I did find and install sigar-1.6.5-0.3 which had the following comment * Wed Sep 07 2011 Zane Bitter <zbitter> - 1.6.5-0.3.git58097d9 - Get CPU counts from /proc/cpuinfo Resolves: #714249 host.properties(): ------------------ hostname dell-pem600-01.rhts.eng.bos.redhat.com last_updated 1315593329000000000 free_swap 6160376 sequence 7 free_mem 2913080 os Linux (2.6.32-193.el6.x86_64) process_statistics {'running': 1, 'sleeping': 232, 'zombie': 0, 'idle': 0, 'stopped': 0, 'total': 233} cpu_count 8 cpu_model Xeon cpu_cores 8 <- should be 8 x 4 = 32 wordsize 64 swap 6160376 custom_uuid U04YE1SC9Q7CFJUPOQXF memory 3921168 update_interval 5 cpu_flags fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi flexpriority arch x86_64 uuid ce76da54cbe7c1d9ed1d44ce00000023 qmf: quit [root@dell-pem600-01 ~]# cat /proc/cpuinfo | egrep 'processor|cores' processor : 0 cpu cores : 4 processor : 1 cpu cores : 4 processor : 2 cpu cores : 4 processor : 3 cpu cores : 4 processor : 4 cpu cores : 4 processor : 5 cpu cores : 4 processor : 6 cpu cores : 4 processor : 7 cpu cores : 4 [root@dell-pem600-01 ~]# rpm -qa | egrep 'matahari|sigar' | sort matahari-0.4.4-2.el6.x86_64 matahari-agent-lib-0.4.4-2.el6.x86_64 matahari-broker-0.4.4-2.el6.x86_64 matahari-consoles-0.4.4-2.el6.x86_64 matahari-debuginfo-0.4.4-2.el6.x86_64 matahari-host-0.4.4-2.el6.x86_64 matahari-lib-0.4.4-2.el6.x86_64 matahari-network-0.4.4-2.el6.x86_64 matahari-service-0.4.4-2.el6.x86_64 matahari-sysconfig-0.4.4-2.el6.x86_64 sigar-1.6.5-0.3.git58097d9.el6.x86_64 --- Additional comment from zbitter on 2011-09-14 09:10:18 EDT --- That is correct behaviour. That box does not have 32 cores; it has 2 quad-core sockets. 2 x 4 = 8.
sigar-1.6.5-0.5.git58097d9.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/sigar-1.6.5-0.5.git58097d9.fc16
Package sigar-1.6.5-0.5.git58097d9.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sigar-1.6.5-0.5.git58097d9.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2011-14706 then log in and leave karma (feedback).
sigar-1.6.5-0.5.git58097d9.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.