Bug 472824

Summary: sysfs doesn't export CPU cache info for some CPUs
Product: Red Hat Enterprise Linux 5 Reporter: Chris Snook <csnook>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: low Docs Contact:
Priority: low    
Version: 5.3 CC: csnook, philip
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-02 13:07:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris Snook 2008-11-24 21:38:06 UTC
Description of problem:
Since the format of /proc/cpuinfo varies greatly between architectures, /sys/devices/system/cpu/cpu0/cache is the preferred way for userspace applications to get information about the CPU cache.  On at least my NetBurst-based Xeon system, this sysfs directory does not exist, which makes it harder for applications to tune their data structures to the CPU's cache geometry.

Version-Release number of selected component (if applicable):
kernel-2.6.18-124.el5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL 5.3 Server beta on a system with dual 2.8 GHz Prescott Xeons.
2. ls /sys/devices/system/cpu/cpu0/cache
  
Actual results:
ls: /sys/devices/system/cpu/cpu0/cache: No such file or directory

Expected results:
index0  index1  index2
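
For context, a minimal sketch of how a userspace application might consume these directories and notice when they are missing. It assumes each indexN directory contains a `size` file holding text such as `32K`; the function names (`parse_cache_size_kb`, `read_cache_size_kb`, `count_cache_indexes`) are hypothetical and not taken from any existing library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs cache "size" string such as "2048K" into kilobytes.
 * Returns -1 if the text is not a non-negative number followed by 'K'. */
long parse_cache_size_kb(const char *text)
{
    char *end = NULL;
    long value;

    assert(text != NULL);
    value = strtol(text, &end, 10);
    if (end == text || value < 0 || *end != 'K')
        return -1;
    return value;
}

/* Read <base_dir>/index<index>/size, e.g. with
 * base_dir = "/sys/devices/system/cpu/cpu0/cache".
 * Returns the size in KB, or -1 if the file is missing or unparsable. */
long read_cache_size_kb(const char *base_dir, int index)
{
    char path[512];
    char buf[32];
    FILE *fp;
    long kb = -1;

    assert(base_dir != NULL);
    snprintf(path, sizeof(path), "%s/index%d/size", base_dir, index);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) != NULL)
        kb = parse_cache_size_kb(buf);
    fclose(fp);
    return kb;
}

/* Count consecutive indexN directories that expose a readable size.
 * A result of 0 corresponds to the failure reported in this bug. */
int count_cache_indexes(const char *base_dir)
{
    int n = 0;

    while (n < 16 && read_cache_size_kb(base_dir, n) >= 0)
        n++;
    return n;
}
```

On the affected system, `count_cache_indexes("/sys/devices/system/cpu/cpu0/cache")` would be expected to return 0, at which point an application has to fall back to parsing /proc/cpuinfo or to built-in defaults.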

Additional info:
Might be due to the screwy clflush/cache_alignment disagreement on some netburst processors, such as this one.  I do have Adjacent Cache Line prefetch enabled in the BIOS.  Since the system is dual-socket/single-core/hyperthreaded, I get four entries like this in /proc/cpuinfo:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 3
cpu MHz         : 2793.262
cache size      : 2048 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips        : 5586.29
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 1 Chris Snook 2008-11-25 21:12:05 UTC
Just saw this again inside a KVM guest running on a Core 2 Duo E6850.  On bare metal, the same machine does expose the sysfs directory, under both F10 and RHEL 5.3 beta.

Comment 2 Philip Withnall 2009-07-24 15:02:05 UTC
Investigation into brc#511278
=============================

The /sys/devices/system/cpu/cpuX/cache/ sysfs directories are created in arch/i386/kernel/cpu/intel_cacheinfo.c (for both i386 and x86_64).
The addition of the cache information for this processor is either failing because (num_cache_leaves == 0), or failing somewhere in cache_add_dev(), since it
can be assumed that once register_hotcpu_notifier() is called, cacheinfo_cpu_callback() is correctly called as appropriate.

The case that (num_cache_leaves == 0) is only possible if the CPUID[1] instruction is failing on the processor. This is the most likely problem, since there are
other reports[2] of odd behaviour of the CPUID instruction on Xeon processors. This can be checked for by using the cpuid program in userspace, and noting its output.
No other functions or esoteric instructions are called to initialise num_cache_leaves:

	do {
		++i;
		/* Do cpuid(4) loop to find out num_cache_leaves */
		cpuid_count(4, i, &eax, &ebx, &ecx, &edx);
		cache_eax.full = eax;
	} while (cache_eax.split.type != CACHE_TYPE_NULL);
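
A rough userspace sketch of the same enumeration, for anyone who wants to reproduce it outside the kernel. The helper names are hypothetical. The counting logic works on a table of captured EAX values so it can be checked without special hardware; the capture function is only built on x86 and uses the `__get_cpuid_max` and `__cpuid_count` helpers from the compiler's `<cpuid.h>`. One observation, offered as a possibility rather than a confirmed diagnosis: the /proc/cpuinfo output in the description reports `cpuid level : 3`, and leaf 4 is only defined when the maximum basic leaf is at least 4, so the sketch checks that first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_TYPE_NULL 0u

/* Bits 4:0 of CPUID.4:EAX hold the cache type; 0 means "no more caches". */
unsigned int leaf4_cache_type(uint32_t eax)
{
    return eax & 0x1fu;
}

/* Bits 7:5 of CPUID.4:EAX hold the cache level (1, 2, 3, ...). */
unsigned int leaf4_cache_level(uint32_t eax)
{
    return (eax >> 5) & 0x7u;
}

/* Count entries before the first NULL-type entry in a table of EAX
 * values captured for subleaves 0..n-1 (same stop condition as the
 * kernel loop quoted above). */
int count_cache_leaves(const uint32_t *eax_by_subleaf, size_t n)
{
    size_t i = 0;

    assert(eax_by_subleaf != NULL || n == 0);
    while (i < n && leaf4_cache_type(eax_by_subleaf[i]) != CACHE_TYPE_NULL)
        i++;
    return (int)i;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Capture EAX for leaf 4, subleaves 0.., into out[0..max-1].
 * Returns the number of entries written, or 0 if the CPU reports a
 * maximum basic leaf below 4 (leaf 4 is then not available). */
size_t capture_leaf4_eax(uint32_t *out, size_t max)
{
    unsigned int eax, ebx, ecx, edx;
    size_t i;

    assert(out != NULL);
    if (__get_cpuid_max(0, NULL) < 4)
        return 0;
    for (i = 0; i < max; i++) {
        __cpuid_count(4, (unsigned int)i, eax, ebx, ecx, edx);
        out[i] = eax;
        if (leaf4_cache_type(eax) == CACHE_TYPE_NULL)
            return i + 1;   /* include the terminating entry */
    }
    return i;
}
#endif
```

If `capture_leaf4_eax()` returns 0 on the affected machine, the kernel would have no leaf 4 data to build the sysfs entries from, regardless of how the subleaf index in ECX is handled.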

It is possible that adding the cache information is failing in cache_add_dev(). The only processor-dependent failure point here is the call to cpuid4_cache_sysfs_init(),
which results in a call to detect_cache_attributes(). Here, either the set_cpus_allowed() call is failing (unlikely), or cpuid4_cache_lookup() is failing. This brings
us round to the same conclusion: that the CPUID instruction, as issued by Linux, doesn't work on this particular model of processor.

Digging into the exact CPUID call in cpuid_count() in include/asm-i386/processor.h, we have the following assembly:

	__asm__("cpuid"
		: "=a" (*eax),
		  "=b" (*ebx),
		  "=c" (*ecx),
		  "=d" (*edx)
		: "0" (op), "c" (count));

The calls in intel_cacheinfo.c are the only ones in the kernel which set ecx to a non-zero value before issuing the CPUID instruction. Perhaps this is what's causing
the problem? The information displayed in /proc/cpuinfo is returned by a call to CPUID with ecx=0, and that appears to work fine.

More analysis would be possible if the cpuid program could be run on the system and its output provided.

There is no indication that the problem is caused by the mismatch between clflush size and cache_alignment.

[1]: http://en.wikipedia.org/wiki/CPUID
[2]: http://bugzilla.kernel.org/show_bug.cgi?id=11074

Comment 3 RHEL Program Management 2014-03-07 13:38:49 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 4 RHEL Program Management 2014-06-02 13:07:12 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).

Comment 5 Red Hat Bugzilla 2023-09-14 01:14:23 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days