Description of problem:

This is on Fedora Rawhide aarch64. The hardware is the APM Mustang.

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             4

This system is definitely not 4 sockets :-) It has a single system on chip (SoC), ie. 1 socket, with 8 real cores (no threads).

Version-Release number of selected component (if applicable):

kernel 4.0.7-300.fc22.aarch64
util-linux-2.26.2-3.fc24.aarch64
With kernel 4.2.0-0.rc2.git0.1.fc23.aarch64 the output is the same:

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             4

I'm attaching the dmesg output in case that is useful.
Created attachment 1063171 [details] dmesg
Currently, the report by lscpu, which is derived from the cpu topology under sysfs, is composed from two possible sources:

-- DT entries, which are often not filled out;
   -- but more of it is getting filled out for NUMA-based machines/SoCs, b/c correct cpu topology is critical for the kernel NUMA code to DTRT.
   -- note: the ACPI-based NUMA specification, SRAT, solves most of the issue except for the next field & its correlation to SRAT cpu values.
-- MPIDR Aff<x> fields.

On ARMv8, unlike ARMv7, MPIDR contents are *implementation specific* -- an architected register w/un-architected contents. duh!

ARMv7 had a simple spec that the lower 8 bits (Aff0) were cores, the next 8 bits (Aff1) were 'clusters' that shared L2 caches; I think the Aff3 field was for shared L3. But since 99.999999% of ARMv7's are single socket with relatively small core counts, it was easy to make the MPIDR register go from unspecified to 'we all agree to do it this way' ... b/c they happen to do it that way.

On ARMv8, the upstream topology code is interpreting Aff1 as the socket id, when in fact it is a 'cluster' index, indicating shared caches.

If you run the same command on a RHELSA kernel, you will find proper values, because RHELSA is carrying a patch that works for Xgene & Seattle. IIRC, that patch will fail for ThunderX, and will need further modification.

jcm has a to-do on this issue w/ARM & to come up with an ACPI-based solution that will yield the equivalent of x86's cpu-id-based cpu topology information. [Read the arch/x86/kernel/topology.c code to learn the details, and search the Intel tech pages for a paper on their topology architecture wrt cpu-id probing.]

The DT-based solution is TBD, and the DT-huggers can resolve it with yet-another-specification-to-cpu-DT.

Until ARM & its partners determine/specify/require an architected cpu topology specification, this problem becomes a per-SoC 'quirk' to the arch/arm64/kernel/topology.c support.
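To make the Aff1 misreading concrete, here is a minimal sketch in Python. The bit positions of Aff0..Aff3 are those of the ARMv8 MPIDR_EL1 layout; the MPIDR values themselves are hypothetical, chosen only to be consistent with the lscpu output reported above (8 cores in four 2-core clusters), and were not read from the Mustang. The "cluster" interpretation shown is just one possible alternative reading for illustration; it is not the RHELSA patch, and the "package" reading is a simplification of what the upstream code does.

```python
def mpidr_affinity(mpidr):
    """Split a 64-bit MPIDR_EL1 value into (aff3, aff2, aff1, aff0)."""
    aff0 = mpidr & 0xFF           # bits [7:0]
    aff1 = (mpidr >> 8) & 0xFF    # bits [15:8]
    aff2 = (mpidr >> 16) & 0xFF   # bits [23:16]
    aff3 = (mpidr >> 32) & 0xFF   # bits [39:32]
    return aff3, aff2, aff1, aff0


# HYPOTHETICAL values: 8 cores arranged as four clusters of two cores.
# These are illustrative only, not dumped from real hardware.
HYPOTHETICAL_MPIDRS = [0x000, 0x001, 0x100, 0x101,
                       0x200, 0x201, 0x300, 0x301]


def sockets_if_aff1_is_package(mpidrs):
    """Count 'sockets' when Aff1 is read as the package id."""
    return len({mpidr_affinity(m)[2] for m in mpidrs})


def sockets_if_aff1_is_cluster(mpidrs):
    """Count 'sockets' when Aff1 is only a cluster index.

    Assumption for illustration: the package is then identified by the
    remaining higher fields (Aff3, Aff2).
    """
    return len({mpidr_affinity(m)[:2] for m in mpidrs})


if __name__ == "__main__":
    print("Aff1 as package:", sockets_if_aff1_is_package(HYPOTHETICAL_MPIDRS))
    print("Aff1 as cluster:", sockets_if_aff1_is_cluster(HYPOTHETICAL_MPIDRS))
```

With these assumed values the first reading reports four sockets with two cores each, matching the bogus lscpu output, while the second reports a single socket.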
So if I read Don's comment #3 correctly, the problem is mostly in the data exported by the kernel in /sys, and for patched kernels (e.g. RHELSA) lscpu generates valid output. Right? ... then it's time to reassign to kernel :-)

Richard, it would be nice to have a /sys and /proc dump from the system. In the upstream tree we have a script, https://raw.githubusercontent.com/karelzak/util-linux/master/tests/ts/lscpu/mk-input.sh, to create a tarball. Please send me the tarball (or add it to the BZ).
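For a quick look at what the kernel exports, without waiting for a full dump, something like the following sketch may help. It reads the standard `physical_package_id` and `core_id` files under `devices/system/cpu/cpuN/topology/`. The function names are made up for this example, and pointing `sysfs_root` at an unpacked dump directory instead of the live `/sys` is an assumption about how such a dump would be laid out. This is not how lscpu itself computes its numbers; it only shows the raw per-cpu ids.

```python
import glob
import os


def read_topology(sysfs_root="/sys"):
    """Return {cpu_name: (physical_package_id, core_id)} from a sysfs tree."""
    result = {}
    pattern = os.path.join(sysfs_root,
                           "devices/system/cpu/cpu[0-9]*/topology")
    for topo_dir in sorted(glob.glob(pattern)):
        # topo_dir is .../cpuN/topology, so the parent directory names the cpu
        cpu = os.path.basename(os.path.dirname(topo_dir))
        values = []
        for name in ("physical_package_id", "core_id"):
            with open(os.path.join(topo_dir, name)) as f:
                values.append(int(f.read().strip()))
        result[cpu] = tuple(values)
    return result


def count_packages(topology):
    """Number of distinct package ids the kernel reports."""
    return len({pkg for pkg, _core in topology.values()})


if __name__ == "__main__":
    topo = read_topology()
    for cpu, (pkg, core) in topo.items():
        print(cpu, "package", pkg, "core", core)
    print("distinct packages:", count_packages(topo))
```

If the distinct package count here already disagrees with the hardware, the problem is on the kernel side rather than in lscpu.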
Created attachment 1068857 [details] proc.tar.gz
Created attachment 1068858 [details] sys.tar.gz
Created attachment 1068859 [details] mustang.tar.gz

I should have read the whole comment before posting those previous attachments. Attached is the output from the command:

sudo ./mk-input.sh mustang
This is an old one, and while it is not fixed on the Mustang, it should be fixed on any aarch64/ACPI machine with a correct PPTT. It may be worth closing this.
Thanks. I'm going to close this then. I think the best approach would be to reopen for specific machines that don't have a correct PPTT if problems still show up.