Bug 2221836 - [ThinkEdge SE10 RHEL9.2GA] run profiler_hardware_uncore cert fail
Summary: [ThinkEdge SE10 RHEL9.2GA] run profiler_hardware_uncore cert fail
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Certification Program
Classification: Red Hat
Component: redhat-certification
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jianwei Weng
QA Contact: rhcert qe
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-11 01:31 UTC by Kean
Modified: 2023-07-12 04:17 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-12 04:17:58 UTC
Target Upstream Version:
Embargoed:
asaswadk: needinfo-


Attachments
report (10.52 MB, application/xml)
2023-07-11 01:31 UTC, Kean
sosreport files (13.12 MB, application/x-xz)
2023-07-11 01:32 UTC, Kean

Description Kean 2023-07-11 01:31:08 UTC
Created attachment 1975047 [details]
report

[steps to reproduce]

1. Install the RHEL 9.2 GA system
2. Enter the OS
3. Install dpRHEL9 and ts8.61
4. Execute rhcert-cli run --test=profiler_hardware_uncore

Failure rate: 100%

[expected result]

The profiler_hardware_uncore test passes certification.

[actual result]
Executing rhcert-cli run --test=profiler_hardware_uncore shows that the test is not in the certification plan; after adding it to the plan, the test fails.
 

[additional information]
Wireless: MEDIATEK Corp. MT7921 802.11ax
Networking: Intel I225-V (rev 03)
CPU: Intel Atom(R) x6425RE Processor @ 1.90GHz
Video: Intel UHD Graphics Gen11 32EU
Memory: 32GB
BIOS: M4XKT0CA
OS: RHEL 9.2 GA

Comment 1 Kean 2023-07-11 01:32:16 UTC
Created attachment 1975048 [details]
sosreport files

Comment 2 Kean 2023-07-11 01:34:41 UTC
Hi, 

May I know why the certification plan has no 'profiler_hardware_uncore' test option? When we add it and run it, we get a FAIL result, yet other platforms have this test option. Please check and review the logs.

Thanks.

Comment 3 Kean 2023-07-11 05:41:04 UTC
Hi,

Is this the same situation as https://bugzilla.redhat.com/show_bug.cgi?id=2181947?

I got the same result:

[root@localhost ~]# lsmod | grep -i uncore
[root@localhost ~]# modprobe intel_uncore
modprobe: ERROR: could not insert 'intel_uncore': No such device
[root@localhost ~]# perf list | grep -P '^\s{1,3}unc' | awk '{print $1}'
[root@localhost ~]# 

[testing@localhost ~]$ uname -a
Linux localhost.localdomain 5.14.0-284.18.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
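
The checks in the transcripts above could be scripted roughly as follows. This is a minimal sketch, not part of rhcert: the function names are made up for illustration, the first function only mirrors the `grep -P '^\s{1,3}unc' | awk '{print $1}'` filter on already captured `perf list` text, and the second assumes that Intel uncore PMUs, when present, show up under /sys/bus/event_source/devices with names starting with "uncore_".

```python
import os
import re


def uncore_events_from_perf_list(perf_list_output):
    # Keep lines that start with 1 to 3 whitespace characters followed by
    # "unc", then return the first whitespace-separated field of each,
    # like the grep/awk pipeline in the transcript.
    events = []
    for line in perf_list_output.splitlines():
        if re.match(r"\s{1,3}unc", line):
            events.append(line.split()[0])
    return events


def uncore_pmus(sysfs_dir="/sys/bus/event_source/devices"):
    # Assumption: uncore PMUs are registered with an "uncore_" name prefix.
    # A missing or unreadable directory is treated as "no uncore PMUs".
    try:
        names = os.listdir(sysfs_dir)
    except OSError:
        return []
    return sorted(name for name in names if name.startswith("uncore_"))
```

On a machine like the one in this report, both functions would be expected to return an empty list, which matches the empty output of the commands above.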

If it is the same situation, what should we do for the SE10 certification? Pass it, or something else?

Thanks.

Comment 4 asaswadk 2023-07-11 08:36:33 UTC
Hi Kean,

I think the uncore events are not supported on the "Intel Atom(R) x6425RE Processor" CPU, which is based on the Tremont microarchitecture, codename Elkhart Lake (EHL). I also did a little digging and found this: https://github.com/intel/perfmon/tree/main/EHL/events

Have you tried executing the listed 3 commands on Fedora, just like you did in https://bugzilla.redhat.com/show_bug.cgi?id=2181947?

Thanks,
Akshay

Comment 5 Kean 2023-07-11 09:19:35 UTC
Hi asaswadk,

No, I haven't executed those commands on Fedora 38 because the device is not with me at the moment. We are moving all devices to a new office, so we will test as soon as possible, probably next week.

Thanks. I also believe the Atom does not support uncore events. In that case, is this test no longer required, and therefore not a blocker for the SE10 certification?

Thanks.

Comment 6 ltao 2023-07-11 10:11:48 UTC
I just took a quick dive into the upstream kernel code.

This is the output of boot_cpu_data structure on "Intel Atom(R) x6425RE Processor":
crash> p boot_cpu_data
boot_cpu_data = $1 = {
  x86 = 6 '\006', 
  x86_vendor = 0 '\000', 
  x86_model = 150 '\226', ====> 0x96
  x86_stepping = 1 '\001', 
  x86_tlbsize = 0,

According to https://elixir.bootlin.com/linux/v6.5-rc1/source/arch/x86/include/asm/intel-family.h, the cpu module 0x96 is INTEL_FAM6_ATOM_TREMONT.

In arch/x86/events/intel/uncore.c, intel_uncore_init calls x86_match_cpu, which checks the cpu module against the intel_uncore_match[] table. There is no INTEL_FAM6_ATOM_TREMONT in the table, only INTEL_FAM6_ATOM_TREMONT_D.

So from the code's point of view, INTEL_FAM6_ATOM_TREMONT, i.e. the "Intel Atom(R) x6425RE Processor", doesn't support intel uncore, and the intel_uncore module will not initialize successfully.
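
The lookup described above can be illustrated with a small user-space sketch. It is not the kernel code: `UNCORE_MATCH_SUBSET` is a hypothetical stand-in that holds only the one Tremont entry mentioned in this comment, not the full intel_uncore_match[] table, and the 0x86 value for INTEL_FAM6_ATOM_TREMONT_D is taken from intel-family.h rather than from this report.

```python
import re

# Model numbers as defined in arch/x86/include/asm/intel-family.h.
INTEL_FAM6_ATOM_TREMONT_D = 0x86
INTEL_FAM6_ATOM_TREMONT = 0x96  # 150 decimal, the x86_model shown above

# Hypothetical, deliberately incomplete stand-in for intel_uncore_match[].
UNCORE_MATCH_SUBSET = {INTEL_FAM6_ATOM_TREMONT_D}


def parse_cpuinfo(text):
    # Extract "cpu family" and "model" from /proc/cpuinfo-style text.
    # The "model" pattern does not match the "model name" line.
    family = int(re.search(r"^cpu family\s*:\s*(\d+)", text, re.M).group(1))
    model = int(re.search(r"^model\s*:\s*(\d+)", text, re.M).group(1))
    return family, model


def has_uncore_match(family, model):
    # Family 6 and a model present in the (illustrative) table.
    return family == 6 and model in UNCORE_MATCH_SUBSET
```

With family 6 and model 150 (0x96), `has_uncore_match` returns False, while model 0x86 returns True, which is the distinction between INTEL_FAM6_ATOM_TREMONT and INTEL_FAM6_ATOM_TREMONT_D drawn above.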

Thanks,
Tao Liu

Comment 7 ltao 2023-07-11 10:13:55 UTC
(In reply to ltao from comment #6)

s/cpu module/cpu model/g

Comment 8 asaswadk 2023-07-11 11:43:50 UTC
Hi Tao,

Thank you for the information & confirming the "Intel Atom(R) x6425RE Processor" doesn't support uncore.


Hi @renhai2,

In this case, where the CPU doesn't support uncore, the uncore test won't be planned and is not required to be tested, i.e. not a blocker.

Thanks,
Akshay

Comment 9 Kean 2023-07-12 02:46:55 UTC
Thanks @ltao, @asaswadk. 
Let's close it.
Kean

