Bug 1746961

Summary: New tuned profile requested for Marvell ThunderX*-based platforms
Product: Red Hat Enterprise Linux 8 Reporter: Peter Rival <frival>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Robin Hack <rhack>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.2 CC: aokuliar, atomasov, bgray, ctatman, dshaks, jeder, jhladky, jskarvad, natashba, osabart, rhack
Target Milestone: rc Keywords: Patch, Triaged, Upstream
Target Release: 8.0   
Hardware: x86_64   
OS: Linux   
URL: https://github.com/redhat-performance/tuned/pull/276/commits/89083a8459e71789c9791aa98eeb74c0cd34105c
Whiteboard:
Fixed In Version: tuned-2.14.0-0.1.rc1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-04 02:03:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Peter Rival 2019-08-29 15:39:33 UTC
Marvell has requested we add a new tuned profile for their ThunderX2-based and upcoming platforms.  Tuning suggestions so far are:

 - Disable transparent hugepages by default
 - Disable kernel numa_balancing by default
 - Adjust kernel.sched_latency_ns and kernel.sched_migration_cost_ns based on benchmark results
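
A minimal sketch for checking the first two settings on a running system. It assumes the usual kernel interfaces /sys/kernel/mm/transparent_hugepage/enabled and /proc/sys/kernel/numa_balancing; the helper names parse_thp_mode and read_if_present are illustrative only and are not part of Tuned. It only reads the values and changes nothing.

```python
import re
from pathlib import Path

# Assumed kernel interfaces; either may be absent on a given kernel build.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


def parse_thp_mode(text):
    """Return the active mode from a string such as 'always madvise [never]'.

    The kernel marks the active mode with square brackets. Returns None
    when no bracketed mode is found.
    """
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_if_present(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    thp = read_if_present(THP_ENABLED)
    numa = read_if_present(NUMA_BALANCING)
    print("THP mode:", parse_thp_mode(thp) if thp is not None else "unavailable")
    print("numa_balancing:", numa if numa is not None else "unavailable")
```

With the requested tuning applied, one would expect the THP mode to read "never" and numa_balancing to read 0. The scheduler values are left out because no target values are given here.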

Comment 1 Jaroslav Škarvada 2019-08-30 09:59:21 UTC
The comments from bug 1746957 also apply here.

Comment 2 Jaroslav Škarvada 2020-05-07 21:01:20 UTC
According to the previous communication, we will probably not add a new profile; instead we will incorporate the tuning into the existing throughput-performance profile. This will result in a better user experience: the user selects the throughput-performance profile, and the arch-specific tuning is applied automatically depending on the architecture Tuned is running on.

Comment 3 Jaroslav Škarvada 2020-05-07 21:06:56 UTC
Can we apply it to all ARM machines whose cpuinfo string contains ThunderX? Or to all ARM machines? If not, could you provide a list of the cpuinfo strings it should be applied to?

Comment 4 Jaroslav Škarvada 2020-05-27 19:12:31 UTC
Also, we still don't know the values for kernel.sched_latency_ns and kernel.sched_migration_cost_ns.

Comment 5 Jaroslav Škarvada 2020-06-03 17:35:17 UTC
For now I am using the following 'CPU part' numbers:

0x516,  thunderx2t99
0x0516, thunderx2t99
0xaf,   thunderx2t99
0x0af,  thunderx2t99
0xa1,   thunderxt88
0x0a1,  thunderxt88

Please let me know if you know more.

Also, thanks to [1], it is possible to force the tuning even on platforms with new CPU part numbers that are currently unknown to Tuned, e.g. by adding the following to /etc/tuned/tuned-main.conf:
uname_string = aarch64
cpuinfo_string = CPU part	: 0x0af

[1] https://github.com/redhat-performance/tuned/pull/270
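
To see which 'CPU part' values a given machine reports, something like the following sketch could be used. It assumes that /proc/cpuinfo on aarch64 carries lines in the "CPU part : 0x0af" form shown above. The names KNOWN_PARTS, cpu_parts and known_models are hypothetical and are not taken from Tuned; the mapping only restates the list from this comment.

```python
import re

# Part numbers from the list in this comment. Parsing the values as
# integers makes 0x516 and 0x0516 (or 0xaf and 0x0af) the same key.
KNOWN_PARTS = {
    0x516: "thunderx2t99",
    0xAF: "thunderx2t99",
    0xA1: "thunderxt88",
}


def cpu_parts(cpuinfo_text):
    """Return the set of 'CPU part' values found in cpuinfo-style text."""
    values = re.findall(
        r"^CPU part\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return {int(value, 16) for value in values}


def known_models(cpuinfo_text):
    """Return the sorted model names for the part numbers that are listed."""
    parts = cpu_parts(cpuinfo_text)
    return sorted({KNOWN_PARTS[part] for part in parts if part in KNOWN_PARTS})


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            text = cpuinfo.read()
    except OSError:
        text = ""
    print("CPU parts:", sorted(hex(part) for part in cpu_parts(text)))
    print("Known models:", known_models(text))
```

On a machine whose part number is missing from the list, known_models returns an empty list, which is the case the tuned-main.conf override above is meant for.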

Comment 6 Jaroslav Škarvada 2020-06-03 19:30:42 UTC
According to the previous communication we targeted all AMDs. PR for what we currently have:
https://github.com/redhat-performance/tuned/pull/276/commits/89083a8459e71789c9791aa98eeb74c0cd34105c

Comment 7 Jaroslav Škarvada 2020-06-03 19:37:27 UTC
(In reply to Jaroslav Škarvada from comment #6)
> According to the previous communication we targeted all AMDs. PR for what we
> currently have:
> https://github.com/redhat-performance/tuned/pull/276/commits/
> 89083a8459e71789c9791aa98eeb74c0cd34105c

Of course it's targeting the ARM CPUs from comment 5, not AMD :) It's one PR with two commits adding support for both ARM and AMD, and I just copied the wrong description here :)

Comment 9 Ondřej Lysoněk 2020-06-04 13:10:31 UTC
(In reply to Jaroslav Škarvada from comment #5)
> For now I am using the following 'CPU part' numbers:
> 
> 0x516,  thunderx2t99
> 0x0516, thunderx2t99
> 0xaf,   thunderx2t99
> 0x0af,  thunderx2t99
> 0xa1,   thunderxt88
> 0x0a1,  thunderxt88

May I ask where you got these numbers?

Comment 10 Jaroslav Škarvada 2020-06-04 14:41:52 UTC
(In reply to Ondřej Lysoněk from comment #9)
> (In reply to Jaroslav Škarvada from comment #5)
> > For now I am using the following 'CPU part' numbers:
> > 
> > 0x516,  thunderx2t99
> > 0x0516, thunderx2t99
> > 0xaf,   thunderx2t99
> > 0x0af,  thunderx2t99
> > 0xa1,   thunderxt88
> > 0x0a1,  thunderxt88
> 
> May I ask where you got these numbers?

GCC sources. I checked them against the subset of machines I was able to find in Beaker, and they matched.

It seems in gcc-10.1.1 there are even more IDs:
0xa0, 0x0a2, 0x0a3, 0x0b8

I will add them.

I also asked whether we could match the "CPU implementer" against 0x43, but I didn't get an answer. It also seems that 0x42 is used. Both seem to be used by Broadcom, so maybe that would be too coarse an identification.

Comment 11 Peter Rival 2020-06-05 02:20:30 UTC
Pinging Chris Tatman, our Marvell EPM.  Chris, can we get this in front of Marvell to make sure this is how they'd like to have the CPUs identified?  I believe we can add or alter the list when Altra is supported but I'd like to get their input as we discussed before and during the TRF last week.

Comment 12 Jaroslav Škarvada 2020-06-05 08:36:56 UTC
I used the following regex:
'CPU part\s+:\s+(0x0?516)|(0x0?af)|(0x0?a[0-3])|(0x0?b8)\b'

We can alter it at any time with a Tuned update. An admin can also override it with the help of https://github.com/redhat-performance/tuned/pull/270.
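
The pattern above can be tried out with Python's re module. This is only a sketch of the regex semantics and does not model how Tuned applies the pattern. Note that, as quoted, the alternation is top-level, so the 'CPU part' prefix belongs only to the first alternative and the trailing \b only to the last. GROUPED below is a hypothetical variant with an explicit non-capturing group, shown for comparison; it is not taken from the PR.

```python
import re

# Pattern exactly as quoted in this comment.
AS_QUOTED = r'CPU part\s+:\s+(0x0?516)|(0x0?af)|(0x0?a[0-3])|(0x0?b8)\b'

# Hypothetical variant: one non-capturing group so that the prefix and
# the word boundary apply to every alternative.
GROUPED = r'CPU part\s+:\s+(?:0x0?516|0x0?af|0x0?a[0-3]|0x0?b8)\b'


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    samples = [
        "CPU part\t: 0x0af",     # listed ThunderX2 part
        "CPU part\t: 0x0a1",     # listed ThunderX part
        "CPU part\t: 0xd08",     # a part number that is not in the list
        "CPU variant\t: 0xaf",   # same hex value on a different line type
    ]
    for line in samples:
        print(repr(line), matches(AS_QUOTED, line), matches(GROUPED, line))
```

Both patterns accept the two listed part numbers and reject 0xd08. They differ on the last sample: the pattern as quoted also matches 0xaf on a line that is not a 'CPU part' line, while the grouped variant does not.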

Comment 17 Jiri Hladky 2020-06-22 10:45:43 UTC
Hi Robin,

we have this ThunderX2 system:
https://beaker.engineering.redhat.com/view/armatura1.slevarna.tpb.lab.eng.brq.redhat.com#details

It has a ThunderX2 CN9975 (Cavium) CPU:
https://en.wikichip.org/wiki/cavium/thunderx2/cn9975

Is that what you are looking for? If so, we can loan the system to you. 

Thanks
Jirka

Comment 28 errata-xmlrpc 2020-11-04 02:03:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4559