Bug 2227278
| Summary: | On AMD EPYC3, Linpack benchmark compiled with -march=x86-64-v3 runs slower by 22% compared to -march=x86-64-v2 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jiri Hladky <jhladky> | ||||
| Component: | gcc | Assignee: | Jakub Jelinek <jakub> | ||||
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 38 | CC: | dmalcolm, fweimer, jakub, jlaw, jwakely, mcermak, mpolacek, msebor, nickc, sipoyare | ||||
| Target Milestone: | --- | Keywords: | Performance | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2024-05-28 13:37:00 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Jiri Hladky
2023-07-28 15:28:55 UTC
Created attachment 1980485
A standalone reproducer
Steps to Reproduce:
1. dnf install hwloc-devel
2. make run
3. Compare the performance reported:
grep -PoH "[0-9]+ Kflops" *log
See the included README file for more details.
GCC generates a tight loop with an FMA chain. On Zen-based AMD CPUs, a chain of dependent FMAs causes a regression. A patch for GCC is being developed: https://gcc.gnu.org/legacy-ml/gcc-patches/2017-12/msg01053.html
Until the patch is complete and merged into GCC, the workaround is to compile with -mno-fma.
Created attachment 1981064
Updated testcase with -mno-fma variant
It includes all results from an AMD EPYC 7573X 32-core server.
1) Log files with results
$ grep -PoH "[0-9]+ Kflops" *log
linpackd_x86-64-v2.log:5682088 Kflops
linpackd_x86-64-v3.log:4452227 Kflops
linpackd_x86-64-v3_mtune_native.log:4506376 Kflops
linpackd_x86-64-v3_mtune_native-no_fma.log:5645570 Kflops
linpackd_x86-64-v3-no_fma.log:5744116 Kflops
2) Instruction usage histogram
$ ls *hist | cat
linpackd_x86-64-v2.hist
linpackd_x86-64-v3.hist
linpackd_x86-64-v3_mtune_native.hist
linpackd_x86-64-v3_mtune_native-no_fma.hist
linpackd_x86-64-v3-no_fma.hist
That patch has been in GCC for 5 years, so either it doesn't work in this case or this is some other, related bug.
The current settings in gcc/config/i386/x86-tune.def are:
/* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
smaller FMA chain. */
DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3)
/* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
smaller FMA chain. */
DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
| m_ALDERLAKE | m_SAPPHIRERAPIDS | m_CORE_ATOM)
What -mtune= do you get for -mtune=native in your case? Running
gcc -S -mtune=native -v -xc /dev/null -o /dev/null 2>&1 | grep mtune
should show that. If it is -mtune=znver4, perhaps we need to add | m_ZNVER4 to some of those.
With -mtune=generic (the default) this workaround is not in effect.
(In reply to Jakub Jelinek from comment #4)
> With -mtune=generic (the default) this workaround is not in effect.

And -march=x86-64-v3 should use those default tunings. Maybe we should have an upstream discussion about whether to change the default tuning.

We certainly shouldn't change the default tuning (that we tune with -mtune=generic); what we could consider is including | m_GENERIC in those. But that requires wide discussions between Intel and AMD, as -mtune=generic tunes for recent chips from both vendors, and it matters how much it gains on some CPUs and how much it slows things down on others.

Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21. Fedora Linux 38 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug, and we are sorry it could not be fixed.