Bug 2228124
| Summary: | gcc 12 takes up to 2.5x longer to compile PHP source code compared to gcc 11 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Jiri Hladky <jhladky> | ||||
| Component: | gcc | Assignee: | Marek Polacek <mpolacek> | ||||
| gcc sub component: | gcc-toolset-12 | QA Contact: | qe-baseos-tools-bugs | ||||
| Status: | CLOSED NOTABUG | Docs Contact: | |||||
| Severity: | unspecified | ||||||
| Priority: | unspecified | CC: | ahajkova, fweimer, jakub, jhladky, jmario, jvozar, kkolakow, mimehta, ohudlick, sipoyare | ||||
| Version: | 9.2 | Keywords: | Performance | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2023-08-02 13:10:07 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Jiri Hladky
2023-08-01 13:08:37 UTC
Both Intel Icelake and AMD Epyc3 show the degradation:
http://reports.perfqe.tpb.lab.eng.brq.redhat.com/testing/sched/reports/Phoronix/amd-epyc3-milan-7313-2s.tpb.lab.eng.brq.redhat.com/RHEL-9.3.0-20230718.0vsRHEL-9.2.0/2023-07-20T14:03:11.500000vs2023-07-20T12:35:20.100000/7e0944de-b3af-591f-bb5c-65f432f6f1fb/index.html#build-php_section
http://reports.perfqe.tpb.lab.eng.brq.redhat.com/testing/sched/reports/Phoronix/intel-icelake-gold-6330-2s.lab.eng.brq2.redhat.com/RHEL-9.3.0-20230718.0vsRHEL-9.2.0/2023-07-20T14:03:11.500000vs2023-07-20T12:35:20.100000/ef759836-121e-5ca3-9dbf-1dbddb276410/index.html#build-php_section

Michey Mehta has analyzed the slowdown and here are the key takeaways:

1) The slowdown is entirely unrelated to GCC being compiled with x86-64-v3. There is no need to use any extra repos to reproduce the problem. Using the default GCC v11 from RHEL-9.2 and gcc-toolset-12 clearly shows the problem.
2) Compiling the file parse_date.c (a preprocessed version of a file in the PHP sources) shows the problem. Compile using this:
time gcc -ftime-report -O2 -fno-tree-vectorize -march=x86-64-v2 -c parse_date.c
gcc 11 takes about 7s, gcc 12 takes about 18s (on an AMD EPYC 7573X 32-Core Processor).
3) perf showed that iterate_fix_dominators got the most hits in gcc 12 (about 10%).
4) This is also seen in the -ftime-report output: for gcc 12, "dominance computation" takes 7s compared to 0.37s on gcc 11.

I have reproduced the problem on amd-epyc3-milanx-7573x-2s.lab.eng.brq2.redhat.com with an AMD EPYC 7573X 32-Core Processor using these gcc versions:
gcc v11: gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4)
gcc v12: gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7)

I'm going to upload a tiny self-contained reproducer. Could you please review the slowdown and decide whether this is expected?

Thanks a lot
Jirka

I'm afraid I'm lost as to what you're actually measuring. Is it two versions of the same compiler built with different ISA flags (like -march=...), measured with the same flags used to build PHP (that would show how those ISA flags improve, or don't improve, compilation speed on the workload)? Or the same compiler with 2 different sets of options (e.g. different ISA flags etc.) when building the workload (in this case I'd note that whether the generated code is faster/smaller is far more important than any compilation speed differences)? Or two different versions of the compiler built with the same ISA flags and with the same options on the workload (I'd say that in this case how well the generated code is optimized is even more important than compilation speed)? Or some weird mix of these (then it is hard to guess)?

E.g. GCC 12, unlike GCC 11, enables vectorization by default at -O2; that can result in longer compile times, which greatly pays off if the generated code is faster.

Created attachment 1981305 [details]
Standalone reproducer with results
To reproduce the problem, do the following.
Install RHEL-9.2 and use default gcc v11 compiler.
./test.sh
scl enable gcc-toolset-12 'bash'
./test.sh
Compare generated log files. In my case:
$ grep real *log
2023-Aug-02_12h24m38s_11.3.1_20221121.log:real 0m7.233s
2023-Aug-02_13h26m43s_12.2.1_20221121.log:real 0m18.097s
$ grep "dominance computation" *log
2023-Aug-02_12h24m38s_11.3.1_20221121.log: dominance computation : 0.37 ( 5%) 0.00 ( 0%) 0.39 ( 5%) 0 ( 0%)
2023-Aug-02_13h26m43s_12.2.1_20221121.log: dominance computation : 7.00 ( 40%) 0.00 ( 0%) 7.23 ( 40%) 0 ( 0%)
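
For anyone comparing their own logs, here is a minimal sketch of how the "dominance computation" figures could be pulled out and turned into a slowdown factor. It is not part of the attached reproducer: the function names dominance_seconds and slowdown are hypothetical, and it assumes the first number after the last colon on the row is the time of interest. The demo uses only the two rows quoted above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the attachment.

# dominance_seconds LOG: print the first time column of the
# "dominance computation" row (assumed to be the number right after the
# last colon on that row).
dominance_seconds() {
    awk -F: '/dominance computation/ { split($NF, col, " "); print col[1]; exit }' "$1"
}

# slowdown BASE CUR: print CUR/BASE rounded to one decimal place.
slowdown() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", cur / base }'
}

# Demo with the two rows quoted in this report, written to temporary files.
tmp=$(mktemp -d)
echo ' dominance computation : 0.37 ( 5%) 0.00 ( 0%) 0.39 ( 5%) 0 ( 0%)' > "$tmp/gcc11.log"
echo ' dominance computation : 7.00 ( 40%) 0.00 ( 0%) 7.23 ( 40%) 0 ( 0%)' > "$tmp/gcc12.log"
slowdown "$(dominance_seconds "$tmp/gcc11.log")" "$(dominance_seconds "$tmp/gcc12.log")"   # prints 18.9
rm -rf "$tmp"
```

Applied to the wall-clock times instead (slowdown 7.233 18.097), the same helper gives 2.5, which matches the figure in the summary.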
Hi Jakub,

I'm sorry for the confusion. We found the issue when recompiling RHEL-9.2 userspace packages with -march=x86-64-v3. As described in comment #2, we later found that this is entirely unrelated to the x86-64-v3 recompilation.

The current issue is that this compilation:
time gcc -ftime-report -O2 -fno-tree-vectorize -march=x86-64-v2 -c parse_date.c
takes 7 seconds with gcc v11 from RHEL-9.2 and 18 seconds with gcc v12 from gcc-toolset-12. Please note that we explicitly disable vectorization to make the comparison fairer.

I'm unable to assess whether this is a real issue. As you noted, the increased compilation time can pay off if the resulting code is faster. I will leave the decision to you. Could you please get the testcase from comment #4, run it on RHEL-9.2, and judge whether this is a real problem? If yes, I can open a new BZ to avoid confusion. Feel free to close this BZ if this is not a real issue.

Thanks a lot for your help!
Jirka

Sorry, this was a misunderstanding. The original compiler flags I saw did not include -O2, so I assumed the benchmark evaluated the compilation speed without optimization, which is arguably a more well-defined target. With -O2 and other optimization levels, there are of course complicated trade-offs between compile-time and run-time performance.

I can reproduce it, but current gcc trunk is back at the gcc 11 time (17.38s gcc 11, 30.80s gcc 12, 17.59s gcc trunk), so it doesn't seem to be worth even investigating, as we wouldn't be changing GCC 12 because of this anyway. And compile time is hard to bisect on our gcc bisect seed, as everything there is an unoptimized build.
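
A three-way comparison like the one above could be scripted roughly as follows. This is a sketch under assumptions, not what was actually run: time_compile is a hypothetical helper, the compiler paths in the commented loop are placeholders, and the timing uses whole seconds from date +%s, which is coarse but adequate for differences of several seconds.

```shell
#!/bin/sh
# Sketch only: time one compile command under several compilers.

# time_compile CC [ARGS...]: run CC with ARGS, discard its output, and print
# "CC <elapsed whole seconds>s exit=<status>".
time_compile() {
    cc=$1
    shift
    start=$(date +%s)
    "$cc" "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$cc $((end - start))s exit=$status"
}

# Example loop, left commented out because the compiler paths are
# placeholders and parse_date.c comes from the attached reproducer:
# for cc in gcc /path/to/gcc-12/bin/gcc /path/to/gcc-trunk/bin/gcc; do
#     time_compile "$cc" -O2 -fno-tree-vectorize -march=x86-64-v2 -c parse_date.c
# done
```

Keeping the flags identical across compilers, as in the command used in this report, is what makes the elapsed times comparable.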