Bug 1422848
| Summary: | Long compilation time for aarch64 OpenMP enabled application | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Developer Toolset | Reporter: | Petr Sury <psury> | ||||
| Component: | gcc | Assignee: | Jakub Jelinek <jakub> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Michael Petlan <mpetlan> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | DTS 7.0 RHEL 7 | CC: | ernunes, jbastian, jhladky, kanderso, law, mcermak, mnewsome, mpetlan, mpolacek, psury, yselkowi | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | aarch64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | devtoolset-7-gcc-7.1.1-7.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-10-24 09:47:20 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1402684 | ||||||
| Attachments: | |||||||
| 
        
Description, Petr Sury, 2017-02-16 11:42:42 UTC
Some more comments:

1) We need to use gcc version >= 6.3. See https://bugzilla.redhat.com/show_bug.cgi?id=1389276#c9
2) On HP m400 the compilation takes under 1 minute. The problem is specific to Mustang systems.

Jirka

With the same command line options (and no -march=native)? Then the only reason I can think of would be that you don't have enough memory and swap to death.

Both systems (HP m400 and Mustang) have the same amount of RAM - 16GB.

@Petr - could you please check the exact command line options being used?

(In reply to Jiri Hladky from comment #4)
> Both systems (HP m400 and Mustang) have the same amount of RAM - 16GB.

The HP m400 systems have 64GB RAM.

    [root@hp-moonshot-03-c01 ~]# free -g
                  total        used        free      shared  buff/cache   available
    Mem:             63           0          61           0           1          57
    Swap:            11           0          11

It seems to be related to the -O3 optimizations. I removed that flag from config/make.def and the build time is much better without it. I was watching the system with 'top' while building with -O3 and the memory usage was normal (plenty of free RAM, no swap used); the CPU was just pegged at 100% trying to do O3 optimizations.

    ::::::::::::::
    :: With -O3 ::
    ::::::::::::::

    [jbastian@centipede NPB3.3-OMP]$ which gfortran
    /opt/rh/devtoolset-6/root/usr/bin/gfortran
    [jbastian@centipede NPB3.3-OMP]$ gfortran --version | head -1
    GNU Fortran (GCC) 6.3.1 20170118 (Red Hat 6.3.1-2)
    [jbastian@centipede NPB3.3-OMP]$ time make lu CLASS=C
    ...
    gfortran -c -O3 -fopenmp -mcmodel=large rhs.f
    ^C
    make: *** [lu] Interrupt

    real 7m48.872s
    user 0m4.571s
    sys 0m0.134s

    :::::::::::::::::
    :: Without -O3 ::
    :::::::::::::::::

    [jbastian@centipede NPB3.3-OMP]$ vi config/make.def
    [jbastian@centipede NPB3.3-OMP]$ make clean
    ...
    [jbastian@centipede NPB3.3-OMP]$ time make lu CLASS=C
    ...
    gfortran -c -fopenmp -mcmodel=large rhs.f
    gfortran -c -fopenmp -mcmodel=large l2norm.f
    ...
    gfortran -fopenmp -mcmodel=large -o ../bin/lu.C.x lu.o read_input.o domain.o setcoeff.o setbv.o exact.o setiv.o erhs.o ssor.o rhs.o l2norm.o jacld.o blts.o jacu.o buts.o error.o syncs.o pintgr.o verify.o ../common/print_results.o ../common/timers.o ../common/wtime.o
    make[2]: Leaving directory '/home/jbastian/NPB/bz1422848/NPB3.3-OMP/LU'
    make[1]: Leaving directory '/home/jbastian/NPB/bz1422848/NPB3.3-OMP/LU'

    real 0m3.015s
    user 0m2.583s
    sys 0m0.200s

Using -O2 also works well:

    [jbastian@centipede NPB3.3-OMP]$ time make lu CLASS=C
    ...
    gfortran -c -O2 -fopenmp -mcmodel=large ssor.f
    gfortran -c -O2 -fopenmp -mcmodel=large rhs.f
    gfortran -c -O2 -fopenmp -mcmodel=large l2norm.f
    ...

    real 0m6.423s
    user 0m6.053s
    sys 0m0.133s

Petr, I'm not able to reproduce your HP m400 results. That is, it also gets stuck in O3 optimizations for a very long time for me on an HP m400 system.

    [jbastian@hp-moonshot-03-c01 NPB3.3-OMP]$ which gfortran
    /opt/rh/devtoolset-6/root/usr/bin/gfortran
    [jbastian@hp-moonshot-03-c01 NPB3.3-OMP]$ gfortran --version | head -1
    GNU Fortran (GCC) 6.3.1 20170118 (Red Hat 6.3.1-2)
    [jbastian@hp-moonshot-03-c01 NPB3.3-OMP]$ make clean
    ...
    [jbastian@hp-moonshot-03-c01 NPB3.3-OMP]$ time make lu CLASS=C
    ...
    gfortran -c -O3 -fopenmp -mcmodel=large rhs.f
    ^C
    make: *** [lu] Interrupt

    real 2m35.502s
    user 0m4.668s
    sys 0m0.073s

While it was building, the top output (hiding idle processes):

    top - 16:49:39 up 2 days, 22:46,  2 users,  load average: 0.83, 0.32, 0.20
    Tasks: 222 total,   3 running, 219 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 87.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 66940352 total, 63613312 free,  1152320 used,  2174720 buff/cache
    KiB Swap: 11722688 total, 11722688 free,        0 used. 59443008 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    24482 jbastian  20   0  455808 309056  16064 R 100.0  0.5   1:44.36 f951
      511 root      20   0       0      0      0 S   0.3  0.0   0:04.85 xfsaild/dm+
    24522 root      20   0  126016   8256   3968 R   0.3  0.0   0:00.06 top

Jirka, the flags are -O3 -fopenmp -mcmodel=large, or in some cases without -O3.

Jeff, it seems I was not using -O3 on the HP machine, which explains why I missed the performance problem.

Hello everyone, a short summary of the current status.

It seems that the compilation problem is caused by an optimization switch, specifically -O3. Compilation time with -O3 is about 50 minutes on both systems (HP m400 and Mustang); with -O2 it is about a minute or two.

As for memory usage, both systems have more than enough memory (64 and 16 GB RAM). /usr/bin/time -v reports "Maximum resident set size (kbytes): 342848". (Full logs below.)

If you need any more information, please let me know.

Log from Mustang system

    Command being timed: "make lu CLASS=C"
    User time (seconds): 2884.58
    System time (seconds): 0.24
    Percent of CPU this job got: 100%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 48:04.61
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 342848
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 27944
    Voluntary context switches: 220
    Involuntary context switches: 1090
    Swaps: 0
    File system inputs: 0
    File system outputs: 7040
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 65536
    Exit status: 0

Log from HP m400 system

    Command being timed: "make lu CLASS=C"
    User time (seconds): 3226.08
    System time (seconds): 0.25
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 53:46.40
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 342848
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 27944
    Voluntary context switches: 275
    Involuntary context switches: 994
    Swaps: 0
    File system inputs: 1096
    File system outputs: 7040
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 65536
    Exit status: 0

Reproduced with devtoolset-6-gcc on a Mustang. It is enough to compile rhs.f.

    $ time scl enable devtoolset-6 -- gfortran -c -O3 -fopenmp -mcmodel=large rhs.f

Without DTS (using system gcc) or with DTS-7, it compiles in about two seconds.

VERIFIED for devtoolset-7-gcc-gfortran-7.2.1-1.el7.aarch64.

.... and, results from the time command from comment #18.

    real 21m59.579s
    user 0m0.129s
    sys 0m0.015s

(after hitting Ctrl-C)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3016 |
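
For anyone who wants to repeat the -O2 versus -O3 comparison from the comments without waiting 50 minutes, here is a minimal sketch that compiles one file at several optimization levels with a wall-clock cap. The helper name `time_opt_levels`, the 300-second default limit, and the set of levels are assumptions for illustration and do not come from this report; `timeout` and `mktemp` are the GNU coreutils tools.

```shell
# Hypothetical helper (not from the report): time one compile of a source
# file at several optimization levels, giving up after a wall-clock limit.
# usage: time_opt_levels COMPILER SOURCE [LIMIT_SECONDS]
time_opt_levels() {
    fc=$1
    src=$2
    limit=${3:-300}            # assumed default cap, in seconds
    obj=$(mktemp) || return 1  # scratch object file, removed at the end
    for opt in -O0 -O2 -O3; do
        start=$(date +%s)
        # Same flags as in the report; only the -O level varies.
        if timeout "$limit" "$fc" -c "$opt" -fopenmp -mcmodel=large \
               "$src" -o "$obj" >/dev/null 2>&1; then
            status=ok
        else
            status="failed or timed out"
        fi
        end=$(date +%s)
        echo "$opt: $((end - start))s ($status)"
    done
    rm -f "$obj"
}
```

For example, inside a shell started with `scl enable devtoolset-6 bash`, running `time_opt_levels gfortran rhs.f 600` from the LU directory should show whether only the -O3 line hits the cap. The timing has one-second resolution, which is coarse but enough to tell seconds from minutes.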